Brain decoding with MLP#

This part of the session aims to make participants familiar with Multilayer Perceptrons as one possible decoding model that can be applied to brain data. The objectives 📍 are:

  • get to know the basics of Multilayer Peceptrons

    • model creation

    • model training

    • model testing

Multilayer Perceptron#

_images/multilayer-perceptron.png

Fig. 6 A multilayer perceptron with 25 units on the input layer, a single hidden layer with 17 units, and an output layer with 9 units. Figure generated with the NN-SVG tool by [Alexander Lenail]. The figure is shared under a CC-BY 4.0 license.#

We are going to train a Multilayer Perceptron (MLP) classifier for brain decoding on the Haxby dataset. MLPs are among the most basic architectures of artificial neural networks. They consist of an input layer and an output layer, as well as hidden layers that process the input through a succession of transformations towards the output layer, which performs the task at hand, e.g. a classification or regression. Like other machine learning models for supervised learning, an MLP initially goes through a training phase. During this supervised phase, the network is taught what to look for and what the desired output is via its objective function: it minimizes the loss, i.e. the deviation of its predictions from the "ground truth", and thereby increases its performance.

MLPs were actually among the first ANNs to appear, specifically the Mark I Perceptron, which you can see below.

https://preview.redd.it/wgzps0pvcny91.jpg?width=640&crop=smart&auto=webp&s=0b2e56dc4eaa886ebd01ac0cd8e51fc4efdb1d01

Fig. 7 Frank Rosenblatt with a Mark I Perceptron computer in 1960.#

In this tutorial, we are going to train the simplest MLP architecture featuring one input layer, one output layer and just one hidden layer.

Theoretical motivation#

The previous tutorial on brain decoding with SVM shows how to use a linear combination of brain features to train a predictor.

Let’s take a moment to consider this: a 1-layer perceptron with a sigmoid activation function models the relation between X (the input data) and y (the predicted data) the same way a logistic regression would: \(\hat{y} = \sigma(X \beta + \beta_0)\)

_images/logistic_regression.png

Fig. 8 A fitted logistic regression function classifying two different classes. Courtesy of Jérôme Dockès.#

If one optimizes the parameters of this MLP to minimize a cross-entropy loss, they’re actually optimizing for the same objective function as in a classical logistic regression problem: \(\underset{\beta, \beta_0}{\min} \; -\sum_k \left[ y_k \log(\hat{y}_k) + (1 - y_k) \log(1 - \hat{y}_k) \right]\)

As a rule of thumb, one can consider that a 1-layer perceptron (and therefore any last layer of a multi-layer perceptron) works similarly to an SVC.

A big motivation for using multi-layer perceptrons is that they can capture non-linear relationships in the data. When training such models, the hope is that the hidden layers of the model will find meaningful non-linear combinations of the input features which help us solve our decoding problem.
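To make the correspondence with logistic regression concrete, below is a minimal, self-contained sketch on synthetic data (variable names such as X_demo and y_demo are made up for illustration and are not part of this tutorial's pipeline). It compares scikit-learn's logistic regression with a single sigmoid output unit trained under a binary cross-entropy loss; both fit a linear decision boundary of the form shown above.

# minimal sketch (synthetic data, not part of the tutorial pipeline):
# a single sigmoid unit with a cross-entropy loss behaves like logistic regression
import numpy as np
from sklearn.linear_model import LogisticRegression
from keras.models import Sequential
from keras.layers import Dense, Input

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# classical logistic regression
log_reg = LogisticRegression().fit(X_demo, y_demo)

# "1-layer perceptron": one sigmoid output unit, binary cross-entropy loss
perceptron = Sequential([Input(shape=(5,)), Dense(1, activation='sigmoid')])
perceptron.compile(optimizer='adam', loss='binary_crossentropy')
perceptron.fit(X_demo, y_demo, epochs=200, verbose=0)

# both models should reach a similar accuracy on this linearly separable problem
keras_acc = ((perceptron.predict(X_demo, verbose=0).ravel() > 0.5) == y_demo).mean()
print(log_reg.score(X_demo, y_demo), keras_acc)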

Getting the data#

We are going to work with the Haxby dataset [HGF+01] again. You can check the section An overview of the Haxby dataset for more details on that dataset. Here we are going to quickly download and prepare it for machine learning applications, with a set of predictive variables, the brain time series X, and a dependent variable, the respective cognitive processes/functions/percepts y.

import os
import warnings
warnings.filterwarnings(action='once')

from nilearn import datasets
# We are fetching the data for subject 4
data_dir = os.path.join('..', 'data')
sub_no = 4
haxby_dataset = datasets.fetch_haxby(subjects=[sub_no], fetch_stimuli=True, data_dir=data_dir)
func_file = haxby_dataset.func[0]

# mask the data
from nilearn.maskers import NiftiMasker
mask_filename = haxby_dataset.mask_vt[0]
masker = NiftiMasker(mask_img=mask_filename, standardize=True, detrend=True)
X = masker.fit_transform(func_file)

# cognitive annotations
import pandas as pd
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
y = behavioral['labels']

As an initial check, we’ll have a look at the size of X and y:

categories = y.unique()
print(categories)
print(y.shape)
print(X.shape)
['rest' 'face' 'chair' 'scissors' 'shoe' 'scrambledpix' 'house' 'cat'
 'bottle']
(1452,)
(1452, 675)

So we have 1452 time points, each with one label indicating the respective stimulus percept, and for each time point we have recordings of brain activity obtained via fMRI across 675 voxels (within the VT mask). We can also see that the stimulus percepts span 9 different categories.
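Before moving on, it can also be informative to check how often each label occurs: as the supports in the classification reports later on will show, the 'rest' condition is much more frequent than the task conditions, which is worth keeping in mind when interpreting overall accuracy. A one-line, optional sketch (not part of the original notebook):

# optional: inspect the label distribution; imbalanced classes can make
# overall accuracy a misleading metric
print(y.value_counts())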

However, concerning our planned analyses, we need to convert our categories into a one-hot encoding:

# creating instance of one-hot-encoder
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder(handle_unknown='ignore')
y_onehot = enc.fit_transform(np.array(y).reshape(-1, 1))
# turn the sparse matrix into a pandas dataframe
y = pd.DataFrame(y_onehot.toarray())
display(y[:10])
0 1 2 3 4 5 6 7 8
0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
6 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
9 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
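Note that the columns of the one-hot matrix follow the encoder's internal category order, which for string labels is sorted alphabetically rather than in the order returned by y.unique() above. A short, optional sketch of how to recover that mapping and convert a one-hot row back into its label name:

# the encoder stores the mapping between one-hot columns and label names
print(enc.categories_[0])

# convert a one-hot row back into its original label
print(enc.inverse_transform(y.values[:1]))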

Training a model#

As introduced in the prior tutorials, one of the most important aspects of machine learning is the split between training and test sets. MLPs are no exception, so we need to split our dataset accordingly. We will keep 20% of the time points as a test set; later, when fitting the model, a further 20% of the training data will be held out as a validation set.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)   
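A small aside: train_test_split draws time points at random. If you want the category proportions to be preserved across the split, you could stratify by label, as in the optional sketch below (the _strat variables are only for illustration and are not used later). Keep in mind, though, that fMRI time points are temporally autocorrelated, so purely random splits can be somewhat optimistic compared to, for example, leaving out whole runs.

# optional: a stratified split that keeps label proportions similar in the
# training and test sets (illustration only, not used below)
X_train_strat, X_test_strat, y_train_strat, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y.values.argmax(axis=1))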

With that, we can already build our MLP. Here, we are going to use TensorFlow and Keras. As with every other ANN, we need to import the respective components, here the model and layer types. In our case, we will use a Sequential model and Dense layers.

from keras.models import Sequential
from keras.layers import Dense

A note regarding our MLP

Please note that the example MLP we are going to create and train here is rather simple, as we want to enable its application on machines with rather limited computational resources (i.e. your laptops or Binder). "Real-world" models are usually more complex and might also entail different layer types and architectures.

First, we need to create our model, which is empty so far.

# create an empty Sequential model to which we can add layers
model_mlp = Sequential()

Next, we can add the layers to our model, starting with the input layer. Given that this is a rather short introduction to the topic and does not focus on ANNs, we are going to set the kernel initialization and activation function to sensible defaults (please have a look at the Introduction to deep learning session for more information).

model_mlp.add(Dense(50 , input_dim = 675, kernel_initializer="uniform", activation = 'relu'))
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/keras/src/layers/core/dense.py:87: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

As noted above, we are using Dense layers and, as you can see, we set the input dimension to 675. You might have already noticed that this is the number of voxels we have data from. Setting the input dimension according to the dimensions of the data is rather important and relates to what is referred to as the semantic gap: the transformation of actions & percepts conducted/perceived by humans into computational representations. For example, a picture is "nothing" but a huge array for a computer, and that array is what is submitted to the input layer of an ANN (note: this also holds true for basically any other type of data). Here, our MLP receives the extracted brain activity patterns as input, which are already in the right array format thanks to nilearn. Thus, always carefully think about what your input data entails and how it is structured, and then set up your input layer accordingly.
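Rather than hard-coding the value 675, you can also read the input dimension directly from the data, which avoids mismatches if, for example, a different mask is used. A minimal, optional sketch:

# optional sanity check: the input dimension should match the number of voxels
n_voxels = X_train.shape[1]
print(n_voxels)  # 675 for the VT mask used here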

Next, we are going to add one hidden layer.

model_mlp.add(Dense(30, kernel_initializer="uniform", activation = 'relu'))

Because we are creating a very simple MLP with only three layers, we can already add our output layer. It uses the softmax activation function, given that we aim to train our MLP to predict, from brain activity patterns, which of the different categories the participants perceived.

model_mlp.add(Dense(len(categories), activation = 'softmax'))

To get a nice overview of our ANN, we can now use the .summary() function, which will provide us with the model type, the number of model parameters and, for each layer, its type, output shape and number of parameters.

model_mlp.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
β”‚ dense (Dense)                   β”‚ (None, 50)             β”‚        33,800 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_1 (Dense)                 β”‚ (None, 30)             β”‚         1,530 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_2 (Dense)                 β”‚ (None, 9)              β”‚           279 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 35,609 (139.10 KB)
 Trainable params: 35,609 (139.10 KB)
 Non-trainable params: 0 (0.00 B)

With that, we have created our MLP architecture, which is now ready to be compiled! In this step, we set the optimizer, loss function and metric, i.e. the components that define how our MLP will learn.

model_mlp.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
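Passing the string 'adam' uses Keras' default optimizer settings. If you want explicit control over, for example, the learning rate, you can pass an optimizer instance instead. The optional sketch below should be equivalent to the call above, since 0.001 is Adam's default learning rate.

# equivalent compilation with an explicit optimizer object (optional)
from keras.optimizers import Adam

model_mlp.compile(optimizer = Adam(learning_rate = 0.001),
                  loss = 'categorical_crossentropy',
                  metrics = ['accuracy'])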

Now it’s time to train our MLP. Thus, we have to fit it to our data, specifically only the training data. Here, we are going to provide a few more hyperparameters that define how our MLP is going to learn: the batch size, the number of epochs and the validation split. We will assign the respective output to a variable so that we can investigate our MLP’s learning process.

history = model_mlp.fit(X_train, y_train, batch_size = 10,
                             epochs = 10, validation_split = 0.2)
Epoch 1/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3885 - loss: 1.8757 - val_accuracy: 0.4893 - val_loss: 1.4044
Epoch 2/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6044 - loss: 1.1212 - val_accuracy: 0.5708 - val_loss: 1.1577
Epoch 3/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7602 - loss: 0.7317 - val_accuracy: 0.6524 - val_loss: 1.0112
Epoch 4/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8567 - loss: 0.4485 - val_accuracy: 0.7039 - val_loss: 0.8941
Epoch 5/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9268 - loss: 0.2988 - val_accuracy: 0.6996 - val_loss: 0.8856
Epoch 6/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9714 - loss: 0.1661 - val_accuracy: 0.7382 - val_loss: 0.8413
Epoch 7/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9848 - loss: 0.1189 - val_accuracy: 0.7339 - val_loss: 0.9226
Epoch 8/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9890 - loss: 0.0705 - val_accuracy: 0.7768 - val_loss: 0.8128
Epoch 9/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9996 - loss: 0.0349 - val_accuracy: 0.7682 - val_loss: 0.8698
Epoch 10/10
93/93 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0195 - val_accuracy: 0.7811 - val_loss: 0.8535

This looks about like what we would expect the learning process to look like: across epochs, the loss is decreasing and the accuracy is increasing.

A note regarding the learning process of our MLP

Comparable to its architecture, our MLP’s learning process is also not really what you would see in the "real world". Usually, ANNs are trained much more extensively: for longer periods of time, with more epochs and on more data. However, we keep it rather short here, as we want to enable its application on machines with rather limited computational resources (i.e. your laptops or Binder).
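If you do train for more epochs on your own machine, a common safeguard against overfitting is early stopping, which monitors the validation loss and halts training once it stops improving. Below is an optional sketch of how this could look with Keras callbacks; it is not used in this tutorial, so the fit call is left commented out.

# optional: stop training once the validation loss has not improved for a few
# epochs and restore the best weights seen so far
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor = 'val_loss', patience = 5,
                           restore_best_weights = True)
# history = model_mlp.fit(X_train, y_train, batch_size = 10, epochs = 100,
#                         validation_split = 0.2, callbacks = [early_stop])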

While this is already informative, we can also plot the loss and accuracy in the training and validation sets respectively. Let’s start with the loss.

import matplotlib.pyplot as plt
import seaborn as sns

plt.plot(history.history['loss'], color='m')
plt.plot(history.history['val_loss'], color='c')
plt.title('MLP loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc = 'upper right')

sns.despine(offset=5)

plt.show()
_images/405219be543c0bc8ca500636d7dd0de332c080c183254f175c3161a142e0b5bf.png

And now the same for the accuracy.

import matplotlib.pyplot as plt
import seaborn as sns

plt.plot(history.history['accuracy'], color='m')
plt.plot(history.history['val_accuracy'], color='c')
plt.title('MLP accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc = 'upper left')

sns.despine(offset=5)

plt.show()
_images/306ec0a4ad53323bcdc8b02eb37badd0280b9f5c607d6bec511b29120cafc686.png

How would you interpret these plots…

concerning our MLP’s learning process? Does it make sense? If not, what should it look like? Could you use these plots to evaluate certain aspects of the learning process, e.g. regularization?

Assessing performance#

After evaluating the training of our MLP, we of course also need to evaluate its (predictive) performance. Here, this refers to the accuracy of our MLP’s outcomes, i.e. its predictions. We already saw this in the plots above and during the training across epochs, but let’s check the accuracy of the predictions on the training set again:

from sklearn.metrics import classification_report
y_train_pred = model_mlp.predict(X_train)
print(classification_report(y_train.values.argmax(axis = 1), y_train_pred.argmax(axis=1)))
37/37 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
              precision    recall  f1-score   support

           0       0.87      0.92      0.89        85
           1       0.97      0.98      0.97        88
           2       0.98      0.88      0.92        90
           3       0.99      0.95      0.97        81
           4       0.98      0.96      0.97        91
           5       0.97      0.98      0.98       471
           6       0.89      0.95      0.92        81
           7       0.98      0.98      0.98        90
           8       0.93      0.90      0.92        84

    accuracy                           0.96      1161
   macro avg       0.95      0.94      0.95      1161
weighted avg       0.96      0.96      0.96      1161

While you might think: "Oh, that’s awesome, great performance!", such outcomes are usually perceived as suspiciously high and indicate that something is off…

Why should a close-to-perfect performance indicate that something is wrong?

What do you think is the rationale for saying that very high scores are actually "suspicious" and tell us that something is most likely wrong? Try thinking about the things you’ve learned so far: training/test/validation datasets and their sizes, models, predictions, etc.

Luckily, we did split our dataset into independent training and test sets. So, let’s check our MLP’s performance on the test set:

y_test_pred = model_mlp.predict(X_test)
print(classification_report(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis=1)))
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 849us/step
              precision    recall  f1-score   support

           0       0.66      0.83      0.73        23
           1       0.70      0.70      0.70        20
           2       0.71      0.67      0.69        18
           3       0.92      0.89      0.91        27
           4       0.83      0.88      0.86        17
           5       0.92      0.90      0.91       117
           6       0.70      0.70      0.70        27
           7       0.89      0.94      0.92        18
           8       0.62      0.54      0.58        24

    accuracy                           0.82       291
   macro avg       0.77      0.78      0.78       291
weighted avg       0.82      0.82      0.82       291

As you can see, the scores, i.e. the performance, drop quite a bit. Do you know why, and which scores you would report, e.g. in a publication?
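One way to make this train/test gap explicit is to compute the overall accuracy on both sets; a small, optional sketch:

# optional: quantify the generalization gap between training and test sets
from sklearn.metrics import accuracy_score

train_acc = accuracy_score(y_train.values.argmax(axis = 1), y_train_pred.argmax(axis = 1))
test_acc = accuracy_score(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis = 1))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")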

Besides checking the overall scores, there are other options to further evaluate our MLP’s (or basically any other model’s) performance. One of the most commonly used is the confusion matrix (which you have most likely seen before in this course). A confusion matrix displays how often samples of a given true category were predicted as each label, thus providing insights into, for example, which categories are easily confused with one another. To implement this, we first have to compute the confusion matrix:

import numpy as np
from sklearn.metrics import confusion_matrix

# compute the confusion matrix (rows: true labels, columns: predicted labels)
# and normalize each row so that entries are proportions rather than raw counts
cm_mlp = confusion_matrix(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis=1))
model_conf_matrix = cm_mlp.astype('float') / cm_mlp.sum(axis = 1)[:, np.newaxis]

After that, we can plot it for evaluation.

import pandas as pd
import seaborn as sns

# note: the rows/columns of the confusion matrix follow the one-hot column
# order, i.e. enc.categories_[0], not the order returned by y.unique() earlier
df_cm = pd.DataFrame(model_conf_matrix, index = enc.categories_[0],
                     columns = enc.categories_[0])

plt.figure(figsize = (10,7))
sns.heatmap(df_cm, annot = True, cmap = 'Blues', square = True)
plt.xticks(rotation = 45)
plt.title('MLP decoding results - confusion matrix' , fontsize = 15, fontweight = 'bold')
plt.xlabel("predicted labels", fontsize = 14, fontweight = 'bold')
plt.ylabel("true labels", fontsize = 14, fontweight = 'bold')
plt.show()
_images/03b983bd14dcaee5bad51c519698bb364f08497a9859172e1fe975f283a8d38a.png

Based on this outcome: how would you interpret the confusion matrix? Are some categories better "decodable" than others? Could we even make such a statement?

Summary#

With that, we have already reached the end of this tutorial, in which we talked about how to create, train and evaluate an MLP as one possible decoding model that can be applied to brain data. As mentioned before, the MLP used here is rather simple, and models you see (and maybe use) out in the "real world" will most likely be far more complex. However, their application to brain data concerning input, hidden and output layers follows the same outline.

Tip

Unfortunately, visualizing the features/transformations of an ANN is quite often not straightforward as it depends on the given ANN architecture. However, you can check this fantastic distill article to learn more about feature visualization in artificial neural networks.

Exercises#

  • What is the most difficult category to decode? Why?

  • The model seemed to overfit. Try adding a Dropout layer to regularize the model. You can read about dropout in keras in this blog post.

  • Try to add layers or hidden units, and observe the impact on overfitting and training time.