Brain decoding with MLP
Contents
Brain decoding with MLP¶
This part of the session
aims to make participants
familiar with Multilayer Peceptrons as one possible decoding model
that can be applied to brain data
. The objectives 📍 are:
get to know the basics of
Multilayer Peceptrons
model
creationmodel
training
model
testing
Multilayer Perceptron¶
We are going to train a Multilayer Perceptron
(MLP
) classifier
for brain decoding
on the Haxby dataset. MLP
s are one of the most basic architecture of artificial neural networks. As such, MLP
s consist of input
and output
layers
as well as hidden layers
that process the input
through a succession of transformations
towards the output layer
that performs the task at hand, e.g. a classification
or regression
. Like other machine learning models
for supervised learning
, an MLP
initially goes through a training phase
. During this supervised phase
, the network
is taught what to look for and what is the desired output via its objective function
. This refers to, minimizing the loss
, ie the deviation of predictions
from the “ground truth”, and thus increasing its performance.
MLP
s were actually among the first ANN
s to appear, specifically the Mark I Peceptron which you can see below.
In this tutorial, we are going to train the simplest MLP
architecture featuring one input layer
, one output layer
and just one hidden layer
.
Theoretical motivation¶
The previous tutorial on brain decoding with SVM shows how to use a linear combination of brain features to train a predictor.
Let’s take a moment to consider this: a 1-layer perceptron with a sigmoid activation function
models the relation between X
(the input data) and y
(the predicted data)
the same way a logistic regression would:
\(\hat{y} = \sigma(X \beta + \beta_0)\)
If one optimizes the parameters of this MLP to minimize a cross-entropy loss, they’re actually optimizing for the same objective function as in a classical logistic regression problem: \(\underset{\beta, \beta_0}{\min} \sum_k y_k \log(\hat{y_k}) + (1 - y_k) \log(1 - \hat{y_k})\)
As a rule of thumb, one can consider that a 1-layer perceptron (and therefore any last layer of a multi-layer perceptron) works similarly to an SVC.
A big motivation for using multiple-layer perceptrons is that they can introduce non-linearities in our data. When training such models, the hope is that the hidden layers of the model will find meaningful non-linear combinations of the input features which help us solve our decoding problem.
Getting the data¶
We are going to work with the Haxby dataset [HGF+01] again. You can check the section An overview of the Haxby dataset for more details on that dataset
. Here we are going to quickly download
and prepare it for machine learning applications
with a set of predictive variables
, the brain time series
X
, and a dependent variable
, the respective cognitive processes
/function
/percepts
y
.
import os
import warnings
warnings.filterwarnings(action='once')
from nilearn import datasets
# We are fetching the data for subject 4
data_dir = os.path.join('..', 'data')
sub_no = 4
haxby_dataset = datasets.fetch_haxby(subjects=[sub_no], fetch_stimuli=True, data_dir=data_dir)
func_file = haxby_dataset.func[0]
# mask the data
from nilearn.input_data import NiftiMasker
mask_filename = haxby_dataset.mask_vt[0]
masker = NiftiMasker(mask_img=mask_filename, standardize=True, detrend=True)
X = masker.fit_transform(func_file)
# cognitive annotations
import pandas as pd
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
y = behavioral['labels']
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/nilearn/datasets/func.py:20: DeprecationWarning: Please use `MatReadError` from the `scipy.io.matlab` namespace, the `scipy.io.matlab.miobase` namespace is deprecated.
from scipy.io.matlab.miobase import MatReadError
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/nilearn/datasets/__init__.py:93: FutureWarning: Fetchers from the nilearn.datasets module will be updated in version 0.9 to return python strings instead of bytes and Pandas dataframes instead of Numpy arrays.
warn("Fetchers from the nilearn.datasets module will be "
As an initial check, we’ll have a look at the size of X
and y
:
categories = y.unique()
print(categories)
print(y.shape)
print(X.shape)
['rest' 'face' 'chair' 'scissors' 'shoe' 'scrambledpix' 'house' 'cat'
'bottle']
(1452,)
(1452, 675)
So we have 1452
time points
, with one label
for the respective stimulus percept
each, and for each time point
we have recordings
of brain
activity obtained via fMRI
across 675 voxels
(within the VT
mask
). We can also see that the stimulus percept
s span 9
different categories
.
However, concerning our planned analyses, we need to convert our categories
into a one-hot encoder:
# creating instance of one-hot-encoder
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder(handle_unknown='ignore')
y_onehot = enc.fit_transform(np.array(y).reshape(-1, 1))
# turn the sparse matrix into a pandas dataframe
y = pd.DataFrame(y_onehot.toarray())
display(y[:10])
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
6 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
9 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Training a model¶
As introduced in the prior tutorials
, one of the most important aspects of machine learning
is the split between train
and tests
. MLP
s are no exception to that and thus we need to split our dataset accordingly. We will keep 20%
of the time points
as test
, and then set up a 10 fold cross validation
for training/validation
.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
With that, we can already build our MLP
. Here, we are going to use Tensorflow and Keras. As with every other ANN
, we need to import
the respective components, here, the model
and layer
type
. In our case we will use a Sequential
model
and Dense
layers
.
from keras.models import Sequential
from keras.layers import Dense
2023-05-22 08:57:18.709929: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-22 08:57:18.709957: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
A note regarding our MLP
Please note that the example MLP
we are going to create
and train
here is rather simple as we want to enable its application on machines with rather limited computational resources (ie your laptops or binder). “Real-world” models are usually more complex and might also entail different types
and layers
.
Initially, we need to create our, so far, empty model
.
# number of unique conditions that we have
model_mlp = Sequential()
2023-05-22 08:57:20.117988: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-05-22 08:57:20.118014: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-05-22 08:57:20.118035: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fv-az1102-220): /proc/driver/nvidia/version does not exist
2023-05-22 08:57:20.118486: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Next, we can add the layers
to our model
, starting with the input layer
. Given this is a rather short introduction to the topic and does not focus on ANN
s, we are going to set the kernel initialization
and activation function
to appropriate defaults (Please have a look at the Introduction to deep learning session for more information.).
model_mlp.add(Dense(50 , input_dim = 675, kernel_initializer="uniform", activation = 'relu'))
As noted above, we are using Dense
layers
and as you can see, we set the input dimensions
to 675
. You might have already notices that this is the number of voxels
we have data
from. Setting the input dimension
according to the data dimensions
is rather important is referred to as the semantic gap: the transformation of actions
& percepts
conducted/perceived by human
s into computational representations
. For example, pictures are “nothing” but a huge array
for a computer and what will be submitted to the input layer of an ANN
(note: this also holds true for basically any other type of data
). Here, our MLP
receives the extracted brain activity patterns
as input
which are already in the right array
format thanks to nilearn
. Thus, always carefully think about what your input
data
entails and how it is structured to then setup your input layer
accordingly.
Next, we are going to add one hidden layer
.
model_mlp.add(Dense(30, kernel_initializer="uniform", activation = 'relu'))
And because we are creating a very simple MLP
with only three layers
, we already add our output layer
, using the softmax
activation function
given that we aim to train
our MLP
to predict
the different categories
that were perceived by the participants
from their brain activity patterns
.
model_mlp.add(Dense(len(categories), activation = 'softmax'))
To get a nice overview of our ANN
, we can now use the .summary()
function
, which will provide us with the model type
, model parameters
and for each layer
, the its type
, shape
and parameters
.
model_mlp.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 50) 33800
dense_1 (Dense) (None, 30) 1530
dense_2 (Dense) (None, 9) 279
=================================================================
Total params: 35,609
Trainable params: 35,609
Non-trainable params: 0
_________________________________________________________________
With that, we already created our MLP
architecture
, which is now ready to be compiled
! Within this step, we will set the optimizer
, loss function
and metric
, ie components
that define how our MLP
will learn
.
model_mlp.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
Now it’s to train
our MLP
. Thus, we have to fit
it to our data
, specifically only the training
data
. Here, we are going to provide a few more hyperparameters
that will define how our MLP
is going to learn
. This entails the batch size
, the epochs
and split
of validation sets
. We will assign the respective output to a variable so that we can investigate our MLP
’s learning process
.
history = model_mlp.fit(X_train, y_train, batch_size = 10,
epochs = 10, validation_split = 0.2)
Epoch 1/10
1/93 [..............................] - ETA: 35s - loss: 2.2229 - accuracy: 0.1000
42/93 [============>.................] - ETA: 0s - loss: 1.9819 - accuracy: 0.3429
87/93 [===========================>..] - ETA: 0s - loss: 1.7037 - accuracy: 0.4138
93/93 [==============================] - 1s 3ms/step - loss: 1.6752 - accuracy: 0.4224 - val_loss: 1.4664 - val_accuracy: 0.4506
Epoch 2/10
1/93 [..............................] - ETA: 0s - loss: 1.5577 - accuracy: 0.4000
48/93 [==============>...............] - ETA: 0s - loss: 1.1682 - accuracy: 0.6000
92/93 [============================>.] - ETA: 0s - loss: 1.1279 - accuracy: 0.6217
93/93 [==============================] - 0s 1ms/step - loss: 1.1285 - accuracy: 0.6207 - val_loss: 1.2011 - val_accuracy: 0.5665
Epoch 3/10
1/93 [..............................] - ETA: 0s - loss: 0.3257 - accuracy: 1.0000
47/93 [==============>...............] - ETA: 0s - loss: 0.8161 - accuracy: 0.7447
93/93 [==============================] - ETA: 0s - loss: 0.8159 - accuracy: 0.7392
93/93 [==============================] - 0s 1ms/step - loss: 0.8159 - accuracy: 0.7392 - val_loss: 1.0526 - val_accuracy: 0.6266
Epoch 4/10
1/93 [..............................] - ETA: 0s - loss: 0.2624 - accuracy: 1.0000
46/93 [=============>................] - ETA: 0s - loss: 0.5678 - accuracy: 0.8196
88/93 [===========================>..] - ETA: 0s - loss: 0.5670 - accuracy: 0.8330
93/93 [==============================] - 0s 2ms/step - loss: 0.5575 - accuracy: 0.8394 - val_loss: 0.9237 - val_accuracy: 0.6781
Epoch 5/10
1/93 [..............................] - ETA: 0s - loss: 0.2386 - accuracy: 1.0000
45/93 [=============>................] - ETA: 0s - loss: 0.3832 - accuracy: 0.8844
91/93 [============================>.] - ETA: 0s - loss: 0.3663 - accuracy: 0.8912
93/93 [==============================] - 0s 2ms/step - loss: 0.3661 - accuracy: 0.8912 - val_loss: 0.8652 - val_accuracy: 0.7124
Epoch 6/10
1/93 [..............................] - ETA: 0s - loss: 0.3449 - accuracy: 0.9000
46/93 [=============>................] - ETA: 0s - loss: 0.2661 - accuracy: 0.9261
91/93 [============================>.] - ETA: 0s - loss: 0.2419 - accuracy: 0.9363
93/93 [==============================] - 0s 2ms/step - loss: 0.2418 - accuracy: 0.9364 - val_loss: 0.8276 - val_accuracy: 0.7554
Epoch 7/10
1/93 [..............................] - ETA: 0s - loss: 0.1631 - accuracy: 1.0000
47/93 [==============>...............] - ETA: 0s - loss: 0.1334 - accuracy: 0.9745
91/93 [============================>.] - ETA: 0s - loss: 0.1356 - accuracy: 0.9747
93/93 [==============================] - 0s 2ms/step - loss: 0.1345 - accuracy: 0.9752 - val_loss: 0.7945 - val_accuracy: 0.7639
Epoch 8/10
1/93 [..............................] - ETA: 0s - loss: 0.0720 - accuracy: 1.0000
48/93 [==============>...............] - ETA: 0s - loss: 0.0707 - accuracy: 0.9958
92/93 [============================>.] - ETA: 0s - loss: 0.0737 - accuracy: 0.9946
93/93 [==============================] - 0s 2ms/step - loss: 0.0738 - accuracy: 0.9946 - val_loss: 0.8261 - val_accuracy: 0.7468
Epoch 9/10
1/93 [..............................] - ETA: 0s - loss: 0.1264 - accuracy: 0.9000
48/93 [==============>...............] - ETA: 0s - loss: 0.0799 - accuracy: 0.9833
92/93 [============================>.] - ETA: 0s - loss: 0.0833 - accuracy: 0.9804
93/93 [==============================] - 0s 1ms/step - loss: 0.0827 - accuracy: 0.9806 - val_loss: 0.8526 - val_accuracy: 0.7639
Epoch 10/10
1/93 [..............................] - ETA: 0s - loss: 0.0169 - accuracy: 1.0000
47/93 [==============>...............] - ETA: 0s - loss: 0.0302 - accuracy: 1.0000
93/93 [==============================] - 0s 1ms/step - loss: 0.0321 - accuracy: 0.9989 - val_loss: 0.8581 - val_accuracy: 0.7639
This looks about and what we would expect the learning process
to be: across epochs
, the loss
is decreasing and the accuracy
is increasing.
A note regarding the learning process of our MLP
Comparable to its architecture, our MLP
’s learning process
is also not really what you would see on the “real world”. Usually, ANN
s are trained
way more, for longer periods of times, more epochs
and on more data
. However, we keep it rather short as we want to enable its application on machines with rather limited computational resources (ie your laptops or binder).
While this is already informative, we can also plot the loss
and accuracy
in the training
and validation
sets
respectively. Let’s start with the loss
.
import matplotlib.pyplot as plt
import seaborn as sns
plt.plot(history.history['loss'], color='m')
plt.plot(history.history['val_loss'], color='c')
plt.title('MLP loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc = 'upper right')
sns.despine(offset=5)
plt.show()
And now the same for the accuracy
.
import matplotlib.pyplot as plt
import seaborn as sns
plt.plot(history.history['accuracy'], color='m')
plt.plot(history.history['val_accuracy'], color='c')
plt.title('MLP accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc = 'upper left')
sns.despine(offset=5)
plt.show()
How would you interpret these plots…
concerning our MLP
’s learning process
? Does it make sense? If not, how should it look like? Could you use these plots to evaluate certain aspects of the learning process
, e.g. regularization
?
Assessing performance¶
After evaluating the training
of our MLP
, we of course also need to evaluate its (predictive
) performance
. Here, this refers to the accuracy
of our MLP
’s outcomes, ie its predictions
. We already saw this in the above plots and during the training
across epochs
but let’s check the accuracy
of the prediction
on the training set
again:
from sklearn.metrics import classification_report
y_train_pred = model_mlp.predict(X_train)
print(classification_report(y_train.values.argmax(axis = 1), y_train_pred.argmax(axis=1)))
precision recall f1-score support
0 0.84 0.91 0.87 85
1 0.97 0.97 0.97 88
2 0.99 0.88 0.93 90
3 0.99 0.96 0.97 81
4 0.99 0.95 0.97 91
5 0.97 0.98 0.97 471
6 0.87 0.96 0.91 81
7 0.97 0.97 0.97 90
8 0.93 0.88 0.90 84
accuracy 0.95 1161
macro avg 0.94 0.94 0.94 1161
weighted avg 0.95 0.95 0.95 1161
Why you might think: “Oh, that’s awesome, great performance.”, such outcomes are usually perceived as dangerously high and indicate that something is off…
Why should a close-to-perfect performance indicate that something is wrong?
What do you think is the rationale to say that very high scores
are actually “suspicious” and tells us that something is most likely wrong? Try thinking about the things you’ve learned so far: training
/test
/validation
datasets
and their size, models
, predictions
, etc. .
Luckily, we did split
our dataset
into independent training
and test
sets
. So, let’s check our MLP
’s performance on the test set
:
y_test_pred = model_mlp.predict(X_test)
print(classification_report(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis=1)))
precision recall f1-score support
0 0.61 0.74 0.67 23
1 1.00 0.55 0.71 20
2 0.69 0.61 0.65 18
3 0.93 0.96 0.95 27
4 1.00 0.82 0.90 17
5 0.87 0.91 0.89 117
6 0.67 0.81 0.73 27
7 0.94 0.89 0.91 18
8 0.68 0.62 0.65 24
accuracy 0.82 291
macro avg 0.82 0.77 0.78 291
weighted avg 0.83 0.82 0.82 291
As you can see, the scores
, ie performance
, drops quite a bit. Do you know why and which you would report, e.g. in a publication
?
Beside checking the overall scores
, there are other options to further evaluate our MLP
’s (or basically any other model’s) performance
. One of the most commonly used ones is called confusion matrix
(which you most likely have seen before in this course). A confusion matrix
displays how often a given sample
was predicted
as a certain label
, thus, for example, providing insights into differentiability, etc. . To implement this, we initially have to compute the confusion matrix
:
import numpy as np
from sklearn.metrics import confusion_matrix
cm_svm = confusion_matrix(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis=1))
model_conf_matrix = cm_svm.astype('float') / cm_svm.sum(axis = 1)[:, np.newaxis]
After that, we can plot
it for evaluation.
import pandas as pd
import seaborn as sns
df_cm = pd.DataFrame(model_conf_matrix, index = categories,
columns = categories)
plt.figure(figsize = (10,7))
sns.heatmap(df_cm, annot = True, cmap = 'Blues', square = True)
plt.xticks(rotation = 45)
plt.title('MLP decoding results - confusion matrix' , fontsize = 15, fontweight = 'bold')
plt.xlabel("true labels", fontsize = 14, fontweight = 'bold')
plt.ylabel("predicted labels", fontsize = 14, fontweight = 'bold')
plt.show()
Based on this outcome: how would you interpret the confusion matrix
? Are some categories
better "decodable"
than others? Could even make such a statement?
Summary¶
With that, we already reached the end of this tutorial
within which we talked about how to create
, train
and evaluate
a MLP
as one possible decoding model
that can be applied to brain data
. As mentioned before, the MLP
utilized here is rather simple and models
you see (and maybe use) out in the “real world” will most likely be way more complex. However, their application to brain data
concerning input
, hidden
and output layers
follows the same outline.
Tip
Unfortunately, visualizing the features/transformations of an ANN
is quite often not straightforward as it depends on the given ANN
architecture. However, you can check this fantastic
distill article to learn more about feature visualization
in artificial neural networks
.