Brain decoding with MLP#
This part of the session aims to make participants familiar with Multilayer Perceptrons (MLPs) as one possible decoding model that can be applied to brain data. The objectives are:
get to know the basics of Multilayer Perceptrons
model creation
model training
model testing
Multilayer Perceptron#
We are going to train a Multilayer Perceptron (MLP) classifier for brain decoding on the Haxby dataset. MLPs are one of the most basic architectures of artificial neural networks. As such, MLPs consist of input and output layers as well as hidden layers that process the input through a succession of transformations towards the output layer, which performs the task at hand, e.g. a classification or regression. Like other machine learning models for supervised learning, an MLP initially goes through a training phase. During this supervised phase, the network is taught what to look for and what the desired output is via its objective function. This refers to minimizing the loss, i.e. the deviation of predictions from the "ground truth", and thus increasing the model's performance.
MLPs were actually among the first ANNs to appear, specifically the Mark I Perceptron which you can see below.
In this tutorial, we are going to train the simplest MLP architecture, featuring one input layer, one output layer and just one hidden layer.
Theoretical motivation#
The previous tutorial on brain decoding with SVM shows how to use a linear combination of brain features to train a predictor.
Let's take a moment to consider this: a 1-layer perceptron with a sigmoid activation function models the relation between X (the input data) and y (the predicted data) the same way a logistic regression would:
\(\hat{y} = \sigma(X \beta + \beta_0)\)
If one optimizes the parameters of this MLP to minimize a cross-entropy loss, they're actually optimizing for the same objective function as in a classical logistic regression problem: \(\underset{\beta, \beta_0}{\min} \; -\sum_k \left[ y_k \log(\hat{y_k}) + (1 - y_k) \log(1 - \hat{y_k}) \right]\)
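To make this equivalence concrete, here is a minimal, self-contained numpy sketch (not part of the decoding pipeline; the toy data and the weights beta and beta_0 are made up purely for illustration) that computes the 1-layer perceptron prediction \(\hat{y} = \sigma(X \beta + \beta_0)\), which is exactly the prediction a logistic regression model would produce:
import numpy as np
def sigmoid(z):
    # logistic (sigmoid) activation function
    return 1.0 / (1.0 + np.exp(-z))
# toy input: 4 samples, 3 features (hypothetical values)
X_toy = np.array([[0.2, -1.0, 0.5],
                  [1.5,  0.3, -0.7],
                  [-0.4, 0.8,  1.2],
                  [0.0, -0.2,  0.9]])
beta = np.array([0.5, -1.2, 0.8])   # weights of the single layer
beta_0 = 0.1                        # bias / intercept
# 1-layer perceptron with a sigmoid activation = logistic regression prediction
y_hat = sigmoid(X_toy @ beta + beta_0)
print(y_hat)                        # predicted probabilities between 0 and 1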
As a rule of thumb, one can consider that a 1-layer perceptron (and therefore any last layer of a multi-layer perceptron) works similarly to an SVC.
A big motivation for using multiple-layer perceptrons is that they can introduce non-linearities in our data. When training such models, the hope is that the hidden layers of the model will find meaningful non-linear combinations of the input features which help us solve our decoding problem.
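As a quick illustration of why the non-linearity matters, the following standalone numpy sketch (random weights, not part of our decoding pipeline) shows that stacking two purely linear layers collapses into a single linear transformation, whereas inserting a ReLU between them does not:
import numpy as np
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))            # 5 samples, 3 features
W1 = rng.normal(size=(3, 4))                # "hidden layer" weights
W2 = rng.normal(size=(4, 2))                # "output layer" weights
# two stacked linear layers are equivalent to a single linear layer
two_linear = (X_demo @ W1) @ W2
one_linear = X_demo @ (W1 @ W2)
print(np.allclose(two_linear, one_linear))  # True: no added expressive power
# with a ReLU non-linearity in between, the composition is no longer linear
with_relu = np.maximum(X_demo @ W1, 0) @ W2
print(np.allclose(with_relu, one_linear))   # False: the hidden layer now adds something new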
Getting the data#
We are going to work with the Haxby dataset [HGF+01] again. You can check the section An overview of the Haxby dataset for more details on that dataset. Here we are going to quickly download it and prepare it for machine learning applications, with a set of predictive variables, the brain time series X, and a dependent variable, the respective cognitive processes/functions/percepts y.
import os
import warnings
warnings.filterwarnings(action='once')
from nilearn import datasets
# We are fetching the data for subject 4
data_dir = os.path.join('..', 'data')
sub_no = 4
haxby_dataset = datasets.fetch_haxby(subjects=[sub_no], fetch_stimuli=True, data_dir=data_dir)
func_file = haxby_dataset.func[0]
# mask the data
from nilearn.maskers import NiftiMasker
mask_filename = haxby_dataset.mask_vt[0]
masker = NiftiMasker(mask_img=mask_filename, standardize=True, detrend=True)
X = masker.fit_transform(func_file)
# cognitive annotations
import pandas as pd
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
y = behavioral['labels']
As an initial check, we'll have a look at the size of X and y:
categories = y.unique()
print(categories)
print(y.shape)
print(X.shape)
['rest' 'face' 'chair' 'scissors' 'shoe' 'scrambledpix' 'house' 'cat'
'bottle']
(1452,)
(1452, 675)
So we have 1452 time points, each with one label for the respective stimulus percept, and for each time point we have recordings of brain activity obtained via fMRI across 675 voxels (within the VT mask). We can also see that the stimulus percepts span 9 different categories.
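Before we convert the labels, it can also help to see how the time points are distributed across those categories (in this dataset, 'rest' makes up a large share of them), since this imbalance will matter when we interpret performance later. A quick, optional check:
# how many time points do we have per category?
print(y.value_counts())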
For our planned analyses, however, we need to convert our categories into a one-hot encoded format:
# creating instance of one-hot-encoder
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder(handle_unknown='ignore')
y_onehot = enc.fit_transform(np.array(y).reshape(-1, 1))
# turn the sparse matrix into a pandas dataframe
y = pd.DataFrame(y_onehot.toarray())
display(y[:10])
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Training a model#
As introduced in the prior tutorials, one of the most important aspects of machine learning is the split between training and test sets. MLPs are no exception to that and thus we need to split our dataset accordingly. We will keep 20% of the time points as a test set and use a validation split within the remaining training data during model fitting.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
With that, we can already build our MLP. Here, we are going to use TensorFlow and Keras. As with every other ANN, we need to import the respective components, here the model and layer types. In our case we will use a Sequential model and Dense layers.
from keras.models import Sequential
from keras.layers import Dense
A note regarding our MLP
Please note that the example MLP we are going to create and train here is rather simple, as we want to enable its application on machines with rather limited computational resources (i.e. your laptops or Binder). "Real-world" models are usually more complex and might also entail different layer types and architectures.
Initially, we need to create our, so far empty, model.
# create an empty Sequential model to which we will add layers
model_mlp = Sequential()
Next, we can add the layers to our model, starting with the input layer. Given that this is a rather short introduction to the topic and does not focus on ANNs, we are going to set the kernel initialization and activation function to appropriate defaults. (Please have a look at the Introduction to deep learning session for more information.)
model_mlp.add(Dense(50 , input_dim = 675, kernel_initializer="uniform", activation = 'relu'))
As noted above, we are using Dense layers and, as you can see, we set the input dimensions to 675. You might have already noticed that this is the number of voxels we have data from. Setting the input dimension according to the data dimensions is rather important and relates to what is called the semantic gap: the transformation of actions & percepts conducted/perceived by humans into computational representations. For example, pictures are "nothing" but a huge array to a computer, and that is what will be submitted to the input layer of an ANN (note: this also holds true for basically any other type of data). Here, our MLP receives the extracted brain activity patterns as input, which are already in the right array format thanks to nilearn. Thus, always carefully think about what your input data entails and how it is structured, and then set up your input layer accordingly.
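As a small, optional sanity check (a sketch added here for illustration, not part of the original pipeline), we can confirm that the input dimension we hard-coded above indeed matches the number of voxels in our masked data:
# the second dimension of our data is the number of voxels within the VT mask
n_voxels = X_train.shape[1]
print(n_voxels)                  # should print 675
assert n_voxels == 675, "input_dim of the first Dense layer must match the number of voxels"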
Next, we are going to add one hidden layer.
model_mlp.add(Dense(30, kernel_initializer="uniform", activation = 'relu'))
And because we are creating a very simple MLP with only three layers, we can already add our output layer. Here we use the softmax activation function, given that we aim to train our MLP to predict the different categories that were perceived by the participants from their brain activity patterns.
model_mlp.add(Dense(len(categories), activation = 'softmax'))
To get a nice overview of our ANN, we can now use the .summary() function, which will provide us with the model type, the model parameters and, for each layer, its type, shape and parameters.
model_mlp.summary()
Model: "sequential"
Layer (type)                      Output Shape             Param #
dense (Dense)                     (None, 50)               33,800
dense_1 (Dense)                   (None, 30)               1,530
dense_2 (Dense)                   (None, 9)                279
Total params: 35,609 (139.10 KB)
Trainable params: 35,609 (139.10 KB)
Non-trainable params: 0 (0.00 B)
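As a quick sanity check on these numbers: a Dense layer has (number of inputs × number of units) weights plus one bias per unit, which we can verify by hand:
# parameters of a Dense layer = inputs * units + units (one bias per unit)
print(675 * 50 + 50)       # 33800 -> first (input/hidden) layer
print(50 * 30 + 30)        # 1530  -> second hidden layer
print(30 * 9 + 9)          # 279   -> output layer (9 categories)
print(33800 + 1530 + 279)  # 35609 total trainable parameters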
With that, we have already created our MLP architecture, which is now ready to be compiled! Within this step, we set the optimizer, loss function and metric, i.e. the components that define how our MLP will learn.
model_mlp.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
Now it's time to train our MLP. Thus, we have to fit it to our data, specifically only the training data. Here, we are going to provide a few more hyperparameters that define how our MLP is going to learn. This entails the batch size, the number of epochs and the validation split. We will assign the respective output to a variable so that we can investigate our MLP's learning process.
history = model_mlp.fit(X_train, y_train, batch_size = 10,
epochs = 10, validation_split = 0.2)
Epoch 1/10
93/93 - 1s - accuracy: 0.3885 - loss: 1.8757 - val_accuracy: 0.4893 - val_loss: 1.4044
Epoch 2/10
93/93 - 0s - accuracy: 0.6044 - loss: 1.1212 - val_accuracy: 0.5708 - val_loss: 1.1577
Epoch 3/10
93/93 - 0s - accuracy: 0.7602 - loss: 0.7317 - val_accuracy: 0.6524 - val_loss: 1.0112
Epoch 4/10
93/93 - 0s - accuracy: 0.8567 - loss: 0.4485 - val_accuracy: 0.7039 - val_loss: 0.8941
Epoch 5/10
93/93 - 0s - accuracy: 0.9268 - loss: 0.2988 - val_accuracy: 0.6996 - val_loss: 0.8856
Epoch 6/10
93/93 - 0s - accuracy: 0.9714 - loss: 0.1661 - val_accuracy: 0.7382 - val_loss: 0.8413
Epoch 7/10
93/93 - 0s - accuracy: 0.9848 - loss: 0.1189 - val_accuracy: 0.7339 - val_loss: 0.9226
Epoch 8/10
93/93 - 0s - accuracy: 0.9890 - loss: 0.0705 - val_accuracy: 0.7768 - val_loss: 0.8128
Epoch 9/10
93/93 - 0s - accuracy: 0.9996 - loss: 0.0349 - val_accuracy: 0.7682 - val_loss: 0.8698
Epoch 10/10
93/93 - 0s - accuracy: 1.0000 - loss: 0.0195 - val_accuracy: 0.7811 - val_loss: 0.8535
This looks about like what we would expect the learning process to be: across epochs, the loss decreases and the accuracy increases.
A note regarding the learning process of our MLP
Comparable to its architecture, our MLP's learning process is also not really what you would see in the "real world". Usually, ANNs are trained way more: for longer periods of time, over more epochs and on more data. However, we keep it rather short here as we want to enable its application on machines with rather limited computational resources (i.e. your laptops or Binder).
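Before plotting, it can help to peek at what the History object returned by fit() actually contains; Keras stores the per-epoch metrics in a plain dictionary (the exact key order may vary between versions):
# one list of values per metric, with one entry per epoch
print(history.history.keys())        # typically 'accuracy', 'loss', 'val_accuracy', 'val_loss'
print(history.history['val_loss'])   # validation loss across the 10 epochs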
While this is already informative, we can also plot the loss and accuracy for the training and validation sets respectively. Let's start with the loss.
import matplotlib.pyplot as plt
import seaborn as sns
plt.plot(history.history['loss'], color='m')
plt.plot(history.history['val_loss'], color='c')
plt.title('MLP loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc = 'upper right')
sns.despine(offset=5)
plt.show()
And now the same for the accuracy.
import matplotlib.pyplot as plt
import seaborn as sns
plt.plot(history.history['accuracy'], color='m')
plt.plot(history.history['val_accuracy'], color='c')
plt.title('MLP accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc = 'upper left')
sns.despine(offset=5)
plt.show()
How would you interpret these plots concerning our MLP's learning process? Does it make sense? If not, what should it look like? Could you use these plots to evaluate certain aspects of the learning process, e.g. regularization?
Assessing performance#
After evaluating the training of our MLP, we of course also need to evaluate its (predictive) performance. Here, this refers to the accuracy of our MLP's outcomes, i.e. its predictions. We already saw this in the plots above and during the training across epochs, but let's check the accuracy of the predictions on the training set again:
from sklearn.metrics import classification_report
y_train_pred = model_mlp.predict(X_train)
print(classification_report(y_train.values.argmax(axis = 1), y_train_pred.argmax(axis=1)))
precision recall f1-score support
0 0.87 0.92 0.89 85
1 0.97 0.98 0.97 88
2 0.98 0.88 0.92 90
3 0.99 0.95 0.97 81
4 0.98 0.96 0.97 91
5 0.97 0.98 0.98 471
6 0.89 0.95 0.92 81
7 0.98 0.98 0.98 90
8 0.93 0.90 0.92 84
accuracy 0.96 1161
macro avg 0.95 0.94 0.95 1161
weighted avg 0.96 0.96 0.96 1161
While you might think: "Oh, that's awesome, great performance.", such outcomes are usually perceived as dangerously high and indicate that something is off…
Why should a close-to-perfect performance indicate that something is wrong?
What do you think is the rationale behind saying that very high scores are actually "suspicious" and tell us that something is most likely wrong? Try thinking about the things you've learned so far: training/test/validation datasets and their size, models, predictions, etc.
Luckily, we did split our dataset into independent training and test sets. So, let's check our MLP's performance on the test set:
y_test_pred = model_mlp.predict(X_test)
print(classification_report(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis=1)))
precision recall f1-score support
0 0.66 0.83 0.73 23
1 0.70 0.70 0.70 20
2 0.71 0.67 0.69 18
3 0.92 0.89 0.91 27
4 0.83 0.88 0.86 17
5 0.92 0.90 0.91 117
6 0.70 0.70 0.70 27
7 0.89 0.94 0.92 18
8 0.62 0.54 0.58 24
accuracy 0.82 291
macro avg 0.77 0.78 0.78 291
weighted avg 0.82 0.82 0.82 291
As you can see, the scores, i.e. the performance, drop quite a bit. Do you know why, and which scores you would report, e.g. in a publication?
Besides checking the overall scores, there are other options to further evaluate our MLP's (or basically any other model's) performance. One of the most commonly used ones is called a confusion matrix (which you have most likely seen before in this course). A confusion matrix displays how often a given sample was predicted as a certain label, thus, for example, providing insights into how well categories can be differentiated. To implement this, we initially have to compute the confusion matrix:
import numpy as np
from sklearn.metrics import confusion_matrix
cm_mlp = confusion_matrix(y_test.values.argmax(axis = 1), y_test_pred.argmax(axis=1))
model_conf_matrix = cm_mlp.astype('float') / cm_mlp.sum(axis = 1)[:, np.newaxis]
After that, we can plot it for evaluation.
import pandas as pd
import seaborn as sns
df_cm = pd.DataFrame(model_conf_matrix, index = categories,
columns = categories)
plt.figure(figsize = (10,7))
sns.heatmap(df_cm, annot = True, cmap = 'Blues', square = True)
plt.xticks(rotation = 45)
plt.title('MLP decoding results - confusion matrix' , fontsize = 15, fontweight = 'bold')
plt.xlabel("true labels", fontsize = 14, fontweight = 'bold')
plt.ylabel("predicted labels", fontsize = 14, fontweight = 'bold')
plt.show()
Based on this outcome: how would you interpret the confusion matrix? Are some categories better "decodable" than others? Could we even make such a statement?
Summary#
With that, we have already reached the end of this tutorial, in which we talked about how to create, train and evaluate an MLP as one possible decoding model that can be applied to brain data. As mentioned before, the MLP utilized here is rather simple, and the models you see (and maybe use) out in the "real world" will most likely be way more complex. However, their application to brain data concerning input, hidden and output layers follows the same outline.
Tip
Unfortunately, visualizing the features/transformations of an ANN is quite often not straightforward as it depends on the given ANN architecture. However, you can check this fantastic distill article to learn more about feature visualization in artificial neural networks.
Exercises#
What is the most difficult category to decode? Why?
The model seemed to overfit. Try adding a Dropout layer to regularize the model. You can read about dropout in Keras in this blog post. (One possible starting point is sketched below.)
Try to add layers or hidden units, and observe the impact on overfitting and training time.
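As a hint for the second exercise, here is a minimal sketch of how one might insert Dropout layers into the architecture used above; the dropout rate of 0.25 and the variable names are arbitrary choices for illustration, not recommendations:
from keras.models import Sequential
from keras.layers import Dense, Dropout
# same architecture as above, with Dropout after each hidden layer
model_mlp_reg = Sequential()
model_mlp_reg.add(Dense(50, input_dim=675, kernel_initializer="uniform", activation='relu'))
model_mlp_reg.add(Dropout(0.25))   # randomly drop 25% of the units during training
model_mlp_reg.add(Dense(30, kernel_initializer="uniform", activation='relu'))
model_mlp_reg.add(Dropout(0.25))
model_mlp_reg.add(Dense(len(categories), activation='softmax'))
model_mlp_reg.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# history_reg = model_mlp_reg.fit(X_train, y_train, batch_size=10, epochs=10, validation_split=0.2)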