# Let's keep our notebook clean, so it's a little more readable!
import warnings
warnings.filterwarnings('ignore')

Predict age from resting state fMRI with scikit-learn

We will integrate what we’ve learned in the previous sections to extract data from several resting state fMRI images, and use that data as features in a machine learning model.

The dataset consists of children (ages 3-13) and young adults (ages 18-39). We will use the resting state fMRI data to predict which participants are adults and which are children.

Load the data

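from nilearn import datasets

# Change this to the location where you want the data to be downloaded
data_dir = './nilearn_data'

# Fetch the data; reduce_confounds=False keeps the full set of confound
# regressors instead of the reduced subset
development_dataset = datasets.fetch_development_fmri(
    data_dir=data_dir,
    reduce_confounds=False,
)

# Lists of file paths: one functional image and one confounds file per subject
data = development_dataset.func
confounds = development_dataset.confounds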

How many individual subjects do we have?

len(data)
155

Get Y (our target) and assess its distribution#

# Let's load the phenotype data
import pandas as pd

pheno = pd.DataFrame(development_dataset.phenotypic)
pheno.head(4)
participant_id Age AgeGroup Child_Adult Gender Handedness
0 sub-pixar123 27.06 Adult adult F R
1 sub-pixar124 33.44 Adult adult M R
2 sub-pixar125 31.00 Adult adult M R
3 sub-pixar126 19.00 Adult adult F R

Looks like there is a column labeling children and adults. Let’s capture it in a variable

y_ageclass = pheno['Child_Adult']
y_ageclass.unique()
array(['adult', 'child'], dtype=object)

Let’s have a look at the distribution of our target variable

import matplotlib.pyplot as plt
import seaborn as sns
sns.countplot(x=y_ageclass)
pheno.Child_Adult.value_counts()
Child_Adult
child    122
adult     33
Name: count, dtype: int64
[Figure: count plot of Child_Adult (child vs. adult)]

This is very unbalanced: there are many more children (122) than adults (33). A model can accommodate this to a degree during training, but that is beyond the scope of this tutorial. So let’s select an arbitrary subset of the children to match the number of adults. As the 33 adults are at the beginning of the frame, this is easy to do:

# keep the 33 adults (the first 33 rows) plus the next 33 children
data = data[0:66]
pheno = pheno.head(66)
y_ageclass = pheno['Child_Adult']
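The other option, accommodating the imbalance during training, is what scikit-learn's class_weight='balanced' setting does (we pass it to the SVC later on). Each class gets the weight n_samples / (n_classes × n_samples_in_class). Below is a minimal sketch of those weights for the original 122/33 split. It uses sklearn.utils.class_weight.compute_class_weight on a stand-in label array, not on pheno itself.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# stand-in labels matching the full dataset's counts (33 adults, 122 children)
y_full = np.array(['adult'] * 33 + ['child'] * 122)
classes = np.unique(y_full)  # sorted: ['adult', 'child']

# 'balanced' weights: n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_full)
by_hand = len(y_full) / (len(classes) * np.array([33, 122]))

for label, w in zip(classes, weights):
    print(f'{label}: {w:.3f}')  # prints adult: 2.348 and child: 0.635
```

With these weights, misclassifying an adult costs about 122/33 ≈ 3.7 times as much as misclassifying a child.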

Extract features with nilearn masker#

Here, we are going to use the same techniques we learned in the previous tutorial to extract resting state fMRI connectivity features from every subject. Let’s reload our atlas, and re-initialize our masker and correlation_measure.

# nilearn.input_data is deprecated; the maskers now live in nilearn.maskers
from nilearn.maskers import NiftiLabelsMasker
from nilearn.connectome import ConnectivityMeasure

# load atlas
multiscale = datasets.fetch_atlas_basc_multiscale_2015(data_dir=data_dir)
atlas_filename = multiscale.scale064

# initialize masker (change verbosity)
masker = NiftiLabelsMasker(labels_img=atlas_filename, standardize=True,
                           memory='nilearn_cache', resampling_target="data",
                           detrend=True, verbose=0)

# initialize correlation measure, set to vectorize
correlation_measure = ConnectivityMeasure(kind='correlation', vectorize=True,
                                         discard_diagonal=True)
Dataset created in ./nilearn_data/basc_multiscale_2015

Downloading data from https://ndownloader.figshare.com/files/1861819 ...
 ...done. (1 seconds, 0 min)
Extracting data from ./nilearn_data/basc_multiscale_2015/53337d5c408465aa257d35f81c13413b/1861819..... done.

Okay – now that we have that taken care of, let’s load all of the data!

NOTE: On a laptop, this might take a few minutes.

all_features = [] # here is where we will put the data (a container)

for i,sub in enumerate(data):
    # extract the timeseries from the ROIs in the atlas
    time_series = masker.fit_transform(sub, confounds=confounds[i])
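    # compute the region x region correlation matrix; because vectorize=True and
    # discard_diagonal=True, it is returned as a 1D vector of its lower triangle
    correlation_vector = correlation_measure.fit_transform([time_series])[0]
    # add to our container
    all_features.append(correlation_vector)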
    # keep track of status
    print('finished %s of %s'%(i+1,len(data)))
finished 1 of 66
finished 2 of 66
finished 3 of 66
finished 4 of 66
finished 5 of 66
finished 6 of 66
finished 7 of 66
finished 8 of 66
finished 9 of 66
finished 10 of 66
finished 11 of 66
finished 12 of 66
finished 13 of 66
finished 14 of 66
finished 15 of 66
finished 16 of 66
finished 17 of 66
finished 18 of 66
finished 19 of 66
finished 20 of 66
finished 21 of 66
finished 22 of 66
finished 23 of 66
finished 24 of 66
finished 25 of 66
finished 26 of 66
finished 27 of 66
finished 28 of 66
finished 29 of 66
finished 30 of 66
finished 31 of 66
finished 32 of 66
finished 33 of 66
finished 34 of 66
finished 35 of 66
finished 36 of 66
finished 37 of 66
finished 38 of 66
finished 39 of 66
finished 40 of 66
finished 41 of 66
finished 42 of 66
finished 43 of 66
finished 44 of 66
finished 45 of 66
finished 46 of 66
finished 47 of 66
finished 48 of 66
finished 49 of 66
finished 50 of 66
finished 51 of 66
finished 52 of 66
finished 53 of 66
finished 54 of 66
finished 55 of 66
finished 56 of 66
finished 57 of 66
finished 58 of 66
finished 59 of 66
finished 60 of 66
finished 61 of 66
finished 62 of 66
finished 63 of 66
finished 64 of 66
finished 65 of 66
finished 66 of 66
# Let's save the data to disk
import os
import numpy as np

# create the output folder if it doesn't exist yet
os.makedirs('data', exist_ok=True)
np.savez_compressed('data/MAIN_BASC064_subsamp_features', a=all_features)

In case you do not want to run the full loop on your computer, you can load the output of the loop here!

feat_file = 'data/MAIN_BASC064_subsamp_features.npz'
X_features = np.load(feat_file)['a']
X_features.shape
(66, 2016)

Okay so we’ve got our features.
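Where does 2016 come from? The atlas at this scale has 64 regions. The correlation matrix is symmetric, and we set discard_diagonal=True, so only the entries below the diagonal are kept: 64 × 63 / 2 = 2016. The NumPy sketch below shows that vectorization on a random stand-in matrix. nilearn does this internally, and the sketch does not claim to reproduce nilearn's exact element ordering.

```python
import numpy as np

n_regions = 64
rng = np.random.default_rng(0)

# stand-in time series: 100 time points x 64 regions
ts = rng.standard_normal((100, n_regions))
corr = np.corrcoef(ts, rowvar=False)  # 64 x 64, symmetric, ones on the diagonal

# keep only the strictly lower triangle (k=-1 excludes the diagonal)
rows, cols = np.tril_indices(n_regions, k=-1)
features = corr[rows, cols]

print(features.shape)                    # (2016,)
print(n_regions * (n_regions - 1) // 2)  # 2016
```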

We can visualize our feature matrix

import matplotlib.pyplot as plt

plt.imshow(X_features, aspect='auto', interpolation='nearest')
plt.colorbar()
plt.title('feature matrix')
plt.xlabel('features')
plt.ylabel('subjects')
[Figure: 'feature matrix', features (x) by subjects (y)]

Prepare data for machine learning#

Here, we will define a training sample where we can play around with our models. We will also set aside a test sample that we will not touch until the end.

We want to be sure that our training and test samples are matched! We can do that with a stratified split. Specifically, we will stratify by age class.

y_ageclass.shape
(66,)
from sklearn.model_selection import train_test_split

# Split the sample to training/test and
# stratify by age class, and also shuffle the data.

X_train, X_test, y_train, y_test = train_test_split(
    X_features,           # x
    y_ageclass,           # y
    test_size=0.2,        # 80%/20% split
    shuffle=True,         # shuffle the dataset before splitting
    stratify=y_ageclass,  # keep the age-class distribution consistent
                          # between train & test sets
    random_state=123,     # same shuffle each time
)

# print the size of our training and test groups
print('training:', len(X_train),
     'testing:', len(X_test))
training: 52 testing: 14
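The 52/14 sizes come from how train_test_split rounds. The test set gets ceil(0.2 × 66) = 14 samples and the training set gets the remaining 52. Stratification then divides each set evenly between the two balanced classes. Here is a quick check on a stand-in label vector with the same 33/33 composition; the all-zero feature matrix is only a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# stand-in labels with the same composition as y_ageclass: 33 adults, 33 children
y = np.array(['adult'] * 33 + ['child'] * 33)
X = np.zeros((len(y), 1))  # placeholder features; only the labels matter here

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=123
)

print(len(y_tr), len(y_te))  # 52 14
# class labels and counts in each split
print(np.unique(y_tr, return_counts=True))
print(np.unique(y_te, return_counts=True))
```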

Let’s visualize the distributions to be sure they are matched

fig,(ax1,ax2) = plt.subplots(2)
sns.countplot(x=y_train, ax=ax1, order=['child','adult'])
ax1.set_title('Train')
sns.countplot(x=y_test, ax=ax2, order=['child','adult'])
ax2.set_title('Test')
plt.tight_layout()
[Figure: class counts in the Train and Test sets]

Run your first model!#

Machine learning can get pretty fancy very quickly. We’ll start with a very standard classification model called a Support Vector Classifier (SVC).

This may seem unambitious, but simple models can be very robust, and we don’t have enough data to fit more complex ones.

For more information, see this excellent resource: https://hal.inria.fr/hal-01824205

First, a quick review of SVM!
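In short, a linear SVC looks for the hyperplane w·x + b = 0 that separates the two classes with the widest possible margin. A new sample is classified by which side of the hyperplane it falls on. Below is a minimal sketch on two synthetic 2-D clusters. make_blobs and its parameters are illustrative stand-ins, not the fMRI data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# toy stand-in for our two classes: two well-separated 2-D clusters
X_toy, y_toy = make_blobs(n_samples=40, centers=[[-3, -3], [3, 3]],
                          cluster_std=1.0, random_state=0)

svm = SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

# the learned hyperplane is w.x + b = 0
w, b = svm.coef_[0], svm.intercept_[0]
# distance between the two margin boundaries w.x + b = +1 and w.x + b = -1
margin = 2 / np.linalg.norm(w)

print('w =', w, 'b =', b)
print('margin width:', margin)
# only the support vectors (points on or inside the margin) define the hyperplane
print('support vectors per class:', svm.n_support_)
print('training accuracy:', svm.score(X_toy, y_toy))
```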

Let’s fit our first model!

from sklearn.svm import SVC
l_svc = SVC(kernel='linear', class_weight='balanced') # define the model

l_svc.fit(X_train, y_train) # fit the model
SVC(class_weight='balanced', kernel='linear')

Well… that was easy. Let’s see how well the model learned the data!

We can judge our model on several criteria:

  • Accuracy: The proportion of predictions that were correct overall

  • Precision: Of the cases predicted as positive, the proportion that are truly positive

  • Recall: Of the truly positive cases, the proportion that were correctly predicted as positive

  • f1 score: The harmonic mean of precision and recall, which balances the two
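To make these definitions concrete, the sketch below computes them by hand from confusion-matrix counts and checks them against scikit-learn. It uses a small made-up set of labels, and treating 'child' as the positive class is an arbitrary choice for this illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# made-up labels for illustration only
y_true = np.array(['child'] * 6 + ['adult'] * 4)
y_pred = np.array(['child'] * 4 + ['adult'] * 2 + ['child'] * 1 + ['adult'] * 3)

pos = 'child'
tp = np.sum((y_pred == pos) & (y_true == pos))  # 4 true positives
fp = np.sum((y_pred == pos) & (y_true != pos))  # 1 false positive
fn = np.sum((y_pred != pos) & (y_true == pos))  # 2 false negatives
tn = np.sum((y_pred != pos) & (y_true != pos))  # 3 true negatives

accuracy = (tp + tn) / len(y_true)                   # 7/10
precision = tp / (tp + fp)                           # 4/5
recall = tp / (tp + fn)                              # 4/6
f1 = 2 * precision * recall / (precision + recall)   # 8/11

print(accuracy, precision, recall, f1)
```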

Or, for a more visual explanation…

Evaluate the model on the training data#

from sklearn.metrics import classification_report, confusion_matrix, precision_score, f1_score

# predict the training data based on the model
y_pred = l_svc.predict(X_train)

# calculate the model accuracy
acc = l_svc.score(X_train, y_train)

# calculate the model precision, recall and f1, all in one convenient report!
cr = classification_report(y_true=y_train,
                      y_pred = y_pred)

# get a table to help us break down these scores
cm = confusion_matrix(y_true=y_train, y_pred = y_pred)

Let’s view our results and plot them all at once!

import itertools
from pandas import DataFrame

# print results
print('accuracy:', acc)
print(cr)

# plot confusion matrix
cmdf = DataFrame(cm, index = ['Adult','Child'], columns = ['Adult','Child'])
sns.heatmap(cmdf, cmap = 'RdBu_r')
plt.xlabel('Predicted')
plt.ylabel('Observed')
# label cells in matrix
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j+0.5, i+0.5, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white")
accuracy: 1.0
              precision    recall  f1-score   support

       adult       1.00      1.00      1.00        26
       child       1.00      1.00      1.00        26

    accuracy                           1.00        52
   macro avg       1.00      1.00      1.00        52
weighted avg       1.00      1.00      1.00        52
[Figure: confusion matrix for the training data, Observed vs. Predicted]

Exercise

You can interpret the confusion matrix according to this reference graph.

HOLY COW! Machine learning is amazing!!! A perfect fit!

…which means there’s something wrong. What’s the problem here?
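One way to see the problem: we scored the model on the same data it was fit on. With 2016 features and only 52 training subjects, a linear classifier can fit almost any labeling of the training data, even pure noise. A minimal sketch with random features and random labels of the same shape (everything here is synthetic):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# pure noise with the same shape as our training set: 52 subjects x 2016 features
X_noise = rng.standard_normal((52, 2016))
# random labels, 26 of each class, shuffled
y_noise = rng.permutation(np.array(['adult'] * 26 + ['child'] * 26))

clf = SVC(kernel='linear', class_weight='balanced')

# scored on the data it was fit on: perfect, although there is no signal at all
train_acc = clf.fit(X_noise, y_noise).score(X_noise, y_noise)
# scored on held-out folds: around chance
cv_acc = cross_val_score(clf, X_noise, y_noise, cv=3).mean()

print('training accuracy:', train_acc)
print('cross-validated accuracy:', cv_acc)
```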

Fit the model with the training data and cross-validation#

from sklearn.model_selection import cross_val_predict, cross_val_score

# predict each training subject with a model fit on the other two folds
y_pred = cross_val_predict(l_svc, X_train, y_train, cv=3)
# accuracy on each of the 3 folds
acc = cross_val_score(l_svc, X_train, y_train, cv=3)

We can look at the accuracy of the predictions for each fold of the cross-validation

for i in range(len(acc)):
    print('Fold %s -- Acc = %s'%(i, acc[i]))
Fold 0 -- Acc = 0.8888888888888888
Fold 1 -- Acc = 1.0
Fold 2 -- Acc = 1.0
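For intuition: when you pass cv=3 with a classifier, scikit-learn uses a 3-fold StratifiedKFold without shuffling. For each fold it clones the model, fits it on the other two folds, and scores it on the held-out fold. cross_val_predict uses the same folds but collects each held-out prediction instead of a score. The sketch below writes out that loop and checks it against cross_val_score. make_classification is a synthetic stand-in for our features.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# synthetic stand-in data: 52 samples, 50 features, 2 classes
X, y = make_classification(n_samples=52, n_features=50, random_state=0)
model = SVC(kernel='linear', class_weight='balanced')

manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X, y):
    fold_model = clone(model)                   # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])  # fit on two folds
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))  # score the third

auto_scores = cross_val_score(model, X, y, cv=3)
print('manual:         ', np.round(manual_scores, 3))
print('cross_val_score:', np.round(auto_scores, 3))
```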

We can also look at the overall accuracy of the model

from sklearn.metrics import accuracy_score
overall_acc = accuracy_score(y_pred = y_pred, y_true = y_train)
overall_cr = classification_report(y_pred = y_pred, y_true = y_train)
overall_cm = confusion_matrix(y_pred = y_pred, y_true = y_train)
print('Accuracy:',overall_acc)
print(overall_cr)
Accuracy: 0.9615384615384616
              precision    recall  f1-score   support

       adult       1.00      0.92      0.96        26
       child       0.93      1.00      0.96        26

    accuracy                           0.96        52
   macro avg       0.96      0.96      0.96        52
weighted avg       0.96      0.96      0.96        52
thresh = overall_cm.max() / 2
cmdf = DataFrame(overall_cm, index = ['Adult','Child'], columns = ['Adult','Child'])
sns.heatmap(cmdf, cmap='copper')
plt.xlabel('Predicted')
plt.ylabel('Observed')
for i, j in itertools.product(range(overall_cm.shape[0]), range(overall_cm.shape[1])):
        plt.text(j+0.5, i+0.5, format(overall_cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white")
[Figure: cross-validated confusion matrix for the training data, Observed vs. Predicted]

The cross-validated model still seems to perform very well. To check that this isn’t just chance, let’s compare it against a null model with a permutation test:

from sklearn.model_selection import permutation_test_score
score, permutation_score, pvalue = permutation_test_score(
    l_svc, X_train, y_train, cv=3, scoring="accuracy",
    n_jobs=2, n_permutations=100)
print(f'accuracy {score}, average permutation accuracy {permutation_score.mean()}, p value {pvalue}')
accuracy 0.9629629629629629, average permutation accuracy 0.48880174291939005, p value 0.009900990099009901

Because the classes are balanced, chance level is close to 50%, which matches the average permutation accuracy of about 0.49. The model performs significantly better than chance.
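The p-value from permutation_test_score is (C + 1) / (n_permutations + 1), where C is the number of permutations whose score is at least the true score. If none of the 100 permutations reaches the true score, the p-value is 1/101 ≈ 0.0099. That is the value reported above, and it is also the smallest p-value that 100 permutations can produce. Here is a small sketch of the formula. The permutation_pvalue helper is hypothetical, and the null scores are simulated rather than taken from the run above.

```python
import numpy as np

def permutation_pvalue(true_score, permutation_scores):
    """Hypothetical helper: (C + 1) / (n_permutations + 1),
    where C counts permutations scoring >= true_score."""
    permutation_scores = np.asarray(permutation_scores)
    c = np.sum(permutation_scores >= true_score)
    return (c + 1) / (len(permutation_scores) + 1)

rng = np.random.default_rng(0)
# simulated null distribution: 100 accuracies scattered around chance (0.5)
null_scores = rng.normal(loc=0.5, scale=0.07, size=100)

print(permutation_pvalue(0.96, null_scores))  # no permutation reaches 0.96 -> 1/101
print(permutation_pvalue(0.55, null_scores))  # many permutations reach 0.55 -> large p
```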

Tweak your model#

It’s very important to learn when and where it’s appropriate to “tweak” your model.

Since we have done all of the previous analysis with our training data, it’s fine to try different models. But we absolutely cannot “test” them on our left-out data. If we do, we are in great danger of overfitting.

We could try other models, or tweak hyperparameters, but we are probably not powered sufficiently to do so, and would once again risk overfitting.

But as a demonstration, we can look at the impact of “scaling” our data. Certain machine learning algorithms perform better when all input features are transformed to a uniform range of values. This is often between 0 and 1 (min-max scaling), or centered on a mean of zero with unit variance (standardization). Let’s look at the performance of the model after standardizing the data.
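For reference, standardization replaces each feature x with (x − mean) / std. Min-max scaling replaces it with (x − min) / (max − min). In both cases the statistics are computed per feature (column), on the data the scaler is fit to. The sketch below checks both formulas against scikit-learn's StandardScaler and MinMaxScaler on a small made-up array. Note that StandardScaler uses the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# made-up data: 4 samples x 2 features on very different ranges
X = np.array([[0.1, 100.0],
              [0.2, 300.0],
              [0.3, 200.0],
              [0.4, 400.0]])

# standardization: zero mean, unit variance per column (population std, ddof=0)
X_std = StandardScaler().fit_transform(X)
X_std_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# min-max scaling: each column mapped onto [0, 1]
X_mm = MinMaxScaler().fit_transform(X)
X_mm_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_std.round(3))
print(X_mm.round(3))
```

Because the statistics come from the data the scaler was fit on, any new data must be transformed with that same fitted scaler. That is why the test set is later transformed with the scaler fit on X_train.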

# Scale the training data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
X_train_scl = scaler.transform(X_train)
plt.imshow(X_train, aspect='auto', interpolation='nearest')
plt.colorbar()
plt.title('Training Data')
plt.xlabel('features')
plt.ylabel('subjects')
[Figure: 'Training Data', features (x) by subjects (y)]
plt.imshow(X_train_scl, aspect='auto', interpolation='nearest')
plt.colorbar()
plt.title('Scaled Training Data')
plt.xlabel('features')
plt.ylabel('subjects')
[Figure: 'Scaled Training Data', features (x) by subjects (y)]
# repeat the steps above to re-fit the model
# and assess its performance

# don't forget to switch X_train to X_train_scl

# predict
y_pred = cross_val_predict(l_svc, X_train_scl, y_train, cv=3)

# get scores
overall_acc = accuracy_score(y_pred = y_pred, y_true = y_train)
overall_cr = classification_report(y_pred = y_pred, y_true = y_train)
overall_cm = confusion_matrix(y_pred = y_pred, y_true = y_train)
print('Accuracy:',overall_acc)
print(overall_cr)

# plot
thresh = overall_cm.max() / 2
cmdf = DataFrame(overall_cm, index = ['Adult','Child'], columns = ['Adult','Child'])
sns.heatmap(cmdf, cmap='copper')
plt.xlabel('Predicted')
plt.ylabel('Observed')
for i, j in itertools.product(range(overall_cm.shape[0]), range(overall_cm.shape[1])):
        plt.text(j+0.5, i+0.5, format(overall_cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white")
Accuracy: 0.9615384615384616
              precision    recall  f1-score   support

       adult       1.00      0.92      0.96        26
       child       0.93      1.00      0.96        26

    accuracy                           0.96        52
   macro avg       0.96      0.96      0.96        52
weighted avg       0.96      0.96      0.96        52
[Figure: cross-validated confusion matrix for the scaled training data, Observed vs. Predicted]

What do you think about the results of this model compared to the non-transformed model?
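One subtlety: above, the scaler was fit on all of X_train before cross-validation. That means each validation fold contributed to the mean and standard deviation used to scale it, which is a mild form of leakage. Wrapping the scaler and the classifier in a scikit-learn Pipeline avoids this, because the scaler is refit inside each fold on the training part only. Below is a minimal sketch on synthetic stand-in data; with the real data you would pass X_train and y_train instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic stand-in for X_train / y_train
X, y = make_classification(n_samples=52, n_features=100, random_state=0)

# scaler and classifier are cloned and refit together in every fold,
# so the validation fold never influences the scaling statistics
pipe = make_pipeline(StandardScaler(),
                     SVC(kernel='linear', class_weight='balanced'))

scores = cross_val_score(pipe, X, y, cv=3)
print('fold accuracies:', scores.round(3))
print('mean accuracy:', scores.mean().round(3))
```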

Exercise

Try fitting a new SVC model and tweak one of the many parameters. Run cross-validation and see how well it goes. Make a new cell and type SVC? to see the possible hyperparameters

#l_svc = SVC(kernel='linear') # define the model
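To search over several values systematically instead of one at a time, GridSearchCV runs cross-validation for every candidate. The sketch below again uses synthetic stand-in data. The grid over C is an arbitrary example, and with a sample as small as ours the selected value should be treated with caution.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic stand-in for X_train / y_train
X, y = make_classification(n_samples=52, n_features=100, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(kernel='linear', class_weight='balanced'))

# parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {'svc__C': [0.001, 0.01, 0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring='accuracy')
search.fit(X, y)

print('best parameters:', search.best_params_)
print('best cross-validated accuracy:', round(search.best_score_, 3))
```

Keep in mind that best_score_ is optimistically biased, because it is the score that was used to pick C. An honest performance estimate needs nested cross-validation or the untouched test set.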

Can our model classify children from adults in completely un-seen data?#

Now that we’ve fit a model that we think has possibly learned how to decode childhood vs. adulthood from the resting state fMRI signal, let’s put it to the test. We will train our model on all the training data, and try to predict the age class of the subjects we left out at the beginning of this section.

Because we performed a transformation on our training data, we will need to transform our testing data using the same information!

# Notice how we use the Scaler that was fit to X_train and apply it to X_test,
# rather than creating a new Scaler for X_test
X_test_scl = scaler.transform(X_test)

And now for the moment of truth!#

No cross-validation needed here. We simply fit the model with the training data and use it to predict the testing data

I’m so nervous. Let’s just do it all in one cell

l_svc.fit(X_train_scl, y_train) # fit to training data
y_pred = l_svc.predict(X_test_scl) # classify age class using testing data
acc = l_svc.score(X_test_scl, y_test) # get accuracy
cr = classification_report(y_pred=y_pred, y_true=y_test) # get prec., recall & f1
cm = confusion_matrix(y_pred=y_pred, y_true=y_test) # get confusion matrix

# print results
print('accuracy =', acc)
print(cr)

# plot results
thresh = cm.max() / 2
cmdf = DataFrame(cm, index = ['Adult','Child'], columns = ['Adult','Child'])
sns.heatmap(cmdf, cmap='RdBu_r')
plt.xlabel('Predicted')
plt.ylabel('Observed')
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j+0.5, i+0.5, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white")
accuracy = 1.0
              precision    recall  f1-score   support

       adult       1.00      1.00      1.00         7
       child       1.00      1.00      1.00         7

    accuracy                           1.00        14
   macro avg       1.00      1.00      1.00        14
weighted avg       1.00      1.00      1.00        14
[Figure: confusion matrix for the test data, Observed vs. Predicted]

The model generalized very well, classifying all 14 held-out subjects correctly! We may have found something in this data that seems to be systematically related to age… but what?
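Keep in mind how small the test set is: 14 subjects. If we treat each test prediction as an independent trial, a binomial test and an exact (Clopper-Pearson) confidence interval show how uncertain a result of 14 out of 14 correct still is. The sketch below uses scipy.stats.binomtest, which is available in SciPy 1.7 and later.

```python
from scipy.stats import binomtest

n_correct, n_test = 14, 14  # 14/14 correct on the held-out set

# one-sided test against chance (p = 0.5 for two balanced classes)
p_value = binomtest(n_correct, n_test, p=0.5, alternative='greater').pvalue
print('p-value vs. chance:', p_value)  # equals 0.5 ** 14

# two-sided exact (Clopper-Pearson) 95% confidence interval for the true accuracy
ci = binomtest(n_correct, n_test).proportion_ci(confidence_level=0.95, method='exact')
print(f'95% CI for accuracy: {ci.low:.3f} to {ci.high:.3f}')
```

The held-out result is clearly above chance. Still, 14 perfect predictions are also compatible with a true accuracy as low as roughly 77%.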

Interpreting model feature importances#

Interpreting the feature importances of a machine learning model is a real can of worms. This is an area of active research. Unfortunately, it’s hard to trust the feature importance of some models.

You can find a whole tutorial on this subject here: http://gael-varoquaux.info/interpreting_ml_tuto/index.html

For now, we’ll just eschew better judgement and take a look at our feature importances.

We can access the feature importances (weights) used by the model

l_svc.coef_
array([[ 0.00095925,  0.00230708, -0.00234671, ..., -0.00168313,
        -0.00091351,  0.0014638 ]])
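How to read the sign of these weights: for a binary linear SVC, the decision value is X @ coef_.T + intercept_. A positive value is classified as classes_[1], which here is 'child' because class labels are sorted alphabetically. So increasing a feature that has a positive weight moves the decision toward 'child', and increasing one with a negative weight moves it toward 'adult'. Because the model was fit on standardized data, the weights refer to the scaled features. The sketch below verifies this convention on synthetic data relabeled with the tutorial's class names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# synthetic stand-in data, relabeled with the tutorial's string classes
X, y_int = make_classification(n_samples=60, n_features=10, random_state=0)
y = np.where(y_int == 1, 'child', 'adult')

clf = SVC(kernel='linear').fit(X, y)
print('classes_:', clf.classes_)  # sorted: ['adult' 'child']

# the decision value is a weighted sum of the features plus an intercept
decision = X @ clf.coef_.T + clf.intercept_
# positive decision values are assigned to classes_[1]
manual_pred = np.where(decision.ravel() > 0, clf.classes_[1], clf.classes_[0])

# features with the largest absolute weights contribute most to the decision
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:3]
print('three largest |weights| at feature indices:', top)
```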

Let’s plot these weights to see their distribution better

plt.bar(range(l_svc.coef_.shape[-1]),l_svc.coef_[0])
plt.title('feature importances')
plt.xlabel('feature')
plt.ylabel('weight')
[Figure: 'feature importances', weight per feature]

Or perhaps it will be easier to visualize this information as a matrix similar to the one we started with

We can use the correlation measure from before to perform an inverse transform

correlation_measure.inverse_transform(l_svc.coef_).shape
(1, 64, 64)
from nilearn import plotting

feat_exp_matrix = correlation_measure.inverse_transform(l_svc.coef_)[0]

plotting.plot_matrix(feat_exp_matrix, figure=(10, 8),  
                     labels=range(feat_exp_matrix.shape[0]),
                     reorder=False,
                    tri='lower')
[Figure: model weights as a 64 × 64 region-by-region matrix, lower triangle]

Let’s see if we can throw those features onto an actual brain.

First, we’ll need to gather the coordinates of each ROI of our atlas

coords = plotting.find_parcellation_cut_coords(atlas_filename)

And now we can use our feature matrix and the wonders of nilearn to create a connectome map where each node is an ROI, and each connection is weighted by the importance of the feature to the model

plotting.plot_connectome(feat_exp_matrix, coords, colorbar=True)
[Figure: connectome of model weights]

Whoa!! That’s…a lot to process. Maybe let’s threshold the edges so that only the most important connections are visualized

plotting.plot_connectome(feat_exp_matrix, coords, colorbar=True, edge_threshold=0.005)
[Figure: connectome of model weights, edge_threshold=0.005]

That’s definitely an improvement, but it’s still a bit hard to see what’s going on. Nilearn can also display this data interactively!

plotting.view_connectome(feat_exp_matrix, coords, edge_threshold='90%')

You can choose to open the figure in a browser with the following lines:

# view = plotting.view_connectome(feat_exp_matrix, coords, edge_threshold='90%')
# view.open_in_browser()

Exercises#

  1. We walked through a lot of mistakes in this tutorial. Try starting a fresh notebook and creating a clean, correct version of the workflow.

  2. The dataset contains the exact age of the subjects. Try to predict the age of the subjects instead of the age class. Hint - use regression instead of classification.

  3. Try other atlases - does a different atlas change the results?

  4. Advanced - try to implement different atlases as part of the parameter tuning.