Academic year 2019/2020

Course code
Prof. Michele Caselle (course coordinator)
Degree programme
Inter-university Master's Degree in Physics of Complex Systems (Laurea Magistrale Interateneo in Fisica dei sistemi complessi)
1st year, 2nd year
Teaching period
Third teaching period
D = Elective (student's choice)
Scientific-disciplinary sector (SSD)
FIS/02 - Theoretical physics, mathematical models and methods
Delivery mode
Language of instruction
Attendance
Type of examination
Written and oral

Course summary


Learning objectives

The course aims to provide an overview of the most advanced tools for data analysis. In particular, it will:

- Discuss a few exemplary applications, in both the human and the natural sciences

- Discuss recent progress in statistical inference and introduce some of the most recent computational tools for data analysis and inference



Expected learning outcomes

By the end of the course, students will have an in-depth knowledge of the most recent results in data analysis and statistical inference.


Teaching methods

Interactive lectures with hands-on, in-class analysis of data sets.


Assessment methods

The course evaluation is based on a final project and an oral examination. In the oral examination, the student is asked to develop two or three of the topics covered in the lectures ab initio. A student who fails the exam may not retake it earlier than two weeks after the first attempt.





  Il Corso è diviso in due parti. La prima, tenuta da L. Martignetti e la seconda da M. Chertkov

First part:

    1. Introduction: big data, examples, career orientation

    2. Data acquisition, experimental design, cleaning, pre-processing - types of data: binary, categorical, discrete, continuous - sequence data analysis: sequence entropy, motif discovery, repeats - normalization - outliers - batch effects - missing values (imputation, values missing not at random)

    3. Exploratory data analysis - descriptive statistics - data distributions: Gaussian, Poisson, binomial - two variables: regression, correlation - many variables - dimensionality reduction: PCA - outliers - clustering - NMF - ICA

    4. Unsupervised/supervised inference - statistical tests (parametric, non-parametric)

    5. Networks - causal networks - Bayesian networks - similarity networks - network component analysis - bipartite networks - modularity - community detection - what to do with these networks (graph analysis): minimal cut sets, hubs, connectivity, etc.

    6. From networks to formal models - data visualisation - use of data to construct networks/models (ROMA, GSEA) - mathematical modelling (method and motivation) - different types of formalism: ODEs, logical modelling - applications - pipelines for modelling applications - from data to models - precision medicine - systems pharmacology

    7. Multi-level data integration - multiplex - SNF

Introduction to Data Gathering and Analysis
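The dimensionality-reduction step in item 3 can be illustrated with a minimal sketch of PCA, assuming a small in-memory data set. The code centres the data and forms the sample covariance matrix. It then extracts the leading principal component by power iteration. The function names and the toy data are hypothetical and are not taken from the course material. In practice, a library routine based on the singular value decomposition would normally be used instead.

```python
import math
import random


def center(data):
    """Subtract the column means from every row (PCA works on centred data)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov, iterations=500, seed=0):
    """Power iteration: (largest eigenvalue, unit eigenvector) of a covariance matrix."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient of the converged unit vector gives the eigenvalue.
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * cv[i] for i in range(d)), v


def explained_variance_ratio(cov, eigenvalue):
    """Fraction of the total variance (trace of cov) captured by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


if __name__ == "__main__":
    # Hypothetical toy data: 2-D points scattered close to the line y = 2x.
    rng = random.Random(42)
    data = []
    for _ in range(200):
        t = rng.gauss(0.0, 1.0)
        data.append([t, 2.0 * t + rng.gauss(0.0, 0.1)])
    cov = covariance(center(data))
    lam, pc1 = leading_component(cov)
    print("PC1:", pc1)
    print("explained variance ratio:", explained_variance_ratio(cov, lam))
```

On this toy data, the first component should point roughly along (1, 2)/sqrt(5) and capture almost all of the variance. Projecting the centred rows onto it gives the one-dimensional reduced representation.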


Second part:

1) Graphical Models (Language) and Structured Statistical Inference and Learning (problem formulations) in Computer Science, Information Theory and Physics (intro).

2) Computational Complexity & Algorithms (Deterministic & Stochastic).

3) Statistical Inference as Optimization: from Partition Function and Marginal Probabilities to Free Energy (Kullback-Leibler Functional).

4) Mean-Field, Belief Propagation, Linear Programming -- Variational Approaches, Relaxations, Lower and Upper Bounds. Exact & Heuristic approaches. Iterative Algorithms.

5) Modern Analysis and Algorithmic Tools: Review of Loop Series, Cumulant Expansions, Computational Trees & Graph Cover Approaches.

6) Sample and Algorithmic Complexity of Learning. Graphical Model Learning. From Graphical Models to Neural Networks and Deep Learning.
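Item 3 rests on a standard identity. Take a Boltzmann distribution p(x) = exp(-E(x)) / Z and any trial distribution b. The Gibbs free energy F[b] = sum_x b(x) E(x) + sum_x b(x) ln b(x) then satisfies F[b] = -ln Z + KL(b || p) >= -ln Z, with equality exactly when b = p. Computing ln Z and the marginals is therefore equivalent to minimising F. Mean-field and belief propagation (item 4) can be read as minimising F over restricted families of b.

The minimal sketch below checks the identity numerically on a hypothetical three-spin Ising chain that is small enough to enumerate. The coupling J, the field h and all function names are illustrative assumptions, not course material.

```python
import itertools
import math
import random

# Hypothetical 3-spin Ising chain: E(s) = -J * sum_i s_i s_{i+1} - h * sum_i s_i
J = 0.8  # coupling (assumed value)
h = 0.3  # external field (assumed value)
N = 3
STATES = list(itertools.product([-1, 1], repeat=N))


def energy(s):
    """Ising chain energy with open boundary conditions."""
    return -J * sum(s[i] * s[i + 1] for i in range(N - 1)) - h * sum(s)


def partition_function():
    """Z = sum_s exp(-E(s)), by brute-force enumeration (inverse temperature 1)."""
    return sum(math.exp(-energy(s)) for s in STATES)


def boltzmann():
    """Exact Boltzmann distribution p(s) = exp(-E(s)) / Z."""
    z = partition_function()
    return {s: math.exp(-energy(s)) / z for s in STATES}


def gibbs_free_energy(b):
    """F[b] = <E>_b - H(b) = sum_s b(s) E(s) + sum_s b(s) ln b(s)."""
    avg_energy = sum(b[s] * energy(s) for s in STATES)
    neg_entropy = sum(b[s] * math.log(b[s]) for s in STATES if b[s] > 0)
    return avg_energy + neg_entropy


def kl(b, p):
    """Kullback-Leibler divergence KL(b || p)."""
    return sum(b[s] * math.log(b[s] / p[s]) for s in STATES if b[s] > 0)


def random_trial(seed):
    """A random normalised trial distribution b over all 2^N states."""
    rng = random.Random(seed)
    w = {s: rng.random() + 1e-3 for s in STATES}
    total = sum(w.values())
    return {s: w[s] / total for s in STATES}


def exact_marginal(p, i):
    """Marginal probability that spin i equals +1."""
    return sum(prob for s, prob in p.items() if s[i] == 1)


if __name__ == "__main__":
    p = boltzmann()
    minus_log_z = -math.log(partition_function())
    b = random_trial(seed=1)
    print("-ln Z            :", minus_log_z)
    print("F[p]             :", gibbs_free_energy(p))
    print("F[b]             :", gibbs_free_energy(b))
    print("-ln Z + KL(b||p) :", minus_log_z + kl(b, p))
    print("P(s_0 = +1)      :", exact_marginal(p, 0))
```

Brute-force enumeration costs 2^N operations. This exponential cost is why the course turns to variational and message-passing approximations for larger models.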


Recommended reading and bibliography

Lecture notes and references provided during the course.


Lecture timetable

Lectures: from 15/04/2019 to 14/06/2019

Note: the course takes place in computer lab B (except on the days indicated).

Prof. Martignetti's schedule:

Week 1:
Mon 15 April (3h) 14-17
Tue 16 April (3h) 14-17
Wed 17 April (2+2) 11-13 / 14-17

Week 2:
No lectures (break)

Week 3:
Mon 29 April (2+2) 9-11 / 14-16
Tue 30 April (2+2) 11-13 / 14-16, Room D
Thu 2 May (2+2) 11-13 / 16-18
Fri 3 May (2+2) 11-13 / 14-16

Week 4:
Mon 6 May (3h) 14-17
Tue 7 May (3h) 14-17, Room D
Wed 8 May (2+2) 11-13 / 14-16

Prof. Chertkov's schedule:

5 June, 14:30-16:30, Aula Magna
6, 10, 11, 13 and 14 June, 11-13, computer lab B



The course will be taught by Prof. Loredana Martignetti and Prof. Michael Chertkov.

Last updated: 02/09/2019 14:07