Biomolecular data mining: a hands-on training (2017)

The life science domain is undergoing a drastic revolution as more and more scientists and doctors are confronted with high-throughput multivariate data from so-called ‘omics techniques. We are pleased to announce a training initiative for life scientists, which aims to introduce both the concepts of, and hands-on skills in, the data mining and machine learning techniques that can be used to analyse this type of biomedical data.

Practical info

What? A 3-day, hands-on data mining course for life scientists

When? 11 - 13 September 2017, 9:30 - 17:00h

Where? Campus Groenenborger, Room Z.522 (or Z.233 for coffee breaks), University of Antwerp, Antwerp, Belgium 

Target audience: doctoral students and postdoctoral researchers in life sciences

Number of places: Limited to 30. For half of the places, priority is given to UAntwerpen PhD students.

Registration fee: 120 Euro (covers lunch and coffee breaks). For PhD students of the University of Antwerp, this fee is waived thanks to the sponsorship of the Antwerp Doctoral School.

How to register? The course is fully booked. Registration is now closed.

Organisers: Contact Prof. dr. Kris Laukens or Dr. Pieter Meysman if you have questions.

More detailed information:

Summary: This course aims to introduce the core principles and biomedical applications of different data mining and machine learning techniques in a hands-on manner. It will tackle both unsupervised (clustering, frequent pattern mining, data projection) and supervised (classification) techniques. The methods covered include hierarchical clustering, k-means clustering, item set mining, association rule mining, principal component analysis, support vector machines, random forests, Bayesian networks and artificial neural networks. Attendees will be introduced to the basic operations of these data mining techniques, with a focus on the practical use and interpretation of these procedures rather than the mathematical formulas. In addition, students will be introduced to some important data processing and performance evaluation methods related to these techniques. The software used in this course will be R. The course will consist of 50% theory lessons, 40% hands-on practicals and 10% application case studies.

Prerequisites: Basic knowledge of statistical principles is recommended. For example, a Bachelor's or Master's course in statistics is sufficient. Prior knowledge of omics preprocessing is useful, but the most essential concepts will be refreshed during the course. Prior programming experience is not required, as an optional introduction to R will be provided on the first morning of the course.

Course outline:

(subject to change)

THEORY (distributed over days 1-3, ca. 4 hours per day)

1. introduction (day 1 morning)

  • context, definitions, challenges
  • data types: quantitative data (proteome, metabolome, epigenome, mRNA abundances), string data, text, graph data
  • unsupervised versus supervised learning: key concepts

2. data preprocessing principles and techniques (day 1 morning + afternoon)

  • introduction, feature selection
  • preprocessing genomics and transcriptomics data / proteomics and metabolomics data / other biomedical data types

3. analysis of complex data with univariate techniques and visualisations (day 1 afternoon)

  • univariate techniques and multiple testing corrections
  • complex data visualisation

4. unsupervised data mining (day 2)

  • principles of unsupervised clustering
  • unsupervised PCA
  • unsupervised frequent pattern mining

5. supervised machine learning (day 3)

  • principles (cross-validation, ...)
  • LDA, nearest neighbour models, regression
  • support vector machines
  • decision trees and random forests
  • Bayesian models
  • neural networks and deep learning

6. other relevant techniques (day 2 afternoon)

  • evolutionary algorithms
  • text mining

APPLICATION CASE STUDIES (distributed over days 1-3, ca. 45 min. per day)

In dedicated lectures interwoven with the theory, bioinformatics and biomedical informatics researchers will show, through real research results, how these techniques can be employed to extract novel insights from biomedical data. These lectures will cover diverse data types (e.g. quantitative molecular data, molecular sequences, molecular interactions, ontologies, text, physiological measurements, patient meta-data, …) and several of the techniques addressed above. The case studies will be presented by domain experts from the University of Antwerp who already use data mining in their day-to-day research across different life science disciplines.

Possible topics:

  • reverse engineering of regulatory molecular networks
  • biomarker discovery from genome, transcriptome, proteome and metabolome data
  • finding genetic defects through next-generation DNA sequence analysis
  • pattern finding in NMR spectroscopy and mass spectrometry data

PRACTICE (distributed over days 1-3, ca. 3 hours per day)

In the practical sessions, participants will apply the techniques discussed in the course to real biomolecular data. The language used will be R. Participants without basic programming skills in R will receive an introduction at the beginning of the training (which proficient R users can skip).