Training

Adapted to the needs of your organisation, we provide training for the use of major analytical tools (R / Python / SQL databases + Big Data). We give this training at your site, with your infrastructure, using concrete examples from your business environment. Some of these courses are also given regularly at the University of Leuven (LStat), through RBelgium or the Data Science Innovation Hub, and courses on the use of Oracle R Enterprise are given with our Oracle Partner.
The courses are unique in their breadth and depth. They range from the basics, through traditional data manipulation & visualisation and statistical learning, to deploying your analytical solutions using R packages, automation, in-database routines or web services.
All offered courses can be found below or in full detail in our brochure, which you can download here. If you want to schedule one of these courses or if you are interested in a specific R course suited to your needs, let us know.
New: since 2020, the courses Text Mining with R and Advanced R programming are available online through our online school. Let us know if you want to obtain access.


 R basics
 R for starters
 Data visualisation / manipulation
 Common data manipulation & Programming in R
 Reporting with R
 Visualisation with R
 Data connectivity with R

Basic
Module: R for starters
Course Duration 2 days
Prerequisites None
Course Content
 What is R, packages available (CRAN, R-Forge, ...), R documentation search, finding help, RStudio editor, syntax
 Data types (numeric/character/factor/logicals/NA/Dates/Times)
 Data structures (vector/data.frame/matrix/lists and standard operations on these)
 Saving (RData) & importing data from flat files, csv, Excel, Oracle, MS SQL Server, SAS, SPSS
 Creating functions, data manipulation (subsetting, adding variables, ifelse, control flow, recoding, rbind, cbind) and aggregating and reshaping
 Plotting in R using base and lattice functionality (dot plots, barcharts, graphical parameters, legends, devices)
 Basic statistics in R (mean, variance, crosstabs, quantile, correlation, distributions, densities, histograms, boxplot, t-tests, Wilcoxon test, nonparametric tests)

Basic
Module: Common data manipulation & Programming in R
Course Duration 1 day
Prerequisites R experience from 1 month up to 2 years
Course Content
This module helps you become a better programmer by writing your own functions and getting acquainted with commonly used R functions for basic data manipulation and with the R object-oriented programming environment.
 with, within, by, apply family of functions & split-apply-combine strategy
 vectorisation, parallel execution of code
 data.table: fast group by, joining and data.table programming tricks
 basic regular expressions
 writing your own functions
 do.call
 reshaping from wide to long format
 environments
 S3 classes, generics and basic S4 methodology
 handling of errors and exceptions, debugging code

Basic
Module: Reporting with R
Course Duration 1 day
Prerequisites R experience from 1 month up to 2 years
Course Content
If you want to create a report using R, either static, dynamic or an interactive web page, R provides several tools to do this. This module teaches you the basics of building reports with R. It covers the following topics.
 Sweave & knitr
 Markdown & pandoc
 integration with MS Office & presentations
 making R package vignettes
 An introduction to Shiny and interactive HTML reporting

Basic
Module: Visualisation with R
Course Duration 2 days
Prerequisites R experience from 1 month up to 2 years
Course Content
This course gives you an introduction to the 4 main graphic systems in R using standard day-to-day graphics. It covers the following topics.
 R base graphics
 Graphical devices
 Lattice graphics
 ggplot2
 Interactive graphics using htmltools

Basic
Module: Data Connectivity with R
Course Duration 1 day
Prerequisites R experience from 1 month up to 2 years
Course Content
This course shows how to connect to different data sources (Excel, SQL databases, XML, JSON, Web scraping). It covers the following topics.
 Read/Write data from/to Excel
 Work efficiently with SQL databases
 dplyr and SQL databases
 Read in XML data
 JSON and YAML from R
 Web scraping from R and online data
 Open Data from Belgium

 Analytics with R and Python
 Statistical Machine Learning with R
 Text Mining with R
 Applied Spatial Analysis with R
 Computer Vision with R and Python
 Big Data
 Big Data Analytics with R (Spark + HAWQ + PL/R)

Analytics
Module: Statistical Machine Learning with R
Course Duration 2 days
Prerequisites At least knowledge of data manipulation with R. Knowledge of standard regression models (lm/glm). Basic statistical knowledge.
Course Content
This course is a hands-on course covering the use of statistical machine learning methods available in R. The following basic learning methods will be covered and used on common datasets.
 naive bayes
 trees (recursive partitioning)
 feedforward neural networks
 penalized regression modelling (lasso/ridge/elastic net regularized generalized linear models)
 bagging for classification and regression
 random forests
 adaboost & general boosting for classification & regression
 if time permits: graphical lasso / penalised generalized additive models / model-based recursive partitioning or support vector machines
 model evaluation logic & hyperparameter tuning. Training and evaluation will be done through the use of the caret and ROCR packages

Analytics
Module: Text mining with R
Course Duration 2 days
Prerequisites At least knowledge of data manipulation with R. Knowledge of standard regression models (lm/glm). Basic statistical knowledge.
Course Content
This course is a hands-on course covering the use of text mining tools for the purpose of data analysis. It covers basic text handling, natural language engineering and statistical modelling on top of textual data. The following items are covered.
 Text encodings
 Cleaning of text data, regular expressions
 String distances
 Graphical displays of text data
 Natural language processing: stemming, parts-of-speech tagging, tokenization, lemmatisation
 Sentiment analysis
 Statistical topic detection modelling and visualization (latent Dirichlet allocation)
 Visualisation of correlations & topics
 Word embeddings
 Document similarities & Text alignment

Analytics
Module: Applied Spatial modelling with R
Course Duration 2 days
Prerequisites At least knowledge of data manipulation with R, S3/S4 classes and standard R visualisation. Knowledge of standard regression models (lm/glm). Basic statistical knowledge.
Course Content
This course is useful for data scientists and data analysts who frequently work with data that has a spatial component (data with latitude/longitude information). It gives an introduction to the numerous spatial facilities of R and some standard spatial statistical models. The following items are covered during the course.
 Importing spatial data and setting the spatial projection
 Plotting spatial data on static and interactive maps
 Adding graphical components to spatial maps
 Manipulation of geospatial data, geocoding, distances, ...
 Density estimation and spatial point pattern analysis
 Spatial regression
 Kriging and spatial predictions

Analytics
Module: Computer Vision with R and Python
Course Duration 2 days
Prerequisites At least knowledge of data manipulation with R and standard R visualisation. Some basic Python knowledge. Basic statistical knowledge.
Course Content
This course is a hands-on course covering image analysis. It covers basic image manipulation, feature engineering techniques and finding patterns in images. The following items are covered.
 image manipulation & adjustments
 finding blobs, corners, gradients, edges & lines
 optical character recognition
 feature & object detection
 applying filters
 deep learning for image analysis
 image segmentation

Analytics
Module: Big data analytics with R on top of Spark, Hadoop and HAWQ
Course Duration 3 days
Prerequisites At least knowledge of data manipulation with R, S3 classes. Knowledge of Statistical Machine Learning with R.
Course Content
This course is useful for data scientists and data analysts who frequently work on big datasets in Spark/Hadoop. The following items are covered during the course.
 Overview of the big data ecosystem for data scientists
 Linux system commands for data scientists
 Work with the Hadoop file system (read/write, directories). Typical Hadoop files
 MapReduce & mapply
 Spark & R: SparkSQL using package sparklyr & dplyr. Spark Machine Learning using package sparklyr (data preparation, regression, random forest & boosted trees) + Spark extensions
 HAWQ & R: Running PL/R stored procedures using Apache HAWQ. Classification & regression using MADlib & PivotalR


Advanced / Deploy
Module: Creating R packages and R repositories
Course Duration 1 day
Prerequisites At least knowledge of data manipulation with R, S3/S4 classes.
Course Content
R is known for its flexibility, with more than 6000 packages available for direct use. If you want to create your own package to distribute code to others inside your organization, this module teaches you how to build your own package and set up an enterprise R package repository.
 structure of an R package
 documenting your code and your R package using roxygen in RStudio
 check, build, install your R package
 unit testing your R code
 creating your own R package repository

Advanced / Deploy
Module: Managing R processes
Course Duration 1 day
Prerequisites At least knowledge of data manipulation with R. Not afraid of the shell.
Course Content
R is a programming language and can be launched from the command line. This module teaches you how to launch R properly in order to automate processes.
 R and Rscript and their options
 Startup scripts and settings
 Handling of command line arguments
 Understanding package library folders
 Automating R processes in Windows & Linux
 Handling and logging of error/warning messages

Advanced / Deploy
Module: R code management, Git and Continuous Integration
Course Duration 1 day
Prerequisites At least knowledge of data manipulation with R. Not afraid of the shell. Knowledge of Sweave/knitr. Knowledge of package building. Knowledge of R process automation.
Course Content
If you are an R developer and want to make sure your analysis is reproducible and traceable, you need to learn how to use a code repository. In this session, you will learn how to use RStudio for code maintenance. The following items are covered:
 R coding guidelines
 Good practices in management & structuring of R code
 Git & code repositories
 Setup
 Pushing, pulling and cloning code
 Handling of conflicts
 Setting branches and releases
 Use of RStudio’s IDE to integrate with your code repository
 R package repositories: Build a local CRAN repository with your own packages
 Doing continuous integration with GitLab and Travis

Advanced / Deploy
Module: Integration of R into web applications
Course Duration 2 days
Prerequisites At least knowledge of data manipulation with R. Not afraid of the shell. Knowledge of Sweave/knitr. Knowledge of R process automation and R package building.
Course Content
Learn how to create basic web applications and web services in R. The following elements are covered in the course:
 Setting up Shiny applications
 RApache & OpenCPU
 using R alongside JavaScript, htmlwidgets
 setting up webservices using R

Advanced / Deploy
Module: Shiny
Course Content
 This course can be supplied directly by RStudio. We don't provide a homemade course for Shiny.


Oracle has built "Oracle R Enterprise" as part of the Oracle Advanced Analytics Option to make the open-source statistical programming language R ready for the Enterprise and Big Data. Designed for problems involving large amounts of data, Oracle R Enterprise integrates R with Oracle Database. R users can run R commands and scripts for statistical and graphical analyses on data stored in the Oracle Database.

Oracle R
Module: ROracle and Oracle R Enterprise - transparency layer
Course Duration 1 day
Prerequisites At least knowledge of data manipulation with R, S3 classes.
Course Content
In this session you will learn how to interface R with Oracle and use the transparency layer provided by ORE. You will be given access to an Oracle database where ORE is installed so you can use the ORE suite during the exercises. The course covers the following topics.
 ROracle - getting and sending SQL queries from Oracle
 Installing Oracle R Enterprise (ORE)
 Basic database connectivity: ore.exec, ore.ls, ore.synch, ore.push, ore.pull, ore.create, ore.drop, ore.get
 ORE data types: ore.character, ore.factor, ore.logical, ore.number, ore.datetime, ore.numeric. Conversion between data types
 ORE data structures: ore.matrix, ore.frame, ore.vector
 ORE transparency data operations on ore.frame/ore.vector (subset, ncol, nrow, head, ifelse, paste, is.na, sd, mean, tapply, by, c, %in%, ...) and indexing and overwriting in-database ore.vectors
 Save R objects in Oracle: ore.save, ore.load, ore.datastore and ORE data store handling
 Basic statistics with ORE (ore.univariate, ore.summary, ore.crosstab, ore.corr, exponential smoothing, t.test, Wilcoxon, IQR)

Oracle R
Module: Oracle R Enterprise - advanced data manipulation
Course Duration 0.5 days
Prerequisites At least knowledge of data manipulation with R, S3 classes. Understanding Oracle R Enterprise and the transparency layer.
Course Content
You will be given access to an Oracle database where ORE is installed so you can use the ORE suite during the exercises. The session covers the following topics:
 Running R functions in parallel inside the database: ore.doEval, ore.groupApply, ore.indexApply, ore.rowApply, ore.tableApply
 Creating R scripts inside the database and accessing ORE stored procedures
 Embedding R scripts in production database applications
 Embedded (parallel) R execution within ORE using the R Interface as well as the SQL Interface

Oracle R
Module: Data mining models inside Oracle R Enterprise and Oracle Data Mining
Course Duration 1 day
Prerequisites At least knowledge of data manipulation with R, S3 classes. Understanding Oracle R Enterprise and the transparency layer. Knowledge of statistical modelling and machine learning.
Course Content
Data Mining (aka Machine Learning) refers to a set of statistical and mathematical techniques to reveal relationships and patterns in data. In contrast to ‘classical’ statistical methods, there is no need to formulate a hypothesis in advance. Applications of data mining methods include forecast models, market basket analysis, target group analysis and more. You will be given access to an Oracle database where ORE is installed so you can use the ORE suite during the exercises. Note that if you are unfamiliar with data science algorithms, it is advised to also follow the module on ‘Statistical machine learning with R’. In this session you will become acquainted with some of the most common data mining methods and learn how to use these algorithms in ORE. The following algorithms will be covered.
 principal component analysis and factor analysis
 k-means clustering and orthogonal partitioning
 data reduction using Minimum Description Length attribute importance
 linear models and generalized linear models
 naive bayes, neural networks, decision tree and support vector machines
 market basket analysis / recommendation engines (apriori)
 bagging



Upcoming public courses.
 2019-10-17 & 18: Statistical Machine Learning with R: Subscribe here
 2019-11-14 & 15: Text Mining with R: Subscribe here
 2019-12-17 & 18: Applied Spatial Modelling with R: Subscribe here
 2020-02-19 & 20: Advanced R programming: Subscribe here
 2020-03-12 & 13: Computer Vision with R and Python: Subscribe here
 2020-03-16 & 17: Deep Learning/Image recognition: Subscribe here
 2020-04-22 & 23: Text Mining with R: Subscribe here
 2020-05-06 & 07: Text Mining with Python: Subscribe here
Online courses.
The following courses are available online anytime:
 Text Mining with R
 Advanced R programming
Private courses.
If you want to schedule a private course or if you are interested in a specific course suited to your needs, let us know by filling out the email form.