Open Data in Belgium - release of BelgiumStatistics R package
On 22/10/2015, the Belgian government launched its Open Data initiative by releasing a number of datasets related to population statistics, fiscal information, the land registry ('kadaster'), the 2011 census and some tools. Because BNOSAC works a lot with this kind of data and because we like to promote open data, an R package called BelgiumStatistics was made available for R users at https://github.com/jwijffels/BelgiumStatistics
The package contains all the datasets released by Statistics Belgium (Bevolking, Werk, Leefmilieu, Census 2011) under the 'Licentie open data', readily available to R users. Thanks to the open data, analysing and visualising Belgian data has become a lot smoother, as the example below shows.
library(BelgiumStatistics)
library(data.table)
library(BelgiumMaps)
library(leaflet)

data(TF_SOC_POP_STRUCT_2015)         ## Part of BelgiumStatistics
data(mapbelgium.fusiegemeenten.wgs)  ## Part of BelgiumMaps (not released yet)

## Aggregate the population structure to one row per municipality
x <- as.data.table(TF_SOC_POP_STRUCT_2015)
x <- x[, list(MS_POPULATION = sum(MS_POPULATION),
              ## Percentage of inhabitants with a foreign nationality
              Foreigners = 100 * sum(MS_POPULATION[TX_NATLTY_NL == "Vreemdelingen"]) / sum(MS_POPULATION),
              ## Population-weighted mean age
              Age = sum(MS_POPULATION * CD_AGE) / sum(MS_POPULATION),
              ## Percentage of female inhabitants
              Females = 100 * sum(MS_POPULATION[CD_SEX == "F"]) / sum(MS_POPULATION)),
       by = list(CD_MUNTY_REFNIS, TX_MUNTY_DESCR_NL)]
x <- setDF(x)

## Attach the statistics to the municipality polygons and drop unmatched polygons
mymap <- merge(mapbelgium.fusiegemeenten.wgs, x,
               by.x = "ORDER08", by.y = "CD_MUNTY_REFNIS",
               all.x = TRUE, all.y = FALSE)
mymap <- subset(mymap, !is.na(Foreigners))

## Colour each municipality by its share of foreigners
pal <- colorNumeric(palette = "Blues", domain = mymap$Foreigners)
leaflet(mymap) %>%
  addTiles() %>%
  addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = 0.85,
              color = ~pal(Foreigners))
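For readers who want to see what the aggregation step computes before installing the packages, here is a minimal base R sketch on a made-up toy table. The column names are borrowed from the example above, but the municipality codes and all values are invented for illustration, the helper `summarise_municipality` is hypothetical, and base R `split`/`lapply` stands in for the data.table grouping. It expresses both shares as percentages and the age as a population-weighted mean.

```r
# Toy data: invented values, column names borrowed from TF_SOC_POP_STRUCT_2015
toy <- data.frame(
  CD_MUNTY_REFNIS = c("A", "A", "A", "B", "B"),
  TX_NATLTY_NL    = c("Belgen", "Vreemdelingen", "Belgen", "Belgen", "Vreemdelingen"),
  CD_SEX          = c("F", "M", "M", "F", "F"),
  CD_AGE          = c(30, 40, 50, 20, 60),
  MS_POPULATION   = c(10, 5, 5, 30, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row for the rows of a single municipality
summarise_municipality <- function(d) {
  total <- sum(d$MS_POPULATION)
  data.frame(
    CD_MUNTY_REFNIS = d$CD_MUNTY_REFNIS[1],
    MS_POPULATION   = total,
    Foreigners      = 100 * sum(d$MS_POPULATION[d$TX_NATLTY_NL == "Vreemdelingen"]) / total,
    Age             = sum(d$MS_POPULATION * d$CD_AGE) / total,
    Females         = 100 * sum(d$MS_POPULATION[d$CD_SEX == "F"]) / total
  )
}

# Split by municipality, summarise each piece, bind the rows back together
toy_summary <- do.call(rbind, lapply(split(toy, toy$CD_MUNTY_REFNIS), summarise_municipality))
print(toy_summary)
```

Weighting by MS_POPULATION matters because each input row represents a group of people rather than one person; an unweighted mean of CD_AGE would treat a group of 5 the same as a group of 30.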
Text Mining with R
Last week, we had a great course on Text Mining with R at the European Data Innovation Hub. For those interested in text mining with R, another 1-day crash course is scheduled at the Leuven Statistics Research Center (Belgium) on November 17 (http://lstat.kuleuven.be/training/coursedescriptions/text-mining-with-r). The course covers the following topics.
1. Import of (structured) text data with focus on text encodings. Detection of language
2. Cleaning of text data, regular expressions
3. String distances
4. Graphical displays of text data
5. Natural language processing: stemming, parts-of-speech (POS) tagging, tokenization, lemmatisation, entity recognition
6. Sentiment analysis
7. Statistical topic detection, modelling and visualisation (Latent Dirichlet Allocation)
8. Automatic classification using predictive modelling based on text data
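To give a flavour of topics 2 and 3 and of the tokenization part of topic 5, here is a small base R sketch. It is not taken from the course material; the two example sentences are invented, and dedicated text mining packages offer much richer functionality than these base functions.

```r
# Two invented example documents
docs <- c("R is GREAT for text mining!!",
          "Text   mining with R: cleaning, tokens & distances.")

# Cleaning with regular expressions: lower-case, drop punctuation, squeeze whitespace
clean <- tolower(docs)
clean <- gsub("[[:punct:]]+", " ", clean)
clean <- trimws(gsub("[[:space:]]+", " ", clean))

# Tokenization: split each cleaned document on single spaces
tokens <- strsplit(clean, " ", fixed = TRUE)

# Term frequencies over all documents
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
print(freq)

# String distance: generalized Levenshtein (edit) distance from base R's adist()
d <- adist("lemmatisation", "lemmatization")
print(d)
```

The edit distance counts the insertions, deletions and substitutions needed to turn one string into the other, which is useful for matching spelling variants such as the British and American forms used here.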
More information on the course & the registration: http://lstat.kuleuven.be/training/coursedescriptions/text-mining-with-r
If you are interested in applying Text Mining techniques on your data, get in touch: bnosac.be/index.php/contact/mail-us
R courses on basic R, advanced R, statistical machine learning with R, text mining with R, spatial modelling with R and R package building
Wow, our course list for teaching R is getting bigger and bigger. We now have courses on basic R, advanced R, R package building, statistical machine learning with R, text mining with R and spatial analysis with R. All are face-to-face courses given in Belgium and scheduled in the coming months.
Some courses are given at the European Data Innovation Hub (Brussels, Belgium), other courses are given through the Leuven Statistics Research Center (Leuven, Belgium). From today on, you can register for the following courses regarding the use of R.
Prices are set to 300€ per course day + taxes.
For detailed information on the course content, have a look at the pdf which can be found here.
Courses given at the European Data Innovation Hub (Brussels, Belgium) - http://www.datainnovationhub.
07-08/09/2015 Introduction to R programming (2 days) (register at www.eventbrite.com/e/common-
14/09/2015 Common data manipulation for R programmers (1 day) (register at www.eventbrite.com/e/common-
28-29/09/2015 Statistical Machine Learning with R (2 days) (register at www.eventbrite.com/e/
05/10/2015 Text mining with R (1 day) (register at www.eventbrite.com/e/text-
02/11/2015 Reporting with R (1 day) (register at www.eventbrite.com/e/
03/11/2015 Creating R packages and R repositories (1 day) (register at www.eventbrite.com/e/training-
Courses given at LStat (Leuven, Belgium) - http://lstat.kuleuven.be/
28-29/10/2015 Statistical Machine Learning with R (2 days) (register at lstat.kuleuven.be/training/
17/11/2015 Text mining with R (1 day) (register at lstat.kuleuven.be/training/
17-18/02/2016 Advanced R programming topics (2 days) (register at lstat.kuleuven.be/training/
13-14/04/2016 Applied Spatial Modelling with R (1.5 days) (register at lstat.kuleuven.be/training/
Hope to see you soon.
PS. Thanks Brandon for allowing us to use your wonderful logo.
R courses
For those interested in advancing their knowledge of R and data science with R, BNOSAC offers a range of courses for R users.
The courses cover:
- R analytics (Statistical Machine Learning with R, Text mining with R, Applied Spatial modelling with R)
- R programming (R for starters, Common data manipulation for R programmers, Reporting with R, Creating R packages and R repositories, Managing R processes, Using SVN/git with RStudio, Data connectivity using R, Integration of R into web applications)
- Oracle R Enterprise
For more information, go to http://bnosac.be/images/activities/bnosac_courses_r.pdf
UseRs interested in one of these courses can follow one of the scheduled courses listed below or request hands-on training sessions at their own site. The scheduled courses are given in Belgium.
***********************************************************************************
Statistical courses in cooperation with LStat:
- Statistical Machine Learning with R: 28/10/2015 & 29/10/2015
- Text mining with R: 17/11/2015 (new)
- Applied Spatial statistics with R: 13/04/2016 & 14/04/2016 (new)
Programming courses in cooperation with LStat:
- Advanced R programming topics: 17/02/2016 & 18/02/2016
-> course containing common data manipulation for R programmers, reporting with R, creating R packages and repositories
1-week bootcamp on using R with Oracle R Enterprise, in cooperation with Tripwire Solutions. This includes:
- R for starters: 08-12/06/2015
- Common data manipulation for R programmers: 08-12/06/2015
- ROracle and Oracle R Enterprise (ORE) - transparency layer: 08-12/06/2015
- Oracle R Enterprise - advanced data manipulation: 08-12/06/2015
- Data mining models inside Oracle R Enterprise (ORE) and Oracle Data Mining (ODM): 08-12/06/2015
***********************************************************************************
Make sure you register early for the statistical courses offered jointly with LStat.
In previous years, the courses on machine learning and advanced R programming were overbooked.
If there is a lot of interest in certain courses, an extra session can be set up.
Interested? Get in touch.
Course on using Oracle R Enterprise
From June 8 to June 12, BNOSAC will give a 5-day crash course on using R with Oracle R Enterprise. The course is given together with our Oracle partner in Leuven, Belgium. If you are interested in attending, contact us for further details.
For R users who aren't aware of this yet: Oracle has embedded R into its database, which allows R users to run R code transparently inside the database (yes, really transparently). Oracle R Enterprise is part of the Oracle Advanced Analytics stack, which for R users basically consists of the following elements:
- ROracle: a supported, native DBI driver to connect R with Oracle; it is open source and available on CRAN.
- Oracle R Enterprise (ORE): an Oracle-released and Oracle-supported version of R that is kept up to date with the current version of R, plus a number of R packages available for download at the ORE website. These packages embed R into Oracle.
- Oracle Data Mining (ODM): a set of distributed data mining algorithms accessible from R
- Oracle Advanced Analytics for Hadoop (ORAAH): a set of R packages which allow R users to connect with Hadoop and run data mining models and map reduce jobs on top of Hadoop.
During the 5-day course, you will learn how to use R alongside the Oracle DB. The course covers some base R, advanced R usage and functionality from the Oracle R Enterprise (ORE) suite of packages.
Module 1: Introduction to base R
What is R, packages available (CRAN, R-Forge, ...), R documentation search, finding help, RStudio editor, syntax, data types (numeric/character/factor/logicals/NA/Dates/Times), data structures (vector/data.frame/matrix/lists and standard operations on these), saving (RData) & importing data from flat files, csv, Excel, Oracle, MS SQL Server, SAS, SPSS, creating functions, data manipulation (subsetting, adding variables, ifelse, control flow, recoding, rbind, cbind), aggregating and reshaping, plotting in R using base and lattice functionality (dot plots, barcharts, graphical parameters, legends, devices), basic statistics in R (mean, variance, crosstabs, quantile, correlation, distributions, densities, histograms, boxplot, t-tests, Wilcoxon test, non-parametric tests)
Module 2: Advanced R programming & data manipulation with base R
vectorisation, writing your own functions, control flow, aggregating and data.table - fast group by, joining and data.table programming tricks, reshaping from wide to long format, miscellaneous useful functions, apply family of functions & split-apply-combine strategy, do.call, parallel execution of code, handling of errors and exceptions, debugging code, other goodies: basic regular expressions, data manipulations, rolling data handling, S3 classes, generics and basic S4 methodology
Module 3: ROracle and Oracle R Enterprise (ORE) - transparency layer
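A few of the Module 2 topics can be illustrated in a handful of lines of base R. This is a minimal sketch, not course material: the data and the names `safe_log` and `describe` are made up for the example.

```r
# Vectorisation: one call operates on the whole vector, no explicit loop
x <- c(1, 4, 9, 16)
roots <- sqrt(x)

# Split-apply-combine: split values by group, apply mean, combine with sapply
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 2, 3, 4, 5))
means <- sapply(split(df$value, df$group), mean)

# do.call: call rbind with a list of data frames as its arguments
combined <- do.call(rbind, list(df[1:2, ], df[3:5, ]))

# Error handling: return NA instead of stopping on bad input
safe_log <- function(v) {
  tryCatch({
    if (!is.numeric(v)) stop("input must be numeric")
    log(v)
  }, error = function(e) NA_real_)
}

# S3 generics: dispatch on the class of the first argument
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an object without a specific method"
describe.numeric <- function(x, ...) paste("a numeric vector of length", length(x))

print(means)
print(safe_log("not a number"))
print(describe(c(1.5, 2.5)))
```

The `tryCatch` wrapper is handy inside apply-style loops, where a single failing element would otherwise abort the whole computation.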
• ROracle - getting and sending SQL queries from Oracle
• Installing Oracle R Enterprise (ORE)
• Basic database connectivity: ore.exec, ore.ls, ore.synch, ore.push, ore.pull, ore.create, ore.drop, ore.get
• ORE data types: ore.character, ore.factor, ore.logical, ore.number, ore.datetime, ore.numeric. Conversion between data types
• ORE data structures: ore.matrix, ore.frame, ore.vector
• ORE transparency layer: data operations on ore.frame/ore.vector (subset, ncol, nrow, head, ifelse, paste, is.na, sd, mean, tapply, by, c, %in%, ...) and indexing and overwriting in-database ore.vectors
• Save R objects in Oracle ore.save, ore.load, ore.datastore and ORE data store handling
• Basic statistics with ORE (ore.univariate, ore.summary, ore.crosstab, ore.corr, exponential smoothing, t.test, Wilcoxon test, IQR)
Module 4: Oracle R Enterprise - advanced data manipulation
• Running R functions parallel inside the database: ore.doEval, ore.groupApply, ore.indexApply, ore.rowApply, ore.tableApply
• Creating R scripts inside the database and accessing ORE stored procedures
• Embedding R scripts in production database applications
• Embedded (parallel) R execution within ORE using the R Interface as well as the SQL Interface
Module 5: Data mining models inside Oracle R Enterprise (ORE) and Oracle Data Mining (ODM)
In this session you will become acquainted with some of the most common data mining methods and learn how to use these algorithms in ORE. The following algorithms will be covered.
• principal component analysis and factor analysis
• k-means clustering and orthogonal partitioning
• data reduction using Minimum Description Length attribute importance
• linear models and generalized linear models
• naive Bayes, neural networks, decision trees and support vector machines
• market basket analysis / recommendation engines (apriori)
• bagging
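For readers without access to an Oracle database, some of the methods in the list above can be tried locally first. The sketch below uses only the stats package that ships with open source R and the built-in iris data; it is an assumption-laden local analogue and does not use ORE or ODM functions, whose in-database interfaces differ.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# Principal component analysis on the four numeric iris measurements
measurements <- iris[, 1:4]
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]  # keep the first two components

# k-means clustering on the reduced data
clusters <- kmeans(scores, centers = 3, nstart = 10)

# Compare the clusters found with the known species labels
comparison <- table(cluster = clusters$cluster, species = iris$Species)
print(comparison)

# A linear model, the simplest member of the (generalized) linear model family
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
print(coef(fit))
```

Running the clustering on the first two principal components rather than on the raw columns is a design choice for the example; it shows how data reduction and clustering are commonly chained.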
If you are interested in attending, contact us for further details.