- Published on Monday, 29 September 2014 08:45
This year, BNOSAC offers 2 R courses in cooperation with the Leuven Statistics Research Center. The courses are part of the Leuven STATistics STATe of the Art Training Initiative and are given in Leuven (Belgium).
For R users and data scientists, we offer two short courses on R programming and statistical learning.
You can download the brochure with the courses here: http://lstat.kuleuven.be/training/HR_BRO_LSTAT_2014-2015.pdf
Interested? Registration can be done here.
- Published on Thursday, 25 September 2014 11:47
Last week, we released the RMOA package on CRAN (http://cran.r-project.org/web/packages/RMOA). It is an R package for building streaming classification and regression models on top of MOA.
MOA stands for 'Massive Online Analysis'. It is the most popular open source framework for data stream mining and is developed at the University of Waikato: http://moa.cms.waikato.ac.nz. Our RMOA package interfaces with MOA version 2014.04 and focuses on building, evaluating and scoring streaming classification & regression models on data streams.
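MOA also offers prequential ("interleaved test-then-train") evaluation tasks for data streams. The sketch below illustrates that idea in base R, assuming a deliberately trivial learner: each example is first used to test the current model and only afterwards to update it, so an accuracy estimate is built in a single pass without a separate hold-out set. `majority_learner` and `prequential_accuracy` are hypothetical helpers, not RMOA or MOA functions.

```r
# Hypothetical incremental learner: predicts the most frequent class seen so far.
majority_learner <- function(levels) {
  counts <- setNames(integer(length(levels)), levels)
  list(
    predict = function() names(counts)[which.max(counts)],
    update  = function(y) {
      y <- as.character(y)
      counts[[y]] <<- counts[[y]] + 1L  # constant memory: one counter per class
    }
  )
}

# Prequential evaluation: test on each example first, then train on it.
prequential_accuracy <- function(y, learner) {
  correct <- 0L
  for (i in seq_along(y)) {
    yi <- as.character(y[i])
    if (learner$predict() == yi) correct <- correct + 1L  # test
    learner$update(yi)                                   # then train
  }
  correct / length(y)
}

prequential_accuracy(c("a", "a", "b", "a"), majority_learner(c("a", "b")))
```

Here the third example ("b") is the only one predicted wrongly, giving an accuracy of 0.75.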
The classification & regression models available through RMOA are:
- Classification trees:
- Bayesian classification:
- Active learning classification:
- Ensemble (meta) classifiers:
- Regression modelling:
* SGD (Stochastic Gradient Descent)
Interfaces are implemented to model data in standard files (csv, txt, delimited), ffdf data (from the ff package), data.frames and matrices.
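RMOA takes care of these data interfaces itself. Purely to illustrate the idea behind a file-backed stream, here is a minimal base-R sketch that reads a csv file from an open connection in chunks, so only one chunk is held in memory at a time. `stream_csv_chunks` is a hypothetical helper, not an RMOA function, and it assumes quoted fields contain no embedded newlines.

```r
# Read a csv file chunk by chunk and hand each chunk to FUN.
# Returns the total number of data rows processed.
stream_csv_chunks <- function(path, chunk_size, FUN) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header_line <- readLines(con, n = 1)      # keep the header for every chunk
  n_rows <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size) # continues where the last read stopped
    if (length(lines) == 0L) break          # end of file
    chunk <- read.csv(text = c(header_line, lines))
    FUN(chunk)
    n_rows <- n_rows + nrow(chunk)
  }
  n_rows
}

# Usage: stream the iris data from a temporary csv file in chunks of 40 rows.
path <- tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)
sizes <- integer(0)
total <- stream_csv_chunks(path, chunk_size = 40, FUN = function(chunk) {
  sizes <<- c(sizes, nrow(chunk))
})
total
sizes
```

The 150 rows arrive as chunks of 40, 40, 40 and 30 rows; a streaming learner would update its model on each chunk before the next one is read.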
Documentation of MOA directed towards RMOA users can be found at http://jwijffels.github.io/RMOA
Examples of how to use RMOA can be found in the documentation, on GitHub at https://github.com/jwijffels/RMOA, or in the showcase at http://bnosac.be/index.php?option=com_content&view=article&id=32:rmoa-massive-online-data-stream-classifications-with-r-a-moa&catid=8:blog&Itemid=107
If you need support building streaming models on top of your large dataset, get in contact.
- Published on Sunday, 18 May 2014 21:30
- It uses a limited amount of memory, so building models does not run into RAM issues.
- It processes one example at a time and inspects each example only once.
- It works incrementally, so the model is ready to be used for prediction at any time.
- Easy to set up data streams on data in RAM (data.frame/matrix), data in files (csv, delimited, flat table) as well as out-of-memory data in an ffdf (ff package).
- Easy to set up a MOA classification model
There are 26 classification models available, grouped as follows:
- Classification Trees (AdaHoeffdingOptionTree, ASHoeffdingTree, DecisionStump, HoeffdingAdaptiveTree, HoeffdingOptionTree, HoeffdingTree, LimAttHoeffdingTree, RandomHoeffdingTree)
- Bayes Rule (NaiveBayes, NaiveBayesMultinomial)
- Bagging (LeveragingBag, OzaBag, OzaBagAdwin, OzaBagASHT)
- Boosting (OCBoost, OzaBoost, OzaBoostAdwin)
- Stacking (LimAttClassifier)
- Other (AccuracyUpdatedEnsemble, AccuracyWeightedEnsemble, ADACC, DACC, OnlineAccuracyUpdatedEnsemble, TemporallyAugmentedClassifier, WeightedMajorityAlgorithm)
- Active learning (ActiveClassifier)
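Seven of the eight tree models listed above are Hoeffding tree variants. In the standard Hoeffding tree algorithm, a leaf is split only once enough examples have been seen that the best split attribute is, with probability 1 - delta, truly better than the runner-up. This relies on the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split criterion (log2(c) for information gain with c classes) and n the number of examples seen at the leaf. The sketch below is a textbook illustration of that rule, not MOA's code; `hoeffding_bound` and `should_split` are hypothetical helpers, and the values of delta and the tie threshold are only illustrative.

```r
# Hoeffding bound for a variable with range value_range after n observations.
hoeffding_bound <- function(value_range, delta, n) {
  sqrt(value_range^2 * log(1 / delta) / (2 * n))
}

# Split if the best attribute beats the second best by more than epsilon,
# or if epsilon has become so small that the two are effectively tied.
should_split <- function(gain_best, gain_second, value_range, delta, n,
                         tie_threshold = 0.05) {
  eps <- hoeffding_bound(value_range, delta, n)
  (gain_best - gain_second > eps) || (eps < tie_threshold)
}

# Information gain with 3 classes ranges over [0, log2(3)].
hoeffding_bound(log2(3), delta = 1e-7, n = 200)   # about 0.318
hoeffding_bound(log2(3), delta = 1e-7, n = 2000)  # about 0.101
should_split(0.5, 0.3, log2(3), delta = 1e-7, n = 200)   # FALSE: wait for more data
should_split(0.5, 0.3, log2(3), delta = 1e-7, n = 2000)  # TRUE
```

Because epsilon shrinks with sqrt(n), the tree needs only as many examples as required to make a confident decision, which is what allows it to learn from a stream in one pass.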
Easy, R-familiar formula interface to train the model on streaming data, as in
trainMOA(model, formula, data, subset, na.action = na.exclude, ...)
Easy prediction of new data with the trained model, as in
predict(object, newdata, type = "response", ...)
```r
##
## Installation from github
##
library(devtools)
install.packages("ff")
install.packages("rJava")
install_github("jwijffels/RMOA", subdir="RMOAjars/pkg")
install_github("jwijffels/RMOA", subdir="RMOA/pkg")

##
## HoeffdingTree example
##
require(RMOA)
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
hdt

## Define a stream - e.g. a stream based on a data.frame
data(iris)
iris <- factorise(iris)
irisdatastream <- datastream_dataframe(data=iris)

## Train the HoeffdingTree on the iris dataset
mymodel <- trainMOA(model = hdt,
                    formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length,
                    data = irisdatastream)

## Predict using the HoeffdingTree on the iris dataset
scores <- predict(mymodel, newdata=iris, type="response")
table(scores, iris$Species)
scores <- predict(mymodel, newdata=iris, type="votes")
head(scores)

##
## Boosted set of HoeffdingTrees
##
irisdatastream <- datastream_dataframe(data=iris)
mymodel <- OzaBoost(baseLearner = "trees.HoeffdingTree", ensembleSize = 30)
mymodel <- trainMOA(model = mymodel,
                    formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length,
                    data = irisdatastream)

## Predict
scores <- predict(mymodel, newdata=iris, type="response")
table(scores, iris$Species)
scores <- predict(mymodel, newdata=iris, type="votes")
head(scores)
```
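To make the streaming properties listed above (limited memory, one pass over the data, a model ready to predict at any time) concrete, here is a minimal base-R sketch of an incremental Gaussian naive Bayes classifier. It is in the spirit of the NaiveBayes model and the GaussianNumericAttributeClassObserver used above, which summarise numeric attributes per class with a Gaussian, but it is not the MOA or RMOA implementation: `new_gnb`, `update_gnb` and `predict_gnb` are hypothetical helpers, and their `type = "response"`/`"votes"` argument only mimics the RMOA `predict()` call.

```r
# Per class: an example count, plus per attribute a running mean and a running
# sum of squared deviations (Welford's algorithm). Memory does not grow with n.
new_gnb <- function(classes, attributes) {
  dn <- list(classes, attributes)
  k <- length(classes); p <- length(attributes)
  list(n = setNames(numeric(k), classes),
       mean = matrix(0, k, p, dimnames = dn),
       m2 = matrix(0, k, p, dimnames = dn))
}

# Update the model with a single example (x: named numeric vector, y: class).
update_gnb <- function(model, x, y) {
  y <- as.character(y)
  model$n[y] <- model$n[y] + 1
  delta <- x - model$mean[y, ]
  model$mean[y, ] <- model$mean[y, ] + delta / model$n[y]
  model$m2[y, ] <- model$m2[y, ] + delta * (x - model$mean[y, ])
  model
}

# "response" returns the most likely class, "votes" normalised class scores.
predict_gnb <- function(model, x, type = c("response", "votes")) {
  type <- match.arg(type)
  log_post <- sapply(names(model$n), function(cl) {
    n <- model$n[[cl]]
    if (n < 2) return(-Inf)  # too few examples for a variance estimate
    sigma <- sqrt(model$m2[cl, ] / (n - 1)) + 1e-9
    log(n / sum(model$n)) + sum(dnorm(x, model$mean[cl, ], sigma, log = TRUE))
  })
  if (!any(is.finite(log_post))) log_post[] <- 0  # no information yet: uniform
  if (type == "response") return(names(which.max(log_post)))
  votes <- exp(log_post - max(log_post))
  votes / sum(votes)
}

# Usage: feed the (shuffled) iris rows to the model one at a time.
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
set.seed(1)
stream <- iris[sample(nrow(iris)), ]
model <- new_gnb(levels(iris$Species), cols)
for (i in seq_len(nrow(stream))) {
  model <- update_gnb(model, unlist(stream[i, cols]), stream$Species[i])
}
pred <- sapply(seq_len(nrow(iris)),
               function(i) predict_gnb(model, unlist(iris[i, cols])))
table(pred, iris$Species)
predict_gnb(model, unlist(iris[1, cols]), type = "votes")
```

The running means and variances end up identical (up to rounding) to the batch estimates, even though each row was visited only once and never stored.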
- Published on Thursday, 06 March 2014 12:59
In two weeks, on Thursday, March 20, the RBelgium R user group is holding its next regular meeting in Brussels. This is the schedule:
- Analysis and visualisation of climate data from the atmospheric model ALADIN using the Rfa package! (Rozemien De Troch, Research Department, KMI)
For more information about the event follow this link. Feel free to join.
- Published on Thursday, 14 November 2013 13:41
Advanced R programming topics
As last year, BNOSAC is offering the short course on 'Advanced R programming topics' at the Leuven Statistics Research Center (Belgium).
The course is now part of FLAMES (Flanders Training Network for Methodology and Statistics) and can be found here: http://www.flames-statistics.eu/training/advanced-r-programming-topics. Registration is no longer possible unless you kindly ask LStat.
RApache and developing web applications with R as backend
As demand for R courses is increasing, we are also considering a course on RApache and on developing web applications with R as a backend. Such a course would allow you to build applications like this one http://rweb.stat.ucla.edu/lme4/ or this one http://rweb.stat.ucla.edu/ggplot2/.