crfsuite for natural language processing

A new R package called crfsuite supported by BNOSAC landed safely on CRAN last week. The crfsuite package (https://github.com/bnosac/crfsuite) is an R package specific to Natural Language Processing and allows you to easily build and apply models for

  • named entity recognition
  • text chunking
  • part of speech tagging
  • intent recognition or
  • classification of any category you have in mind

The focus of the implementation is on allowing the R user to build such models on his/her own data, with your own categories. The R package is a Rcpp interface to the popular crfsuite C++ package which is used a lot in all kinds of chatbots.

In order to facilitate creating training data on your own data, a shiny app is made available in this R package which allows you to easily tag your own chunks of text, with your own categories, which can next be used to build a crfsuite model. The package also plays nicely together with the udpipe R package (https://CRAN.R-project.org/package=udpipe), which you need in order to extract predictive features (e.g. parts of speech tags) for your words to be used in the crfsuite model.

On a side-note. If you are in the area of NLP, you might also be interested in the upcoming ruimtehol R package which is a wrapper around the excellent StarSpace C++ code providing word/sentence/document embeddings, text-based classification, content-based recommendation and similarities as well as entity relationship completion.

app screenshot

You can get going with the crfsuite package as follows. Have a look at the package vignette, it shows you how to construct and apply your own crfsuite model.

## Install the packages
install.packages("crfsuite")
install.packages("udpipe")

## Look at the vignette
library(crfsuite)
library(udpipe)
vignette("crfsuite-nlp", package = "crfsuite")

More details at the development repository https://github.com/bnosac/crfsuite where you can also provide feedback.

Training on Text Mining 

Are you interested in how text mining techniques work, then you might be interested in the following data science courses that are held in the coming months.rtraining

  • 20-21/11/2018: Text mining with R. Leuven (Belgium). Subscribe here
  • 19-20/12/2018: Applied spatial modelling with R. Leuven (Belgium). Subscribe here
  • 21-22/02/2018: Advanced R programming. Leuven (Belgium). Subscribe here
  • 13-14/03/2018: Computer Vision with R and Python. Leuven (Belgium). Subscribe here
  •      15/03/2019: Image Recognition with R and Python: Subscribe here
  • 01-02/04/2019: Text Mining with R. Leuven (Belgium). Subscribe here