binsegRcpp

Efficient implementation of the binary segmentation heuristic algorithm for changepoint detection, using C++ std::multiset. Also contains functions for comparing empirical time complexity to best/worst case.
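To make the role of std::multiset concrete, here is a minimal, self-contained sketch of binary segmentation for the square loss. It is not the binsegRcpp source: the names Candidate, Step, SquareLoss, add_candidate and binseg_sketch are hypothetical, and the use of prefix sums and the arbitrary tie-breaking are assumptions made for illustration. The multiset keeps one candidate split per splittable segment, ordered so that the split with the largest loss decrease is always first.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the package code.
struct Candidate {
  double decrease;  // change in total loss if this split is made (<= 0)
  int first, last;  // segment to split, 0-based inclusive indices
  int end;          // last index of the left child after the split
  bool operator<(const Candidate& o) const { return decrease < o.decrease; }
};

struct Step { int segments; int end; double loss; };  // end is 1-based

// Square loss with prefix sums, so any segment cost is O(1).
class SquareLoss {
  std::vector<double> s1_, s2_;  // prefix sums of x and x^2
 public:
  explicit SquareLoss(const std::vector<double>& x)
      : s1_(x.size() + 1, 0.0), s2_(x.size() + 1, 0.0) {
    for (std::size_t i = 0; i < x.size(); ++i) {
      s1_[i + 1] = s1_[i] + x[i];
      s2_[i + 1] = s2_[i] + x[i] * x[i];
    }
  }
  // Sum of squared residuals around the mean of x[first..last].
  double cost(int first, int last) const {
    double n = last - first + 1;
    double s = s1_[last + 1] - s1_[first];
    return (s2_[last + 1] - s2_[first]) - s * s / n;
  }
};

// Find the best split of [first, last] and store it as a candidate.
// Segments with a single data point cannot be split and are skipped.
void add_candidate(const SquareLoss& L, int first, int last,
                   std::multiset<Candidate>& cands) {
  if (first >= last) return;
  double before = L.cost(first, last);
  Candidate best{0.0, first, last, -1};
  for (int end = first; end < last; ++end) {
    double dec = L.cost(first, end) + L.cost(end + 1, last) - before;
    if (best.end < 0 || dec < best.decrease) {  // ties keep the earlier split
      best.decrease = dec;
      best.end = end;
    }
  }
  cands.insert(best);
}

// Returns one Step per model size, from 1 segment up to max_segments.
std::vector<Step> binseg_sketch(const std::vector<double>& x, int max_segments) {
  assert(!x.empty() && max_segments >= 1);
  SquareLoss L(x);
  int n = static_cast<int>(x.size());
  double total = L.cost(0, n - 1);
  std::vector<Step> path;
  path.push_back({1, n, total});
  std::multiset<Candidate> cands;
  add_candidate(L, 0, n - 1, cands);
  while (static_cast<int>(path.size()) < max_segments && !cands.empty()) {
    Candidate c = *cands.begin();  // candidate with the largest loss decrease
    cands.erase(cands.begin());
    total += c.decrease;
    path.push_back({static_cast<int>(path.size()) + 1, c.end + 1, total});
    add_candidate(L, c.first, c.end, cands);     // left child
    add_candidate(L, c.end + 1, c.last, cands);  // right child
  }
  return path;
}
```

Calling binseg_sketch on the six values used in the mean_norm example in this README (0.1, 0, 1, 1.1, 0.1, 0) should reproduce the loss column up to floating point error. The end column may differ, because the first split is an exact tie (ending the first segment at position 2 or at position 4 gives the same loss) and this sketch breaks ties arbitrarily.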
    install.packages("binsegRcpp")
    ## OR
    if(!require("remotes"))install.packages("remotes")
    remotes::install_github("tdhock/binsegRcpp")

The main function is binseg, for which you must specify at least the first two arguments:
- distribution.str specifies the loss function to minimize.
- data.vec is a numeric vector of data to segment.
    > x <- c(0.1, 0, 1, 1.1, 0.1, 0)
    > (models.dt <- binsegRcpp::binseg("mean_norm", x))
    binary segmentation model:
       segments   end          loss validation.loss
          <int> <int>         <num>           <num>
    1:        1     6  1.348333e+00               0
    2:        2     4  1.015000e+00               0
    3:        3     2  1.500000e-02               0
    4:        4     3  1.000000e-02               0
    5:        5     5  5.000000e-03               0
    6:        6     1 -3.339343e-16               0

The result above summarizes the data that are computed during the binary segmentation algorithm. It has a special class with dedicated methods:
    > class(models.dt)
    [1] "binsegRcpp" "list"
    > methods(class="binsegRcpp")
    [1] coef  plot  print
    see '?methods' for accessing help and source code

The coef method returns a data table of segment means:
    > coef(models.dt, segments=2:3)
       segments start   end start.pos end.pos  mean
          <int> <int> <int>     <num>   <num> <num>
    1:        2     1     4       0.5     4.5  0.55
    2:        2     5     6       4.5     6.5  0.05
    3:        3     1     2       0.5     2.5  0.05
    4:        3     3     4       2.5     4.5  1.05
    5:        3     5     6       4.5     6.5  0.05

Demo of Poisson loss and non-uniform weights:
    > data.vec <- c(3,4,10,20)
    > (fit1 <- binsegRcpp::binseg("poisson", data.vec, weight.vec=c(1,1,1,10)))
    binary segmentation model:
       segments   end      loss validation.loss
          <int> <int>     <num>           <num>
    1:        1     4 -393.8437               0
    2:        2     3 -411.6347               0
    3:        3     2 -413.9416               0
    4:        4     1 -414.0133               0

Demo of change in mean and variance for normal distribution:
    > sim <- function(mu,sigma)rnorm(10000,mu,sigma)
    > set.seed(1)
    > data.vec <- c(sim(5,1), sim(0, 5))
    > fit <- binsegRcpp::binseg("meanvar_norm", data.vec)
    > coef(fit, 2L)
       segments start   end start.pos end.pos        mean       var
          <int> <int> <int>     <num>   <num>       <num>     <num>
    1:        2     1 10000       0.5 10000.5  4.99346296  1.024763
    2:        2 10001 20000   10000.5 20000.5 -0.02095033 24.538556

Other implementations of binary segmentation include changepoint::cpt.mean(method="BinSeg") (quadratic storage in max number of segments), BinSeg::BinSegModel() (same linear storage as binsegRcpp), and ruptures.Binseg() (unknown storage). Figures comparing the timings.
This version uses the Rcpp/.Call interface whereas the binseg package uses the .C interface.
See branches for variations of the interface to use as test cases in RcppDeepState development.
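
As a sanity check on the loss column printed in the examples above, the one-segment values can be reproduced by hand. The formulas below are inferred from the printed numbers rather than taken from the package documentation: they assume that mean_norm reports the residual sum of squares, and that poisson reports the weighted Poisson loss without the constant log-factorial term.

```latex
% mean_norm example: x = (0.1, 0, 1, 1.1, 0.1, 0), one segment
\begin{align*}
\bar{x} &= \frac{2.3}{6}, &
\sum_{i=1}^{6} (x_i - \bar{x})^2
  &= \sum_{i=1}^{6} x_i^2 - 6\,\bar{x}^2
   = 2.23 - \frac{2.3^2}{6} \approx 1.348333
\end{align*}

% poisson example: x = (3, 4, 10, 20), w = (1, 1, 1, 10), one segment
\begin{align*}
\hat{\mu} &= \frac{\sum_i w_i x_i}{\sum_i w_i} = \frac{217}{13}, &
\sum_{i=1}^{4} w_i \left( \hat{\mu} - x_i \log \hat{\mu} \right)
  &= 217 - 217 \log\frac{217}{13} \approx -393.8437
\end{align*}
```

Both values agree with the first row of the corresponding printed tables.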