CRAN Task View: High-Performance Computing with R
Maintainer: Dirk Eddelbuettel
Date: 2024-11-24
This CRAN Task View contains a list of packages, grouped by topic, that are useful for
high-performance computing (HPC) with R. In this context, we are defining ‘high-performance
computing’ rather loosely as just about anything related to pushing R a little further: using
compiled code, parallel computing (in both explicit and implicit modes), working with large objects
as well as profiling.
Unless otherwise mentioned, all packages presented with hyperlinks are available from the
Comprehensive R Archive Network (CRAN).
Several of the areas discussed in this Task View are undergoing rapid change. Please send
suggestions for additions and extensions to this Task View via e-mail to the maintainer, or submit
an issue or pull request in the GitHub repository linked above. See the Contributing page in the
CRAN Task Views repo for details.
Suggestions and corrections by Achim Zeileis, Markus Schmidberger, Martin Morgan, Max Kuhn, Tomas
Radivoyevitch, Jochen Knaus, Tobias Verbeke, Hao Yu, David Rosenberg, Marco Enea, Ivo Welch, Jay
Emerson, Wei-Chen Chen, Bill Cleveland, Ross Boylan, Ramon Diaz-Uriarte, Mark Zeligman, Kevin Ushey,
Graham Jeffries, Will Landau, Tim Flutre, Reza Mohammadi, Ralf Stubner, Bob Jansen, Matt Fidler,
Brent Brewington and Ben Bolder (as well as others I may have forgotten to add here) are gratefully
acknowledged.
The ctv package supports these Task Views. Its functions install.views and update.views allow,
respectively, installation or update of packages from a given Task View; the option coreOnly can
restrict operations to packages labeled as core below.
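The following is a minimal sketch of the ctv workflow just described. It assumes ctv is installed from CRAN and that this view is published under the name "HighPerformanceComputing". The interactive() guard only keeps the snippet from installing anything when it is run from a batch script.

```r
# Assumes install.packages("ctv") has been run and CRAN is reachable.
library(ctv)

view <- "HighPerformanceComputing"  # name of this Task View on CRAN (assumed)

if (interactive()) {
  # Install only the packages this view labels as core.
  install.views(view, coreOnly = TRUE)

  # Later: bring the core packages of the view up to date.
  update.views(view, coreOnly = TRUE)
}
```

To act on every package listed in the view, drop coreOnly (it defaults to FALSE). Note that some packages, such as Rmpi, also need system libraries.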
Direct support in R started with release 2.14.0, which includes a new package parallel
incorporating (slightly revised) copies of packages multicore and snow (core). Some types of
clusters are not handled directly by the base package ‘parallel’. However, as explained in the
package vignette, the parts of parallel which provide snow-like functions will accept snow
clusters, including MPI clusters. Use vignette("parallel", package = "parallel") to view the
package vignette. The parallel package also contains support for multiple RNG streams following
L’Ecuyer et al. (2002), with support for both mclapply and snow clusters. The version released
for R 2.14.0 contains base functionality: higher-level convenience functions are planned for
later R releases.
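The snippet below is a minimal base-R sketch of the RNG-stream support described above. No extra packages are needed, and the seed (2024), the two workers and the toy workload are arbitrary choices. It uses a socket cluster from parallel because mclapply() relies on forking, which is not available on Windows. clusterSetRNGStream() gives each worker its own L'Ecuyer-CMRG stream, so re-seeding reproduces the parallel draws exactly.

```r
library(parallel)

# Start a small socket (PSOCK) cluster; this works on all platforms.
cl <- makeCluster(2)

# Give each worker its own L'Ecuyer-CMRG stream derived from one seed.
clusterSetRNGStream(cl, iseed = 2024)
res1 <- parSapply(cl, 1:4, function(i) mean(rnorm(1000)))

# Re-seeding the same way reproduces exactly the same parallel draws.
clusterSetRNGStream(cl, iseed = 2024)
res2 <- parSapply(cl, 1:4, function(i) mean(rnorm(1000)))

stopCluster(cl)
identical(res1, res2)
```

On Unix-alikes there is also a forking route: call RNGkind("L'Ecuyer-CMRG") and set.seed() before mclapply(..., mc.cores = 2). With the default mc.set.seed = TRUE, each forked process then draws from its own stream.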
Rmpi (core) by Yu. The Rmpi package is mature yet actively maintained and offers access to …
Rmpi can be used with the LAM/MPI, MPICH / MPICH2, Open MPI, and Deino MPI …
The pbdMPI package provides S4 classes to directly interface MPI in order to support …
The snow (Simple Network of Workstations) package by Tierney et al. can use PVM, MPI, …
The snowFT package provides fault-tolerance extensions to snow.
The snowfall package by Knaus provides a more recent alternative to snow. Functions can be used in sequential or parallel mode.
The parallelly
package enhances the parallel package by giving additional control …
The foreach package allows general iteration over elements in a collection without …
doMC (using …
doSNOW (using snow, see …
doMPI (using Rmpi) packages, and doFuture (using future) packages.
The future package allows for synchronous (sequential) and asynchronous (parallel) …
future.apply for parallel versions of base-R apply functions, and furrr for parallel versions of purrr functions. Parallelization is available through future.callr via the callr package, and future.batchtools via the batchtools package.
The Rborist
package employs OpenMP pragmas to exploit predictor-level parallelism in …
The h2o package connects to the h2o open source machine learning environment which …
The randomForestSRC package can use both OpenMP and MPI for random forest …
The parSim package can perform simulation studies using one or multiple cores, both …
The qsub package can submit commands to run on gridengine clusters.
The mirai package is a minimalist framework for local or distributed asynchronous …
nanonext NNG C messaging library binding. The crew …
mirai with auto-scaling, a central manager, and a plugin system for …
The condor package can interact with Condor HPC installations via ssh to transfer …
romp (Google Code). An R-Forge project romp was initiated but there is …
The RhpcBLASctl
package detects the number of available BLAS cores, and permits …
The targets package and its predecessor drake are R-focused pipeline …
future workers.
The flexiblas package manages BLAS/LAPACK libraries by loading and possibly switching …
The biocep-distrib project (R-Forge) by Chine offers a Java-based framework for local, Grid, …
The RHIPE package (GitHub: saptarshiguha/RHIPE), started by Saptarshi Guha, provides an interface …
The RProtoBuf package provides an interface to Google’s language-neutral, …
rlecuyer …
rstream package, the sitmo package as well as the dqrng package.
The doRNG
package provides functions to perform reproducible parallel foreach loops, …
rslurm package. (…)
snowfall. (…)
The batch package by Hoffmann can launch parallel computing requests onto a cluster …
The BatchJobs package provides Map, Reduce and Filter variants to manage R jobs and …
The BatchExperiments package extends it with an …
batchtools is a …
The clustermq package sends function calls as jobs on LSF, SGE and SLURM via a single …
The caret package by Kuhn can use various frameworks (MPI, NWS, etc.) to parallelize …
The maanova
package on Bioconductor by Wu can use snow and Rmpi for the analysis of micro-array experiments.
The pvclust package by Suzuki and Shimodaira can use snow and Rmpi for hierarchical clustering via multiscale bootstraps.
The tm package by Feinerer can use snow and Rmpi for …
The varSelRF package by Diaz-Uriarte can use snow and Rmpi for …
The multtest package by Pollard et al. on Bioconductor can use snow, Rmpi or rpvm for resampling-based testing of multiple hypotheses.
The Matching
package by Sekhon for multivariate and propensity score matching, the bnlearn package by Scutari for Bayesian network structure learning, the latentnet package by Krivitsky and Handcock for latent position and cluster models, the peperr package by Porzelius and Binder for parallelised …
the orloca package by Fernandez-Palacin and Munoz-Marquez for operations research locational analysis, the rgenoud package by Mebane and Sekhon for genetic optimization using derivatives, the affyPara package (Bioconductor) by Schmidberger, Vicedo and Mansmann for parallel normalization of …
the puma package (Bioconductor) by Pearson et al. which propagates …
snow for parallelized operations using either one of the MPI, PVM, NWS or socket …
snow.
The bugsparallel package (Google Code) uses Rmpi for distributed computing of multiple …
The xgboost package by Chen et al. is an optimized distributed gradient boosting …
The dclone package provides a global optimization approach and a variant of simulated …
snow package.
pls.
The pbapply package offers a progress bar for vectorized R functions in the *apply …
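Here is a short sketch of the pbapply entry, assuming pbapply is installed. pblapply() is used as a drop-in for lapply() that draws a progress bar, and its cl argument accepts a cluster created with parallel. The squaring function and the two-worker cluster are purely illustrative.

```r
library(pbapply)
library(parallel)

# Sequential: same result as lapply(), plus a progress bar on the console.
seq_res <- pblapply(1:8, function(i) i^2)

# Parallel: pass a socket cluster through the `cl` argument.
cl <- makeCluster(2)
par_res <- pblapply(1:8, function(i) i^2, cl = cl)
stopCluster(cl)

identical(unlist(seq_res), unlist(par_res))
```

Keeping the same function signature for both calls means a script can switch between sequential and parallel runs by changing only the cl argument.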
The Sim.DiffProc package simulates and estimates multidimensional Itô and …
The keras package by Allaire et al. provides a high-level neural networks API. It …
mvnfast uses the sumo random number generator to generate multivariate and normal …
rxode2 uses parallel processing (via OpenMP) for faster solving of ordinary …
ID) and can generate random …
nlmixr2 uses parallel ODE solving from rxode2 to solve nonlinear mixed effects …
"saem").
The gcbd package implements a benchmarking framework for BLAS and GPUs.
The OpenCL package provides an interface from R to OpenCL permitting hardware- and …
The tensorflow package by Allaire et al. provides access to the complete …
The tfestimators package by Tang et al. offers a high-level API that provides …
The BDgraph package provides statistical tools for Bayesian structure learning in …
The ssgraph package offers Bayesian inference in undirected graphical models using …
The GPUmatrix package can offload calculations to the GPU while providing the API of …
Matrix package.
The biglm package by Lumley uses incremental computations to offer lm() and glm() …
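The incremental approach of biglm can be sketched as follows, assuming biglm is installed. The model is fitted on a first chunk of rows, and further chunks are then folded in with update(), so only one chunk has to be held at a time. The simulated data and the three-way split are arbitrary. The comparison with lm() on all rows is only a sanity check that is possible here because the toy data fit in memory.

```r
library(biglm)

set.seed(42)
n <- 3000
full <- data.frame(x = rnorm(n))
full$y <- 1 + 2 * full$x + rnorm(n, sd = 0.5)

# Split the rows into three chunks, mimicking data read piece by piece.
chunks <- split(full, rep(1:3, each = n / 3))

# Fit on the first chunk, then fold in the remaining chunks one at a time.
fit <- biglm(y ~ x, data = chunks[[1]])
for (ch in chunks[-1]) {
  fit <- update(fit, ch)
}

# Sanity check against an ordinary in-memory fit on all rows.
cbind(biglm = coef(fit), lm = coef(lm(y ~ x, data = full)))
```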
The ff package by Adler et al. offers file-based access to data sets that are too …
The bigmemory package by Kane and Emerson permits storing large objects such as …
sqldf by …
data.table (by Dowle) are also of potential interest but not reviewed …
The MonetDB.R package allows R to access the MonetDB column-oriented, open source …
The LaF package provides methods for fast access to large ASCII files in csv or …
The bigstatsr package also operates on file-backed large matrices via memory-mapped …
The disk.frame package leverages several other packages to provide efficient access …
The arrow package offers the portable Apache Arrow in-memory format as well as …
The inline package by Sklyar et al. eases adding code in C, C++ or Fortran to R. It …
The Rcpp package by Eddelbuettel and Francois offers a number of C++ classes that …
RInside …
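The inline and Rcpp entries above describe compiling code from within an R session. Below is a minimal sketch of that workflow, assuming Rcpp and a working C++ toolchain (for example Rtools on Windows) are installed. cppFunction() compiles the C++ source and binds the result as an R function; sum_cpp is a hypothetical name chosen for this example.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R as `sum_cpp` (illustrative name).
cppFunction('
double sum_cpp(NumericVector x) {
  double total = 0.0;
  int n = x.size();
  for (int i = 0; i < n; ++i) {
    total += x[i];  // explicit loop runs as compiled code
  }
  return total;
}')

sum_cpp(c(1, 2, 3.5))
```

Base R's sum() is already implemented in C, so this example only demonstrates the mechanics. The payoff comes with loops that cannot be vectorized in R.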
The RcppParallel package by Allaire et al. bundles the Intel Threading Building …
Rcpp, …
The rJava package by Urbanek provides a low-level interface to Java similar to the .Call() interface for C and C++.
The reticulate package by Allaire provides an interface to Python modules, classes, and …
tensorflow and tfestimators within R.
Packages profvis, proffer, profmem, GUIProfiler, proftools, and aprof summarize and visualize
output from the Rprof interface for profiling. The profile package reads and writes profiling
data and converts among file formats such as pprof by Google and Rprof. The xrprof
command-line tool implements profile sampling for a given R process on Linux or Windows, and it
can profile R code alongside compiled code.
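The following is a minimal base-R sketch of the Rprof interface that these profiling packages build on. R samples the call stack at a fixed interval while code runs, and summaryRprof() then tabulates the samples. The workload (repeatedly sorting random vectors) and the 10 ms interval are arbitrary choices.

```r
# Write profiling samples to a temporary file every 10 ms.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)

# Illustrative workload: sort many random vectors.
total <- 0
for (i in 1:200) {
  total <- total + sum(sort(runif(1e5)))
}

Rprof(NULL)  # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time attributed to each function's own code
prof$sampling.time    # total seconds covered by the samples
```

For an interactive view of the same kind of data, profvis::profvis() can wrap the workload expression instead of calling Rprof() directly.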