data science with ruby

Practical Data Science with Ruby-based tools.

[RubyNLP |
RubyML |
RubyInterop]

Awesome Data Science with Ruby

Links and Resources for Data Processing and Analysis in Ruby

Data Science is a new
"sexy" buzzword without a specific meaning, often used as a stand-in for
Statistics, Scientific Computing, Text and Data Mining and
Visualization, Machine Learning, Data Processing and Warehousing, as
well as Retrieval Algorithms of any kind.

This curated list comprises awesome tutorials, libraries, and
information sources about various Data Science applications using
the Ruby programming language.

Many useful resources on this list come from the work of
the Ruby Science Foundation, our contributors, and
our own day-to-day work on various data-intensive applications.
Read why this list is awesome.

✨ Every contribution is welcome!
Add links through pull requests or create an issue to start a discussion.

Follow us on Twitter
and please spread the word using the #RubyDataScience hashtag!

Ruby vs. Python vs. Julia vs. R

Ruby         | Python | Julia | R
Daru / Rover | Pandas |       |
NArray       | NumPy  |       |

Standing on the shoulders of giants

Ruby is not (yet) a data-science-centric language with a large, established library ecosystem.
Leveraging libraries from R, Python, and Julia can help Ruby solve your tasks!

Data Manipulation

  • kiba
    lightweight Ruby ETL (Extract-Transform-Load) framework.
  • jongleur
    Workflow manager using DAG definitions to execute ETL tasks.
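To illustrate the Extract-Transform-Load idea behind kiba, here is a minimal sketch in plain Ruby. It borrows kiba's conventions (a source responds to `each`, a transform to `process(row)` and returns `nil` to drop a row, a destination to `write(row)`), but it does not use the kiba gem itself; `ArraySource`, `CleanAmount`, `MemoryDestination` and `run_pipeline` are hypothetical names chosen for this example.

```ruby
# Minimal ETL sketch in plain Ruby. It follows kiba-style conventions
# (source#each, transform#process(row), destination#write(row)) but does
# NOT use the kiba gem; all class and method names here are hypothetical.

# Extract: yields raw rows (Hashes of Strings) from an in-memory Array.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    @rows.each(&block)
  end
end

# Transform: casts "amount" to Float and returns nil (drop the row) when the
# value is not a positive number.
class CleanAmount
  def process(row)
    amount = Float(row["amount"], exception: false)
    return nil if amount.nil? || amount <= 0

    row.merge("amount" => amount)
  end
end

# Load: collects rows in memory; a real job might write to a file or database.
class MemoryDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end
end

# Runs every row through the transforms in order; a nil result skips the rest.
def run_pipeline(source, transforms, destination)
  source.each do |row|
    row = transforms.reduce(row) { |current, transform| current && transform.process(current) }
    destination.write(row) if row
  end
  destination
end

raw = [
  { "name" => "alice", "amount" => "10.5" },
  { "name" => "bob",   "amount" => "oops" },
  { "name" => "carol", "amount" => "-3" },
  { "name" => "dave",  "amount" => "7" }
]

result = run_pipeline(ArraySource.new(raw), [CleanAmount.new], MemoryDestination.new)
# result.rows keeps alice (10.5) and dave (7.0); bob and carol are dropped.
```

Keeping each stage as a small object with a single method makes the stages easy to test in isolation and to reorder, which is the main appeal of this style of ETL framework.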

Distributed Computing

Data Structures

  • daru
    Data Frame and Vector structures with comprehensive manipulation and visualization methods.
  • Rover
    simple, powerful data frames for Ruby.
  • numo-narray
    n-dimensional Numerical Array for Ruby.
  • nmatrix
    dense and sparse linear algebra library for Ruby from the SciRuby project.
  • kdtree
    blazingly fast native 2d k-d tree.
  • mdarray
    multi-dimensional array structure for JRuby.
  • spreadsheet
    manipulation library for MS Excel spreadsheets.
  • networkx
    Ruby-based NetworkX clone that handles various
    use cases of graph data structures.
  • cumo
    CUDA-aware numerical array library with an interface similar to NArray.
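To show what a 2-d k-d tree (as provided natively by kdtree) actually does, the following plain Ruby sketch builds a balanced tree by splitting on the median along alternating axes and answers nearest-neighbor queries, pruning subtrees whose splitting line is farther away than the best match so far. It is not the kdtree gem's API; `Node`, `build_kdtree` and `nearest` are hypothetical helpers written for this example.

```ruby
# Minimal 2-d k-d tree with nearest-neighbor search (plain Ruby sketch,
# not the kdtree gem's API; all names are hypothetical).
Node = Struct.new(:point, :left, :right, :axis)

# Builds a balanced tree: split at the median, alternating x (0) and y (1).
def build_kdtree(points, depth = 0)
  return nil if points.empty?

  axis = depth % 2
  sorted = points.sort_by { |p| p[axis] }
  mid = sorted.size / 2
  Node.new(sorted[mid],
           build_kdtree(sorted[0...mid], depth + 1),
           build_kdtree(sorted[(mid + 1)..], depth + 1),
           axis)
end

def squared_distance(a, b)
  (a[0] - b[0])**2 + (a[1] - b[1])**2
end

# Descends toward the target first, then only visits the other side of a
# split if the splitting line is closer than the current best point.
def nearest(node, target, best = nil)
  return best if node.nil?

  if best.nil? || squared_distance(node.point, target) < squared_distance(best, target)
    best = node.point
  end

  diff = target[node.axis] - node.point[node.axis]
  near, far = diff.negative? ? [node.left, node.right] : [node.right, node.left]

  best = nearest(near, target, best)
  best = nearest(far, target, best) if diff**2 < squared_distance(best, target)
  best
end

points = [[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]]
tree = build_kdtree(points)
p nearest(tree, [9, 2]) # prints [8, 1]
```

The pruning step is what makes k-d trees fast on average: most queries touch only a logarithmic number of nodes instead of scanning every point.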

Data sets

  • rdatasets
    Data sets available in R via Rdatasets.
  • red-datasets
    Growing collection of publicly available data sets such as CIFAR-10, Iris, MNIST etc.

Statistics

  • rb-gsl
    Ruby interface to the GNU Scientific Library. [dep: GSL]
  • simple_stats
    Enumerable patches for descriptive statistics.
  • enumerable-statistics
    fast implementation of descriptive statistics for the Enumerable module.
  • statsample
    basic and advanced statistics for Ruby. [dep: GSL]
  • statsample-glm
    extension of statsample with Generalized Linear Models.
  • statsample-bivariate-extension
    extension of statsample with Bivariate Correlations.
  • statsample-timeseries
    extension of statsample with Time Series estimators.
  • pca
    Principal Component Analysis (PCA) in Ruby.
  • descriptive-statistics
    descriptive extensions for the Enumerable module or standalone usage.
  • distribution
    probability distributions and descriptive measures for them.
  • statistics2
    Normal, Chi-square, t- and F- probability distributions for Ruby.
  • fast_statistics
    fast computation of descriptive statistics (min, max, mean, median, 1st and 3rd quartiles, population standard deviation) for a multivariate dataset.
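The descriptive measures listed above (min, max, mean, median, quartiles, population standard deviation) can be sketched in plain Ruby for a single numeric column. This is not the API of any gem on the list; `quantile` and `describe` are hypothetical helpers, and the quartile convention (linear interpolation between closest ranks) is an assumption, since libraries differ in how they define quartiles.

```ruby
# Descriptive statistics for one numeric column in plain Ruby (no gems).
# Hypothetical helpers; quartiles use linear interpolation between the
# closest ranks, which is only one of several common conventions.
def quantile(sorted, q)
  pos = (sorted.size - 1) * q
  lower = sorted[pos.floor]
  upper = sorted[pos.ceil]
  lower + (upper - lower) * (pos - pos.floor)
end

# Assumes a non-empty Array of numbers.
def describe(values)
  sorted = values.sort
  n = sorted.size.to_f
  mean = sorted.sum / n
  # Population standard deviation: divide by n, not n - 1.
  std = Math.sqrt(sorted.sum { |x| (x - mean)**2 } / n)
  {
    min: sorted.first,
    max: sorted.last,
    mean: mean,
    median: quantile(sorted, 0.5),
    q1: quantile(sorted, 0.25),
    q3: quantile(sorted, 0.75),
    std: std
  }
end

stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
# stats[:mean] is 5.0, stats[:std] is 2.0, stats[:median] is 4.5
```

Dedicated gems such as fast_statistics or enumerable-statistics exist mainly to make these computations faster on large inputs; the arithmetic itself is the same.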

Numeric and Symbolic Computation

Visualization

Comprehensive tools for Data Visualization.

Interactive Computing

Input and Output

General formats

Database Adapters

  • pg
  • Mongo
  • MySQL

Domain specific formats

  • BibTeX
  • inih
    fast C-based INI parser for Ruby.
  • bolognese
    conversion tool for citation formats like BibTeX, RIS, or Crossref XML.
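inih wraps a C parser for speed, but the INI format itself is simple: `[section]` headers followed by `key = value` lines. The naive plain Ruby sketch below shows the idea; it is not inih's API, ignores quoting, escapes, inline comments and line continuations, and `parse_ini` and the `"global"` bucket for keys before any section are hypothetical choices made for this example.

```ruby
# Naive INI parser sketch (not the inih gem's API): supports [sections],
# "key = value" / "key: value" pairs, and full-line ; or # comments.
def parse_ini(text)
  result = {}
  section = "global" # hypothetical bucket for keys that appear before any [section]
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?(";", "#")

    if (m = line.match(/\A\[([^\]]+)\]\z/))
      section = m[1].strip
    elsif (m = line.match(/\A([^=:]+)[=:](.*)\z/))
      # Split at the first "=" or ":"; everything after it is the value.
      (result[section] ||= {})[m[1].strip] = m[2].strip
    end
    # Any other line is silently ignored in this sketch.
  end
  result
end

sample = <<~INI
  ; database settings
  [database]
  host = localhost
  port=5432

  [app]
  name: demo
INI

config = parse_ini(sample)
# config["database"]["port"] is the String "5432"; values are not type-cast.
```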

Provisioning Infrastructure

Machine Learning

Please look at our extensive Awesome ML with Ruby list.

Articles, Posts, Talks, and Presentations

Community

Related resources

Wait but why?

There are a lot of software lists with tools related to Data Science.
There are a couple of lists with Ruby-related projects. There are no lists containing
only working, tested software with a documented scope. We'll try to make one!

What is awesome? Awesome means documented, maintained, and focused tools.

Can something stop being awesome at some point? Yes! Abandoned projects with broken
dependencies aren't awesome anymore, and they are removed from this list.

License

Awesome Data Science with Ruby by Andrei Beliankou and
Contributors is released under Creative Commons Zero 1.0 (CC0).

To the extent possible under law, the person who associated CC0 with
Awesome Data Science with Ruby has waived all copyright and related or neighboring rights
to Awesome Data Science with Ruby.

You should have received a copy of the CC0 legalcode along with this
work. If not, see https://creativecommons.org/publicdomain/zero/1.0/.