Using R to improve Python data analysis workflows

From Gridkaschool
Revision as of 22:52, 29 August 2016 by MFischer

Software Environment

The courses are implemented as Jupyter/IPython notebooks running Python and R. Each student is provided with a VM running a preconfigured Jupyter Notebook server.

You can also set up the software environment on your own computer.

Minimum Requirement

  • A current web browser, such as Firefox, Chrome or Safari.
    • You will receive a server address and password at the start of the course.

Running the Course on your own Computer

We have only tested the course on Linux and OS X; installation on Windows may differ.

Required Software

  • Python3 (>= 3.4) - A sufficiently recent release of Python
    • Commonly available via package managers such as apt-get install python3 or brew install python3.
    • Also available from the Python Homepage
  • Python3 pip - Python Package Manager
    • Some package managers do not install pip together with Python.
    • In that case pip is available as a separate package, e.g. apt-get install python3-pip.
  • Jupyter - Evaluates and renders the notebooks
    • Available via pip: pip3 install jupyter
  • RISE (optional) - provides the interactive presentation view
    • Consult the RISE Readme
    • Available via pip: pip3 install RISE && jupyter-nbextension install rise --py --sys-prefix && jupyter-nbextension enable rise --py --sys-prefix
  • R (we use 3.3.1, but older versions should also work)
    • On Unix-based systems, make sure R is compiled with the flag --enable-R-shlib (Rserve needs R built as a shared library)
  • Rserve - R compute service
    • Available from CRAN; from within R execute: install.packages('Rserve')
  • pyRserve - Rserve client for Python
    • Available via pip: pip3 install pyRserve
  • PypeR - Pipe to an R subprocess
    • Available via pip: pip3 install PypeR
  • rpy2 - Low level bindings to R
    • Available via pip: pip3 install rpy2
  • numpy
    • Available via pip: pip3 install numpy
  • pandas
    • Available via pip: pip3 install pandas
  • Install the R kernel for Jupyter
    • From within R, execute:
    • install.packages(c('devtools', 'ggplot2', 'dplyr', 'readr', 'magrittr')) (ensures the required packages are installed)
    • devtools::install_github('IRkernel/IRkernel') (installs IRkernel, the R kernel for Jupyter)
    • IRkernel::installspec(user = FALSE) (registers the kernel with Jupyter system-wide)
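Before moving on, it can help to check which of these pieces Python can actually see. The sketch below is a hypothetical helper (`check_environment` and `r_shared_library_present` are illustrative names, not part of the course material). It assumes the import names `pyRserve`, `pyper` (PypeR) and `rpy2`, and it treats the presence of `lib/libR.so` or `lib/libR.dylib` under the directory printed by `R RHOME` as a heuristic for an R built with `--enable-R-shlib`.

```python
import importlib.util
import os
import shutil
import subprocess
import sys

# Import names assumed for the pip packages listed above (PypeR installs the module "pyper").
REQUIRED_MODULES = ("numpy", "pandas", "pyRserve", "pyper", "rpy2")
# Executables that should be on PATH once Jupyter and R are installed.
REQUIRED_COMMANDS = ("jupyter", "R")


def check_environment(modules=REQUIRED_MODULES, commands=REQUIRED_COMMANDS):
    """Map each requirement to True (found) or False (missing)."""
    report = {"python>=3.4": sys.version_info >= (3, 4)}
    for name in modules:
        # find_spec only locates the module; it does not import it.
        report["module " + name] = importlib.util.find_spec(name) is not None
    for cmd in commands:
        report["command " + cmd] = shutil.which(cmd) is not None
    return report


def r_shared_library_present(r_home):
    """Heuristic: an R built with --enable-R-shlib ships lib/libR.so (Linux) or lib/libR.dylib (macOS)."""
    lib_dir = os.path.join(r_home, "lib")
    return any(os.path.isfile(os.path.join(lib_dir, name)) for name in ("libR.so", "libR.dylib"))


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print("{:<20} {}".format(item, "ok" if ok else "MISSING"))
    if shutil.which("R"):
        # "R RHOME" prints the R home directory.
        r_home = subprocess.check_output(["R", "RHOME"], universal_newlines=True).strip()
        found = r_shared_library_present(r_home)
        print("{:<20} {}".format("R shared library", "ok" if found else "MISSING"))
```

The report only says whether a package is installed, not whether it works; rpy2 in particular may be present but still fail at import time if it cannot find a usable R.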

Setting up the Environment

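# change me; use $HOME, because a ~ inside quotes is not expanded by the shell
GKSDIR="$HOME/gks2016"

# Clone the course repositories
git clone https://bitbucket.org/teamkseta/gks_2016_pyr.git "$GKSDIR/gks_lib"
git clone https://bitbucket.org/teamkseta/gks_2016_r.git "$GKSDIR/gks_2016_r"

# Make gks_lib importable from Python
export PYTHONPATH="$GKSDIR/gks_lib:$PYTHONPATH"

# Start the notebook server from this shell so it inherits PYTHONPATH
jupyter-notebook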
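Because Python only finds gks_lib through the PYTHONPATH export, a quick sanity check from inside a running notebook is whether that directory actually ended up on `sys.path`. The helper `gks_lib_on_path` below is hypothetical (it is not part of the course repositories) and assumes the notebook server was started from the same shell in which PYTHONPATH was exported.

```python
import os
import sys


def gks_lib_on_path(gksdir):
    """Return True if <gksdir>/gks_lib is one of the entries on sys.path.

    gksdir is the directory assigned to GKSDIR in the shell. '~' is expanded
    here, whereas Python does not expand '~' inside PYTHONPATH itself.
    """
    target = os.path.realpath(os.path.join(os.path.expanduser(gksdir), "gks_lib"))
    # Skip empty entries: on sys.path they stand for the current directory.
    return any(os.path.realpath(entry) == target for entry in sys.path if entry)
```

In a notebook cell, `gks_lib_on_path("~/gks2016")` (or whatever GKSDIR was set to) should return True when the export took effect; False usually means the server was started from a different shell or GKSDIR contained an unexpanded `~`.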

  • To start Rserve as well, execute:

R CMD Rserve

  • Do not run this with root privileges. By default, Rserve listens on localhost:6311.
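To confirm that Rserve is really up before connecting from a notebook, one can open a plain TCP connection and inspect the greeting. This sketch assumes the Rserve QAP1 protocol, in which the server first sends a 32-byte ID string beginning with `Rsrv`; `rserve_is_listening` is a hypothetical helper, not part of pyRserve. pyRserve's `connect()` targets the same localhost:6311 default.

```python
import socket

# Assumption: Rserve (QAP1) greets every client with a 32-byte ID string starting with b"Rsrv".
RSERVE_ID_LENGTH = 32


def rserve_is_listening(host="localhost", port=6311, timeout=2.0):
    """Return True if the server at host:port answers with an Rserve greeting."""
    banner = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # recv() may return fewer bytes than requested, so read until the ID is complete.
            while len(banner) < RSERVE_ID_LENGTH:
                chunk = sock.recv(RSERVE_ID_LENGTH - len(banner))
                if not chunk:  # peer closed the connection early
                    break
                banner += chunk
    except OSError:  # connection refused, timeout, unreachable host, ...
        return False
    return banner.startswith(b"Rsrv")
```

Checking the greeting, rather than only whether the port accepts connections, avoids mistaking some other service on port 6311 for Rserve.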