From Gridkaschool

Revision as of 17:08, 29 August 2016

Using R to improve data analyses Python workflows

Software Environment

The courses are implemented as Jupyter/IPython notebooks running Python and R. Each student is provided with a VM running a preconfigured Jupyter Notebook server.

You can also set up the software environment on your own computer.

Minimum Requirement

  • A current web browser, such as Firefox, Chrome or Safari.
    • You will receive a server address and password at the start of the course.

Running the Course on your own Computer

The course has only been tested on Linux and OS X! Installation on Windows may differ.

Required Software

  • Python3 (>= 3.4) - A sufficiently recent release of Python
    • Commonly available via package managers such as apt-get install python3 or brew install python3.
    • Also available from the Python Homepage
  • Python3 pip - Python Package Manager
    • Some package managers do not install pip together with Python.
    • In that case, pip is available as a separate package, e.g. apt-get install python3-pip.
  • Jupyter - Evaluates and renders the notebooks
    • Available via pip: pip3 install jupyter
  • RISE (optional) - provides the interactive presentation view
    • Consult the RISE Readme
    • Available via pip: pip3 install RISE && jupyter-nbextension install rise --py --sys-prefix && jupyter-nbextension enable rise --py --sys-prefix
  • R (we use 3.3.1, but older versions should also work)
    • On Unix-based systems, R must be compiled with the flag --enable-R-shlib (this is required to get Rserve running)
  • Rserve
    • Available from CRAN, from within R: install.packages('Rserve')
  • pyRserve
    • Available via pip: pip3 install pyRserve
  • PypeR
    • Available via pip: pip3 install PypeR
  • rpy2
    • Available via pip: pip3 install rpy2
  • numpy
    • Available via pip: pip3 install numpy
  • pandas
    • Available via pip: pip3 install pandas
  • From within R, please also:
    • Ensure the packages devtools, ggplot2 and dplyr are installed, e.g. install.packages(c('devtools', 'ggplot2', 'dplyr'))
    • Run devtools::install_github('IRkernel/IRkernel') (this installs IRkernel, the R kernel for Jupyter)
    • Run IRkernel::installspec(user = FALSE) (this registers the kernel with Jupyter system-wide)
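Before starting the course, a quick script can check that the Python side of the list above is in place and that R was built as a shared library. This is a minimal sketch under stated assumptions: the import names (notebook, pyRserve, pyper, rpy2, numpy, pandas) are assumed to be what the listed pip packages provide, and a --enable-R-shlib build is assumed to place libR.so (Linux) or libR.dylib (OS X) in the lib directory under the path printed by `R RHOME`. The helper names are hypothetical.

```python
import importlib.util
import os
import subprocess
import sys

# Assumed mapping from the pip packages listed above to their import names.
REQUIRED_MODULES = {
    "jupyter": "notebook",
    "pyRserve": "pyRserve",
    "PypeR": "pyper",
    "rpy2": "rpy2",
    "numpy": "numpy",
    "pandas": "pandas",
}


def python_version_ok(minimum=(3, 4)):
    """True if the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum


def missing_modules(modules=REQUIRED_MODULES):
    """Return the package names whose import name cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


def r_home():
    """Return R's home directory as printed by `R RHOME`, or None if R is unavailable."""
    try:
        out = subprocess.check_output(["R", "RHOME"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.strip() or None


def libr_path(home):
    """Return the shared R library under <home>/lib (assumed location), or None."""
    for name in ("libR.so", "libR.dylib"):
        candidate = os.path.join(home, "lib", name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print("Python >= 3.4:", python_version_ok())
    print("Missing Python packages:", missing_modules() or "none")
    home = r_home()
    if home is None:
        print("R not found on PATH")
    else:
        print("Shared libR:", libr_path(home) or "not found (R built without --enable-R-shlib?)")
```

The check uses `importlib.util.find_spec` rather than importing each package, so a slow or broken import (rpy2 loading R, for instance) does not abort the whole check.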

Setting up the Environment

GKSDIR="$HOME/gks2016" # change me (use $HOME: a ~ inside quotes is not expanded by the shell)

git clone https://bitbucket.org/teamkseta/gks_2016_pyr.git $GKSDIR/gks_lib

git clone https://bitbucket.org/teamkseta/gks_2016_r.git $GKSDIR/gks_2016_r

export PYTHONPATH="$GKSDIR/gks_lib:$PYTHONPATH"

jupyter-notebook
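The notebook server inherits PYTHONPATH from the shell that started it, so a first notebook cell can confirm that the cloned library directory is really on the import path. A minimal sketch, assuming the course library lives in gks_lib under GKSDIR as set up above; the helper `on_sys_path` is hypothetical, and the fallback `~/gks2016` simply mirrors the example value of GKSDIR (which is not exported to the notebook by the commands above).

```python
import os
import sys


def on_sys_path(directory, path_entries=None):
    """True if directory (after ~ and $VAR expansion) appears on sys.path."""
    entries = sys.path if path_entries is None else path_entries
    target = os.path.realpath(os.path.expanduser(os.path.expandvars(directory)))
    # Skip the empty entry, which stands for the current working directory.
    return any(os.path.realpath(os.path.expanduser(e)) == target
               for e in entries if e)


if __name__ == "__main__":
    # Hypothetical fallback: GKSDIR is usually not exported into the notebook.
    gks_dir = os.environ.get("GKSDIR", "~/gks2016")
    lib_dir = os.path.join(gks_dir, "gks_lib")
    print(lib_dir, "on sys.path:", on_sys_path(lib_dir))
```

Comparing resolved paths avoids false negatives from a trailing slash, a ~ prefix or a symlinked home directory.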

To start Rserve as well, run:

R CMD Rserve

Do not run this with root privileges! By default, Rserve listens on localhost:6311.
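Once Rserve is running, a short Python check can confirm that something is listening on the default port before a notebook tries to connect. A minimal sketch: `rserve_listening` is a hypothetical helper that only tests whether a TCP connection to localhost:6311 succeeds and does not verify the Rserve protocol itself. The connection step uses pyRserve's `connect`, `eval` and `close`, and is skipped when nothing is listening or pyRserve is not installed.

```python
import socket


def rserve_listening(host="localhost", port=6311, timeout=1.0):
    """True if a TCP connection to host:port succeeds (Rserve's default is localhost:6311)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    if not rserve_listening():
        print("Nothing is listening on localhost:6311; start it with: R CMD Rserve")
    else:
        try:
            import pyRserve
        except ImportError:
            print("Rserve is up, but pyRserve is missing (pip3 install pyRserve)")
        else:
            conn = pyRserve.connect(host="localhost", port=6311)
            try:
                # Evaluate a trivial R expression on the server.
                print("R evaluated 1 + 1 as", conn.eval("1 + 1"))
            finally:
                conn.close()
```

Checking the port first gives a clearer message than a connection error from pyRserve when Rserve simply has not been started.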