Using R to improve data analyses Python workflows
From Gridkaschool
Software Environment
The courses are implemented as Jupyter/IPython notebooks, running Python and R. We provide a VM with a preconfigured Jupyter Notebook Server for each student.
You can also set up the software environment on your own computer.
Minimum Requirement
- Current web browser, such as Firefox, Chrome or Safari.
- You will receive a server address and password at the start of the course.
Running the Course on your own Computer
We have only tested the course on Linux and OSX! Installation on Windows may differ!
Required Software
- Python3 (>= 3.4) - A sufficiently recent release of Python
- Commonly available via package managers, such as
apt-get install python3
or
brew install python3
- Also available from the Python Homepage
- Python3 pip - Python Package Manager
- Several package managers do not install the Python package manager together with Python
- In this case it is available as a separate package, e.g.
apt-get install python3-pip
- Jupyter - Evaluates and renders the notebooks
- Available via pip:
pip3 install jupyter
- RISE (optional) - provides the interactive presentation view
- Consult the RISE Readme
- Available via pip:
pip3 install RISE && jupyter-nbextension install rise --py --sys-prefix && jupyter-nbextension enable rise --py --sys-prefix
- R (we are using 3.3.1, but older versions should also work)
- On Unix-based systems, make sure R is built with the configure flag --enable-R-shlib (this is needed to get Rserve running)
- Rserve - R Compute Service
wget https://www.rforge.net/Rserve/snapshot/Rserve_1.8-5.tar.gz --no-check-certificate
R CMD INSTALL Rserve_1.8-5.tar.gz
- pyRserve - Rserve client for Python
- Available via pip:
pip3 install pyRserve
- PypeR - Pipe to an R subprocess
- Available via pip:
pip3 install PypeR
- rpy2 - Low level bindings to R
- Available via pip:
pip3 install rpy2
- numpy
- Available via pip:
pip3 install numpy
- pandas
- Available via pip:
pip3 install pandas
- Install the R kernel for Jupyter
- From within R, please ensure the following packages are installed:
install.packages(c('devtools', 'ggplot2', 'dplyr', 'readr', 'magrittr'))
- Then, still within R, please execute (this will install the RKernel for Jupyter):
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec(user = FALSE)
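Once the software listed above is installed, a quick check from Python can show which pieces are still missing. The following is only a minimal sketch and not part of the course material: check_environment is a hypothetical helper, and the import names (notebook, pyRserve, pyper, rpy2, numpy, pandas) are assumptions about how the pip packages expose themselves.

```python
import importlib.util
import shutil
import sys

# Import names assumed for the pip packages listed above (hypothetical mapping).
REQUIRED_MODULES = ("notebook", "pyRserve", "pyper", "rpy2", "numpy", "pandas")
MIN_PYTHON = (3, 4)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec looks a top-level module up without importing it
        report[name] = importlib.util.find_spec(name) is not None
    # PypeR starts R as a subprocess, so the R executable should be on PATH
    report["R executable"] = shutil.which("R") is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print("%-15s %s" % (requirement, "ok" if ok else "MISSING"))
```

A requirement reported as MISSING only means that Python could not find it under the assumed name; it does not check versions or whether R was built with --enable-R-shlib.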
Setting up the Environment
- Check out the exercise repositories https://bitbucket.org/teamkseta/gks_2016_pyr.git and https://bitbucket.org/teamkseta/gks_2016_python.git.
- Set PYTHONPATH to include the gks_2016_pyr directory.
- Change to the gks_2016_python directory and run the notebook server.
GKSDIR="$HOME/gks2016" # change me; "$HOME" is used because '~' is not expanded inside quotes
git clone https://bitbucket.org/teamkseta/gks_2016_pyr.git "$GKSDIR/gks_lib"
git clone https://bitbucket.org/teamkseta/gks_2016_r.git "$GKSDIR/gks_2016_r"
export PYTHONPATH="$GKSDIR/gks_lib:$PYTHONPATH"
jupyter-notebook
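To confirm that the PYTHONPATH export took effect, one can check from a notebook cell whether the library checkout is on the interpreter's search path. This is a minimal sketch: is_on_python_path is a hypothetical helper, and the path in the usage line assumes the GKSDIR layout from the commands above.

```python
import os
import sys


def is_on_python_path(directory, search_path=None):
    """Return True if `directory` is one of the entries of the module search path."""
    if search_path is None:
        search_path = sys.path
    target = os.path.realpath(os.path.expanduser(directory))
    return any(
        os.path.realpath(os.path.expanduser(entry)) == target
        for entry in search_path
        if entry  # skip the empty entry that stands for the current directory
    )


if __name__ == "__main__":
    # Assumed location; adjust to whatever GKSDIR was set to.
    print(is_on_python_path("~/gks2016/gks_lib"))
```

If this prints False, the notebook server was probably started from a shell in which PYTHONPATH had not been exported.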
- To also start Rserve you will need to execute:
R CMD Rserve
- Make sure not to run this with root privileges! By default, Rserve listens on localhost:6311.
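Whether Rserve is actually reachable can be checked from Python before working through the notebooks. The sketch below is an illustration under assumptions, not the course's own code: rserve_is_listening and r_version_via_rserve are hypothetical helper names, the port check only shows that something accepts TCP connections on the port, and the second function relies on pyRserve being installed.

```python
import socket


def rserve_is_listening(host="localhost", port=6311, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that some service accepts connections there,
    not that the service is Rserve.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def r_version_via_rserve(host="localhost", port=6311):
    """Ask a running Rserve for its R version string (requires pyRserve)."""
    import pyRserve  # imported lazily so the port check works without it

    conn = pyRserve.connect(host=host, port=port)
    try:
        return conn.eval("R.version.string")
    finally:
        conn.close()


if __name__ == "__main__":
    if rserve_is_listening():
        print("Rserve reachable:", r_version_via_rserve())
    else:
        print("Nothing is listening on localhost:6311; was 'R CMD Rserve' started?")
```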