PEARC17 has ended
Monday, July 10 • 1:30pm - 5:00pm
The Data Scientist’s Python Toolbox

This intermediate-level tutorial covers common problems facing data scientists and how to tackle them with Python. Python is a high-level, object-oriented language that has found wide acceptance in the scientific computing and data science communities; ease of use and an abundance of software packages are among the reasons for this adoption. Pandas is a high-level, open-source library that provides data analysis tools for Python, offering an efficient and comprehensive platform for a wide range of analytics problems. For generating sophisticated visualizations, two packages are introduced: Seaborn and Plotly. While Seaborn is aimed at statisticians, Plotly provides a rich, interactive visualization framework that is well suited to visualizing large datasets. Plotly also supports visualization-rich dashboards that can be shared online. To conclude, out-of-core computing with Dask/Blaze is introduced for datasets that won’t quite fit into memory. The goal of Dask is to “extend the size of convenient datasets from ‘fits in memory’ to ‘fits on disk’”, effectively fitting between Pandas and PySpark in the Python analytics ecosystem. Additional materials for the tutorial are available at https://bitbucket.org/sjraj/pearc/downloads/
Prerequisites: participants should bring a laptop with one of the following installed:
a. The Anaconda distribution with Python 3. Anaconda is a Python distribution that can be downloaded from https://www.continuum.io/downloads
b. VirtualBox.

Monday July 10, 2017 1:30pm - 5:00pm CDT
Strand 12B