Getting started with data science in Python

Wed 03 August 2016

Installation

Use the Anaconda distribution. It makes getting started with data science much easier, because it bundles almost all of the packages you need, so you can start right away.

$ cd ~/Downloads
$ wget http://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh
$ bash Anaconda2-4.1.1-Linux-x86_64.sh
$ source ~/.bashrc
$ conda --version
$ conda update conda
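
Once the installer has finished, it can be useful to confirm from inside Python that the main libraries are importable. The following is a minimal sketch, not part of the Anaconda tooling: the helper name `installed_versions` and the list of package names are assumptions chosen for illustration.

```python
import importlib


def installed_versions(names=("numpy", "pandas", "matplotlib", "seaborn")):
    """Return a dict mapping each module name to its version, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module defines __version__, so fall back to "unknown"
            versions[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Treat any import failure as "not usable in this environment"
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(name, version or "not installed")
```

If a package is reported as not installed, `conda install <name>` should add it to the active environment.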

Examples

Make your first DataFrame

#!/usr/bin/env python
import pandas as pd

# Scalar values ('A', 'B', 'F') are broadcast to the length of the other columns
df = pd.DataFrame({'A': 1.,
                   'B': pd.Timestamp('20130102'),
                   'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'D': pd.Series([1, 2, 1, 2], dtype='int32'),
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

# Sum column 'D' within each category of 'E'
print(df.groupby('E')['D'].sum())
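
To see what the group-by step does, it may help to look at a reduced version of the frame. This is a sketch that keeps only the columns 'D', 'E' and 'F' from the example above; the variable name `totals` is introduced here for illustration.

```python
import pandas as pd

# A cut-down version of the example frame: one integer column,
# one categorical column, and one scalar that pandas broadcasts to every row
df = pd.DataFrame({
    "D": pd.Series([1, 2, 1, 2], dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})

# Each column keeps its own dtype
print(df.dtypes)

# Rows are split by the value of 'E', then 'D' is summed inside each group:
# "test" collects the rows with D = 1 and 1, "train" the rows with D = 2 and 2
totals = df.groupby("E")["D"].sum()
print(totals)
```

Selecting the column before calling `sum()` means only 'D' is aggregated, so non-numeric columns such as 'F' never enter the calculation.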

Create your first plots

First, make sure Seaborn is installed and up to date

$ conda install seaborn

Next, plot one of the example datasets

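#!/usr/bin/env python
import matplotlib.pyplot as plt
import seaborn as sns

# Load one of seaborn's example datasets (fetched online and cached on first use)
tips = sns.load_dataset("tips")
print(tips.head())

# Scatter plot with a fitted regression line and marginal distributions
sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg")
# The same regression, drawn in one panel per value of "smoker"
sns.lmplot(x="total_bill", y="tip", data=tips, col="smoker")

# Needed when running as a script; notebooks render the figures inline
plt.show()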
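
When the code runs as a script on a machine without a display, writing the figure to a file is often more practical than opening a window. The sketch below assumes a non-interactive matplotlib backend and uses a handful of made-up values (not the real tips data) with the same column names, so that it runs without a network connection. The names `demo` and `out_path` and the output file name are illustrative choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
import seaborn as sns

# Made-up illustrative values, NOT the real tips dataset
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 15.0, 25.0, 35.0],
    "tip": [1.5, 3.0, 4.0, 2.0, 3.5, 5.5],
    "smoker": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# One regression panel per value of "smoker"; ci=None skips the bootstrap
# confidence band, which is not meaningful for so few points
grid = sns.lmplot(data=demo, x="total_bill", y="tip", col="smoker", ci=None)

# lmplot returns a FacetGrid, which can write the whole figure to disk
out_path = os.path.join(tempfile.gettempdir(), "tips_by_smoker.png")
grid.savefig(out_path)
print("Saved plot to", out_path)
```

With the real data, replacing `demo` by the `tips` frame loaded through `sns.load_dataset("tips")` should produce the same kind of figure.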
