Getting started with data science in Python


Use the Anaconda distribution. It makes getting started with data science much easier, because almost all of the packages you need are already included and you can start working right away.

$ cd ~/Downloads
$ wget
$ bash
$ source ~/.bashrc
$ conda --version
$ conda update conda
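
Once conda responds, it can be useful to confirm which packages the Python interpreter can actually see. The following is a minimal sketch, not part of the Anaconda tooling itself. The package list is an assumption about what you are likely to need for the rest of this guide, and is_installed is a hypothetical helper name.

```python
import importlib.util

# Assumed list of packages used in this guide; adjust it to your needs
packages = ["numpy", "pandas", "matplotlib", "seaborn"]


def is_installed(name):
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(name) is not None


# Map each package name to whether it was found
status = {name: is_installed(name) for name in packages}

for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Any package reported as missing can be added with conda install followed by the package name.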


Make your first DataFrame
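#!/usr/bin/env python
import pandas as pd

# Build a DataFrame from a dict of columns.
# Scalars ('A', 'B', 'F') are repeated to match the length of the Series columns.
df = pd.DataFrame({ 'A' : 1.,
                    'B' : pd.Timestamp('20130102'),
                    'C' : pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D' : pd.Series([1, 2, 1, 2], dtype='int32'),
                    'E' : pd.Categorical(["test", "train", "test", "train"]),
                    'F' : 'foo' })

# Print the resulting table (4 rows, 6 columns)
print(df)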
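
After building a DataFrame, a natural next step is to inspect it. The sketch below rebuilds a shortened version of the same frame so that it runs on its own, then shows a few common inspection calls. Which calls are worth running first is a matter of taste; this selection is only a suggestion.

```python
import pandas as pd

# Same kind of columns as the example above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "A": 1.0,
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": pd.Series([1, 2, 1, 2], dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})

# Number of rows and columns
print(df.shape)

# Data type of each column
print(df.dtypes)

# Keep only the numeric columns
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))

# Keep only the rows where column E equals "train"
train_rows = df[df["E"] == "train"]
print(train_rows)
```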

Create your first plots

First, install or update Seaborn

$ conda install seaborn

Next, create two plots of an example dataset

#!/usr/bin/env python
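import matplotlib.pyplot as plt
import seaborn as sns

# Load one of seaborn's example datasets (fetched online and cached on first use)
tips = sns.load_dataset("tips")

# Scatter plot with a regression line and marginal distributions.
# Recent seaborn versions require data, x and y as keyword arguments.
sns.jointplot(data=tips, x="total_bill", y="tip", kind="reg")
# Linear regression fits, one panel per value of "smoker"
sns.lmplot(data=tips, x="total_bill", y="tip", col="smoker")

# Open the figure windows when running as a script
plt.show()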
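
When running on a machine without a display, or when the plot is needed as a file, the figure can be saved instead of shown. This is a minimal sketch under a few assumptions: the small bills table is made-up stand-in data so the example runs without downloading anything, and the output file name is arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with the same column names as the tips dataset
bills = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.5, 30.0, 35.5, 40.0, 45.5],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.5, 7.0],
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
})

# lmplot returns a FacetGrid, which can write itself to an image file
grid = sns.lmplot(data=bills, x="total_bill", y="tip", col="smoker")

# Arbitrary output location in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), "tips_lmplot.png")
grid.savefig(out_path)
print("saved", out_path)
```

The same savefig call is available on the object returned by jointplot.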