Pandas

Pandas is a Python library for data manipulation and analysis. It can work with many different kinds of data, including:

  • delimiter-separated data (TSV, CSV, etc.)
  • ordered and unordered time series data
  • matrix and table data
  • labelled and unlabelled data

Reading tab delimited data

First create a data file. Open a text editor and enter three lines of text, each with three words separated by tabs, then save the file as data.tsv. Something like the following:

id    name    dob
11    Alice    January
12    Bob    February

Code

import pandas as pd

# read_table uses a tab as the default separator
mydata = pd.read_table('data.tsv')
# for remote data, pass a URL instead of a file path
# pd.read_table('http://...')
mydata.head()    # head() shows the first 5 rows
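
To try the tab-delimited example without creating a file, the following sketch feeds the same sample rows to read_table from an in-memory string. The io.StringIO wrapper and the tsv_text variable are assumptions added for illustration only; they are not part of the original example.

```python
import io
import pandas as pd

# In-memory stand-in for data.tsv; "\t" is a tab character.
tsv_text = "id\tname\tdob\n11\tAlice\tJanuary\n12\tBob\tFebruary\n"

# read_table accepts a file-like object as well as a path or URL.
mydata = pd.read_table(io.StringIO(tsv_text))

print(mydata.shape)          # (number of rows, number of columns)
print(list(mydata.columns))  # column names taken from the first line
print(mydata.head())
```

The first line of the input is used as the header row, so only the two lines after it are counted as data.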

Reading character delimited files

Sample data. Save it as data.txt:

11|Alice|January
12|Bob|February

Code

import pandas as pd
cols = ['id', 'name', 'dob']    # one name per column, in file order
mydata = pd.read_table('data.txt', sep='|', header=None, names=cols)

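Note that the sample data has no header row. Passing header=None tells pandas not to treat the first line as column names, and names=cols supplies the column names instead.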
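
The effect of header=None can be seen by reading the same pipe-delimited rows with and without it. This is a minimal sketch: the in-memory string (pipe_text) and the variable names wrong and right are assumptions made for illustration, standing in for data.txt.

```python
import io
import pandas as pd

# In-memory stand-in for data.txt (no header row).
pipe_text = "11|Alice|January\n12|Bob|February\n"

# Without header=None, the first data row is consumed as column names.
wrong = pd.read_table(io.StringIO(pipe_text), sep='|')

# With header=None and names, every line is kept as data.
cols = ['id', 'name', 'dob']
right = pd.read_table(io.StringIO(pipe_text), sep='|', header=None, names=cols)

print(list(wrong.columns), len(wrong))
print(list(right.columns), len(right))
```

In the first read, Alice's row ends up as the column labels and only Bob's row remains as data, which is why the header has to be supplied explicitly for files like this.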