Python

Why learn Python? Because it packs a powerful punch. Python is easy to learn, user-friendly, highly extensible and overall a very powerful language. Like Java, it is object-oriented. It generally runs slower than compiled C++, but performance-critical code can be written as C or C++ extensions. It also works well as a scripting language. I find that development in Python is more rapid than in C++ or Java.

Python is free and fully supported by Linux, Windows and MacOS. There is strong community support and there are thousands of packages to extend the language.

Who is using Python? Google. Need I say more?

Getting Started

To install on Linux:

$ sudo yum install python

or

$ sudo apt-get install python

or

install from source: download the source code from http://www.python.org/ and build it.

To test whether the installation worked, type python in the terminal. You should see a prompt similar to the following:

Python 2.4.3 (#1, Jan 21 2009, 01:10:13)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type 2 + 2 after the >>> prompt and press Enter:

>>> 2 + 2
4

If you get 4 as the response, Python is installed and working properly.

To install on Windows: download the installer from http://www.python.org/ and double-click it. When installation is complete, add the installation directory to the system Path:

  1. Click Start > Control Panel > System > Change Settings > Advanced > Environment Variables
  2. Select Path in the System variables (bottom) list
  3. Click Edit
  4. Append ;c:\Python26 to the end of the line (assuming Python was installed in c:\Python26)

To test the installation, open a command prompt. If you do not see Command Prompt in the list of installed programs, search for cmd, or click Start > Run and type cmd.

  1. Type python in the command prompt
  2. When you get a message followed by >>>, type 2+2
  3. If you get 4 as the response, Python is installed and working properly

Typing python on its own at the command line starts the interactive interpreter (the >>> prompt), where each statement you type runs immediately.

For larger programs, type the code in a plain-text editor that does not add formatting, such as Notepad++, Sublime Text, vi, gedit or Emacs, and save the file with a .py extension. Then type python followed by the file name, for example python myscript.py.

A few words about Python Syntax

  1. # starts a comment; Python ignores the rest of the line
  2. a value can be assigned to several variables simultaneously
  3. variables must be defined (assigned a value) before they are used
  4. the usual PEMDAS order of operations applies
  5. integers are converted to float in mixed float and integer operations
  6. int(), float() and long() are conversion functions (long() exists only in Python 2)

Python Example 1

# calculates volume of a box
# assign the same value to several variables at once
length = height = width = 5
volume = length * height * width
print volume        # 125
# integer + float = float
print volume + 0.0    # 125.0

Python Example 2

# PEMDAS: inside the parentheses 2*2 is done before the addition,
# giving 6; then 6 is divided by 3
print (2+2*2)/3    # 2

Python Example 3

a = 2
b = 3.0
print float(a)    # 2.0
print int(b)    # 3
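
The syntax list says variables must be defined before they are used and names int() as a conversion function. Here is a minimal sketch of both points. Unlike the Python 2 examples in this tutorial, it uses Python 3 syntax (print() is a function), and the variable names are made up for illustration:

```python
# Using a name before assigning a value to it raises NameError
try:
    print(undefined_total)
except NameError as err:
    print('NameError:', err)

# int() drops the fractional part (it truncates toward zero)
print(int(3.9))     # 3
print(int(-3.9))    # -3

# int() and float() also convert strings that contain numbers
count = int('42')
price = float('2.5')
print(count + price)    # 44.5
```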

Python Strings

  • Python supports both single and double quotes
  • backslash (\) is used to escape characters
  • a backslash at the end of a line continues the string on the next line
  • \n is the newline character, \t is the tab character
  • strings surrounded with """ or ''' can span several lines, and quotes inside them do not need to be escaped
  • + concatenates strings
  • * repeats a string
  • string_name[i] returns one character and string_name[i:j] returns a substring
  • len() returns the string length
  • the prefix u creates a Unicode string, e.g. u'text'
  • the prefix r creates a raw string, in which backslash escapes such as \n and \t are not interpreted

Python String Example 1

print 'my code'               # my code
print 'Mike\'s code'          # Mike's code
print "Mike's code"           # Mike's code (no escape needed inside double quotes)
# "Veni, Vidi, Vici", Julius Caesar
print '"Veni, Vidi, Vici", Julius Caesar'
# "Veni, Vidi, Vici", Julius Caesar
print "\"Veni, Vidi, Vici\", Julius Caesar"

Python String Example 2

print """
Veni, Vidi, Vici
- Julius Caesar
"I came, I saw, I conquered"
"""

Python String Example 3

s = "Dividing a very \
long line.\n\
Another line."
print s
# Dividing a very long line.
# Another line.

Python String Example 4

# Julius Caesar
print 'Julius ' + 'Caesar'
# I do! I do! I do!
print 'I do! ' * 3

Python String Example 5

s = 'Veni, Vidi, Vici'
# fourth character (index starts with 0)
print s[3]     # i
# from index 0, print 4 characters
print s[0:4]   # Veni
# from index 6 (the seventh character) to the end
print s[6:]    # Vidi, Vici
# the first 4 characters
print s[:4]    # Veni
# the last character
print s[-1]    # i

Python String Example 6

s = 'Veni, Vidi, Vici'
print len(s)    # 16
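
The string bullets also mention \t and raw strings, which the examples above do not show. A short sketch in Python 3 syntax; the Windows-style path is a made-up example, chosen because it happens to contain \n:

```python
# \t inserts a tab and \n starts a new line
quote = 'Veni\tVidi\tVici\n- Julius Caesar'
print(quote)

# In a normal string, \n is a single newline character
normal = 'C:\new_folder'
print(len(normal))    # 12

# A raw string keeps the backslash and the n as two characters
raw = r'C:\new_folder'
print(len(raw))       # 13
```

In Python 3 every str is already Unicode, so the u prefix from the list is accepted there but has no effect.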

Python Lists

  • Lists can hold elements of different types, including other lists
  • List index starts at 0
  • Index -1 refers to the last element
  • Negative indices count back from the end, e.g. index -3 is the third element from the end

Python Lists Example 1

a = ['zero', 1, 'two', 3]
print a            # ['zero', 1, 'two', 3]
print a[-1]        # 3
print a[:2]        # ['zero', 1]
a += ['four']      # add an element to the end of the list
print a            # ['zero', 1, 'two', 3, 'four']
a[2] = 2           # change the value of an element
print a            # ['zero', 1, 2, 3, 'four']
a[:] = []          # delete all elements in the list
print a            # []

Python Lists Example 2

a = ['zero', 1, 'two', 3]
print len(a)    # 4, the number of elements in the list
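
The list bullets mention lists holding compound data and negative indices such as -3. A minimal sketch in Python 3 syntax, with made-up values:

```python
# A list can hold mixed types, including another list
record = ['sample', 3, ['A', 'C', 'G', 'T']]
print(record[2])        # ['A', 'C', 'G', 'T']
print(record[2][0])     # A (index into the inner list)

numbers = [10, 20, 30, 40, 50]
print(numbers[-1])      # 50, the last element
print(numbers[-3])      # 30, the third element from the end

# append() adds one element to the end, like a += [...]
numbers.append(60)
print(len(numbers))     # 6
```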

Python Conditional Statements

  • indentation replaces the curly braces used in C/C++ and Java to mark blocks, and each if, elif or else line ends with a colon
  • elif is short for else if
  • you can use < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to) in your conditions

Python Conditional Statements Example 1

x = 1
if x > 0:
   print 'positive'

Python Conditional Statements Example 2

x = 1
if x < 0:
   print 'negative'
else:
   print 'positive'

Python Conditional Statements Example 3

x = 0
if x < 0:
   print 'negative'
elif x == 0:
   print 'zero'
else:
   print 'positive'

Python Iteration Statements

  • for loops over the items of a sequence, such as a list, a string or range()
  • break exits the innermost loop that contains it
  • continue skips the rest of the loop body and jumps to the next iteration

Python Iteration Statements Example 1

for i in range(10):
   print i

Python Iteration Statements Example 2

for i in [0,1,2,3,4,5,6,7,8,9]:
   print i

Python Iteration Statements Example 3

a = ['think', 'try']
for x in a:
   print x + 'ing'
# prints thinking and trying

Python Iteration Statements Example 4

# print odd numbers between 1 and 10
for i in range(10):
   if i % 2 == 0:
       continue
   print i

Python Iteration Statements Example 5

# stop the loop as soon as the number 5 is found
x = 5
for i in range(10):
   print i
   if i == x:
       print 'found it'
       break
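
The bullet list says break leaves the smallest block; more precisely, it exits only the innermost loop. A sketch in Python 3 syntax that searches a small made-up grid shows the outer loop carrying on after each inner break:

```python
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

found = []
for row in grid:
    for value in row:
        if value % 2 == 0:
            found.append(value)
            # break leaves only this inner loop; the outer loop moves on
            break

print(found)    # [2, 4, 8]: one even number from every row
```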

Defining Functions in Python

  • the def keyword is used to define a function
  • variables assigned inside a function are local unless declared with the keyword global
  • the return statement returns a value from the function
  • default values can be defined for function parameters

Defining Python Functions Example 1

# return the maximum value
def maxx(m,n):
   if m > n:
       return m
   else: 
       return n

print maxx(12,45)    # 45 (the built-in max(12, 45) does the same)

Defining Python Functions Example 2

# compute circumference
def circumference(r, pi = 3.14):
   return 2 * pi * r

# function called with both parameters
print circumference(12,3.1)        # 74.4

# function called with one parameter, pi would assume default value
print circumference(10)                # 62.8
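
The bullets above say that variables in a function are local unless declared global. A minimal sketch in Python 3 syntax; the counter and function names are hypothetical:

```python
counter = 0

def increment_local():
    counter = 100      # creates a new local variable; the global is untouched
    return counter

def increment_global():
    global counter     # refer to the module-level counter
    counter += 1
    return counter

increment_local()
print(counter)         # 0
increment_global()
print(counter)         # 1
```

Reading a global variable inside a function works without the keyword; global is needed only when the function assigns to it.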

Python Stacks and Queues

In Python, lists can be used as stacks and queues. A stack is like a tube of Pringles: the last chip placed inside is the first one taken out. This is called Last In First Out (LIFO). A queue is like the line at a bus stop: the first person to get in line is the first person to get on the bus. This is called First In First Out (FIFO). Lists work well as stacks, but removing items from the front of a list is slow, so queues are usually built with collections.deque, as in the queue example below.

Python Stacks

# list a is the stack
a = [1,2,3]
a.append(4)
print a              # [1, 2, 3, 4]
print a.pop()        # 4
print a.pop()        # 3
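
A common use of a list as a stack is checking that brackets are balanced: push each opening bracket, and pop when a closing bracket arrives. A sketch in Python 3 syntax; the function name is hypothetical:

```python
def is_balanced(text):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)          # push an opening bracket
        elif ch in pairs:
            # a closing bracket must match the most recent opening one
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                  # leftover openers mean unbalanced

print(is_balanced('(a[1] + b{2})'))   # True
print(is_balanced('(a[1)]'))          # False
```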

Python Queues

from collections import deque
queue = deque([1,2,3])
queue.append(4)          # deque([1, 2, 3, 4])
queue.append(5)          # deque([1, 2, 3, 4, 5])
print queue.popleft()    # 1
print queue              # deque([2, 3, 4, 5])

Working with Biological Sequences

Opening a FASTA file

fp = open('a.fasta')
a = fp.readlines()
fp.close()
print a

output

['>gi|88853329|emb|AJ628425.1| Fasciola gigantica ITS1, isolate FgGZB2\n',
'ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCATGCGATAAAAATTTGCGGAC\n',
'GGCTATGCCTGGCTCATTGAGGTCACAGCATATCCGATCACTGATGGGGTGCCTACCTGTATGATACTCC\n',
'GATGGTATGCTTGCGTCTCTCGGGGCGCTTGTCCAAGCCAGGAGAACGGGTTGTACTGCCATGATTGGTA\n',
'GTGCTAGGCTTAAAGAGGAGATTTGGGCTACGGCCCTGCTCCCGCCCTATGAACTGTTTCATTACTACAA\n',
'TTACACTGTTAAAGTGGTATTGAATGGCTTGCCATTCTTTGCCATTGCCCTCGCATGCACCCGGTCCTTG\n',
'TGGCTGGACTGCACGTACGTCGCCCGGCGGTGCCTATCCCGGGTTGGACTGATAACCTGGTCTTTGACCA\n', 'TA']

Extracting Sequence from FASTA File

# open fasta file - alternate form of the previous example
a = open('a.fasta').readlines()
# join all lines except the first (the header), then remove the newlines
seq = ''.join(a[1:])
seq = seq.replace('\n','')
print seq
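
The same steps can be wrapped in a function that takes a list of lines, so they can be tried without a.fasta on disk. A sketch in Python 3 syntax; the function name and the short record are made up:

```python
def fasta_sequence(lines):
    """Join every line after the header and strip the newlines."""
    seq = ''.join(lines[1:])
    return seq.replace('\n', '')

# a made-up record, split into lines the way readlines() returns them
record = ['>example sequence\n', 'ACCTGAAAAT\n', 'CTACTCTTAC\n', 'AC']
print(fasta_sequence(record))    # ACCTGAAAATCTACTCTTACAC
```

With a real file, the call would be fasta_sequence(open('a.fasta').readlines()).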

output (printed as one long line; wrapped here for display)

ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCATGCGATAAAAATTTGCGGAC
GGCTATGCCTGGCTCATTGAGGTCACAGCATATCCGATCACTGATGGGGTGCCTACCTGTATGATACTCC
GATGGTATGCTTGCGTCTCTCGGGGCGCTTGTCCAAGCCAGGAGAACGGGTTGTACTGCCATGATTGGT
AGTGCTAGGCTTAAAGAGGAGATTTGGGCTACGGCCCTGCTCCCGCCCTATGAACTGTTTCATTACTACA
ATTACACTGTTAAAGTGGTATTGAATGGCTTGCCATTCTTTGCCATTGCCCTCGCATGCACCCGGTCCTTG
TGGCTGGACTGCACGTACGTCGCCCGGCGGTGCCTATCCCGGGTTGGACTGATAACCTGGTCTTTGACCATA

Extracting Sequence from a GenBank File

# read file
a = open('NC_001284.gbk').read()
# DNA starts a line after ORIGIN and ends a line before //
orgn = a.find('ORIGIN')
start = a.find('1', orgn)
end = a.find('//', orgn)
b = a[start:end].split('\n')
seq = ''
for i in b:
   subseq = i.split()
   seq += ''.join(subseq[1:])
print seq
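
To see how the ORIGIN parsing works without downloading NC_001284.gbk, the same logic can run on a short, made-up fragment laid out like the end of a GenBank record. A sketch in Python 3 syntax; the function name and the fragment are hypothetical:

```python
def genbank_sequence(text):
    # the sequence block starts at the first '1' after ORIGIN and ends at //
    orgn = text.find('ORIGIN')
    start = text.find('1', orgn)
    end = text.find('//', orgn)
    seq = ''
    for line in text[start:end].split('\n'):
        fields = line.split()
        # fields[0] is the position number; the rest are chunks of bases
        seq += ''.join(fields[1:])
    return seq

fragment = """ORIGIN
        1 acgtacgtac ggggcccccc
       21 ttttaaaa
//
"""
print(genbank_sequence(fragment))    # acgtacgtacggggccccccttttaaaa
```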

Run the script and redirect its output to a file:

python code.py > output.txt

Exercises

  1. Extract the header of a FASTA file
  2. Extract sequence from a file containing 5 FASTA sequences
  3. Convert a GenBank sequence to a FASTA file

Applying Functions to Lists Using filter, map and reduce

filter(function, sequence) applies function to every element of sequence and returns only the elements for which the function returns true.

def even_numbers(x):
   return x % 2 == 0

print "Even Numbers"
print filter(even_numbers, range(10,20))

output

Even Numbers
[10, 12, 14, 16, 18]

map(function, sequence) applies function to every element of sequence and returns a list of the results, one per element.

def circumference(r):
   a = 2 * 3.14 * r
   return int(a)

print "Gives Circumference"
print map(circumference, range(10,20))

output

Gives Circumference
[62, 69, 75, 81, 87, 94, 100, 106, 113, 119]

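reduce(function, sequence) combines the elements into a single value by applying function cumulatively from left to right: first to the first two elements, then to that result and the next element, and so on.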

def adder(a,b):
   return a + b

expenses = (546,675,897,57,4,87,454)
# total expenses
print reduce(adder, expenses)

output

2720
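
The filter, map and reduce examples above use Python 2, where the first two return lists and reduce is a built-in. In Python 3, filter() and map() return lazy iterators and reduce() lives in the functools module. A sketch of the same three examples in Python 3, alongside the list-comprehension forms that are often preferred:

```python
from functools import reduce

def even_numbers(x):
    return x % 2 == 0

def circumference(r):
    return int(2 * 3.14 * r)

def adder(a, b):
    return a + b

# wrap filter() and map() in list() to get a list back in Python 3
evens = list(filter(even_numbers, range(10, 20)))
circles = list(map(circumference, range(10, 20)))
total = reduce(adder, (546, 675, 897, 57, 4, 87, 454))

# equivalent list comprehensions
evens_lc = [x for x in range(10, 20) if even_numbers(x)]
circles_lc = [circumference(r) for r in range(10, 20)]

print(evens)      # [10, 12, 14, 16, 18]
print(circles)    # [62, 69, 75, 81, 87, 94, 100, 106, 113, 119]
print(total)      # 2720
```

For adding numbers specifically, the built-in sum() gives the same total without a helper function.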

Python Sets

A set is an unordered collection which does not allow duplicate elements.

a = [1,2,3,3,4,'yes','no']
print a
# [1, 2, 3, 3, 4, 'yes', 'no']
print set(a)
# set([1, 2, 3, 4, 'yes', 'no'])

The number 3 appears only once in the set. Testing whether an item is in a set is also easy, and for large collections much faster than searching a list:

a = [1,2,3,3,4,'yes','no']
s = set(a)
print 'yes' in s    # True
print 'n' in s      # False
print 2 in s        # True

Set Arithmetic

a = set('python')
b = set('php')
print b            # unique letters
# set(['p', 'h'])
print a - b        # difference
# set(['y', 't', 'o', 'n'])
print a | b        # union
# set(['p', 't', 'y', 'h', 'o', 'n'])
print a & b        # intersection
# set(['p', 'h'])
print a ^ b        # symmetric difference
# set(['y', 't', 'o', 'n'])

Python Dictionaries

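Dictionaries are called hashes in Perl and associative arrays in PHP. They are unordered sets of key-value pairs (since Python 3.7, dictionaries remember insertion order). Lists have numerical indices; in a dictionary, the index is called a key. Keys are often strings, but any hashable value, such as a number, a string or a tuple of these, can be a key.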

version = {'Python': 3, 'Perl': 6}

# add a key-value pair to the dictionary
version['PHP'] = 5
print version
# {'Python': 3, 'PHP': 5, 'Perl': 6}

# remove a key-value pair
del version['Perl']
print version
# {'Python': 3, 'PHP': 5}

# print all the keys
print version.keys()
# ['Python', 'PHP']

# print all values
print version.values()
# [3, 5]

# does the key exist
print 'PHP' in version
# True

# looping through the dictionary
for k, v in version.iteritems():    # version.items() in Python 3
    print 'I installed', k, v
# I installed Python 3
# I installed PHP 5

# enumerate() pairs each item of a list with its index
for i, v in enumerate(['Junk', 'Jan', 'Feb', 'Mar']):
   print i, v
# 0 Junk
# 1 Jan
# 2 Feb
# 3 Mar

# zip() function allows you to loop over multiple sequences
names = ['Alice', 'Bob', 'Carla']
dob = ['Jan 1, 2001', 'Feb 2, 2002', 'Mar 3, 2003']
lives = ['Australia', 'Belgium', 'Canada']
for a, b, c in zip(names, dob, lives):
   print '{0} was born on {1} in {2}'.format(a, b, c)
# Alice was born on Jan 1, 2001 in Australia
# Bob was born on Feb 2, 2002 in Belgium
# Carla was born on Mar 3, 2003 in Canada
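
Dictionaries are a natural fit for counting, which ties back to the sequence examples earlier: each nucleotide is a key and its count is the value. A sketch in Python 3 syntax; the short sequence is made up:

```python
seq = 'ACCTGAAAATCTACTCTTAC'
counts = {}
for base in seq:
    # get() returns 0 the first time a base is seen
    counts[base] = counts.get(base, 0) + 1

for base in sorted(counts):
    print(base, counts[base])
# A 7
# C 6
# G 1
# T 6
```

The standard library's collections.Counter(seq) builds the same counts in one call.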

Python for Data Science

Python is one of the most widely used languages in data science. Unlike R, which was designed mainly for statistics, Python is a general-purpose language, so you can do more with it, such as building applications. Download and install the Anaconda distribution to get Python bundled with numerous pre-installed data science packages.

Technologies: