Apache Spark

Spark is a distributed computing platform that extends the MapReduce model, making it easier to code and faster to execute. It can run on a Hadoop cluster, but it uses its own execution engine rather than Hadoop MapReduce.

Spark provides APIs in Java, Scala, and Python. Any of these languages can be used to create Spark applications.

Spark supports Map, Reduce, SQL queries, streaming data, machine learning algorithms, and graph algorithms.
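To make the Map and Reduce steps concrete, here is a minimal sketch in plain Python. It does not use Spark at all; it only imitates the map-then-reduce pattern on a single machine, and the function name word_count is made up for this illustration.

```python
from collections import Counter
from functools import reduce


def word_count(lines):
    """Count words with an explicit map step and reduce step (not Spark)."""
    # Map step: turn each line into its own partial word count.
    partial_counts = map(lambda line: Counter(line.split()), lines)
    # Reduce step: merge the partial counts pairwise into one total.
    return reduce(lambda left, right: left + right, partial_counts, Counter())


if __name__ == "__main__":
    print(word_count(["spark is fast", "spark is easy"]))
```

In Spark the same two steps would run in parallel across the machines of a cluster instead of inside one Python process.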

Spark stack contains:

SQL Joins

The join clause allows us to combine SQL tables. Tables can be combined in different ways. Think of an inner join as an intersection of two tables and an outer join as a union of two tables. This table explains different types of joins.
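As a rough illustration of the difference, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, and their rows are hypothetical and are not the tables used in the rest of this post. A left outer join is shown because it works on every SQLite version.

```python
import sqlite3


def run_join_demo():
    """Return (inner_rows, left_rows) for two small hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    cur.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cy")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "book"), (3, "pen"), (4, "lamp")],
    )
    # Inner join: only customers that have a matching order.
    inner_rows = cur.execute(
        "SELECT c.name, o.item FROM customers c "
        "INNER JOIN orders o ON c.id = o.customer_id ORDER BY c.id"
    ).fetchall()
    # Left outer join: every customer, with NULL (None) where no order matches.
    left_rows = cur.execute(
        "SELECT c.name, o.item FROM customers c "
        "LEFT OUTER JOIN orders o ON c.id = o.customer_id ORDER BY c.id"
    ).fetchall()
    conn.close()
    return inner_rows, left_rows


if __name__ == "__main__":
    inner, left = run_join_demo()
    print(inner)
    print(left)
```

The inner join drops Bob because he has no order, while the left outer join keeps him and fills the missing item with None.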

We will be using the following two tables in the examples:

Planning a data science project

When planning a data science project, you need to spend time seeking clarification about the project. The important questions are:

  • What are the goals of the project?
  • What resources are required for this project?
    • data
    • software
    • personnel
  • Are there any important deadlines?
  • Who are the stakeholders?

Once you have sufficiently gained clarity on your project:


NumPy is a Python library that makes it easy to perform mathematical and logical operations on arrays. It provides a high-performance n-dimensional array object and tools for working with these arrays.
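A small sketch of what this looks like in practice, assuming NumPy is installed. The function name array_demo and the sample values are made up for illustration.

```python
import numpy as np


def array_demo():
    """Show one mathematical and one logical operation on a 2-D array."""
    grid = np.array([[1, 2, 3], [4, 5, 6]])
    doubled = grid * 2   # mathematical operation applied to every element
    mask = grid > 3      # logical operation, gives an array of booleans
    selected = grid[mask]  # keep only the elements where the mask is True
    return grid.shape, doubled.tolist(), mask.tolist(), selected.tolist()


if __name__ == "__main__":
    print(array_demo())
```

Both operations work on the whole array at once, with no explicit loop.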

Why use NumPy instead of a Python list

  1. Takes less memory
  2. Faster than Python lists
  3. More powerful
  4. Easier to use
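The memory claim can be checked with a quick sketch like the one below. It assumes NumPy is installed, the helper name compare_memory is hypothetical, and the exact byte counts depend on the Python build, so only the comparison matters.

```python
import sys

import numpy as np


def compare_memory(n=1000):
    """Return (list_bytes, array_bytes) for n integers stored both ways."""
    values = list(range(n))
    # A list stores pointers, and each int is a separate Python object.
    list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
    # A NumPy array stores the raw 8-byte integers in one contiguous buffer.
    array_bytes = np.arange(n, dtype=np.int64).nbytes
    return list_bytes, array_bytes


if __name__ == "__main__":
    list_bytes, array_bytes = compare_memory()
    print("list:", list_bytes, "bytes; array:", array_bytes, "bytes")
```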

What is my NumPy version

import numpy as np
print(np.__version__)


Error handling in Python3

In Python, you can handle errors with exceptions. The following code will raise a FileNotFoundError because the file it is trying to open does not exist:

fh = open('data.txt')  # raises FileNotFoundError if the file is missing
for strline in fh.readlines():
    print(strline)


FileNotFoundError: [Errno 2] No such file or directory: 'data.txt'

An error is generated and the program stops executing. Suppose we want the program to continue because we will compensate for this error later in the code. We can use try: except:
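A minimal sketch of that pattern follows. The helper name read_lines and the choice to fall back to an empty list are assumptions made for this example; the right way to compensate depends on the program.

```python
def read_lines(path):
    """Return the lines of a file, or an empty list if the file is missing."""
    try:
        with open(path) as fh:
            return fh.readlines()
    except FileNotFoundError:
        # The file does not exist: report it and keep the program running.
        print("Could not find", path, "- continuing with no data")
        return []


if __name__ == "__main__":
    for strline in read_lines("data.txt"):
        print(strline)
    print("The program keeps running.")
```

Catching FileNotFoundError specifically, rather than every exception, means other problems such as a permissions error still surface.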

Object-Oriented Python3

Python is a fully object-oriented programming language but it can also be used for scripting. It is assumed that you are already familiar with object-orientation concepts and have experience writing object-oriented code in another language. This page will show how to write object-oriented code in Python without explaining object-oriented concepts.

The following class computes the area of a rectangle. Classes are defined with the class keyword. __init__ is the constructor. length and width are initialized in the constructor. The area is calculated by calcArea().
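A class matching that description might look like the sketch below. The class name Rectangle is an assumption, since only the attribute and method names are given above.

```python
class Rectangle:
    """A rectangle described by its length and width."""

    def __init__(self, length, width):
        # The constructor stores both dimensions on the instance.
        self.length = length
        self.width = width

    def calcArea(self):
        # Area of a rectangle is length times width.
        return self.length * self.width


if __name__ == "__main__":
    rect = Rectangle(3, 4)
    print(rect.calcArea())
```

The method name calcArea() is kept from the description, although Python code more commonly uses snake_case names such as calc_area().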

jQuery quick start

Before learning jQuery, you should learn HTML, CSS, and JavaScript, including how to debug JavaScript. You should also be familiar with JSON. Knowing XML will also be helpful.