The VirtualBox VM was created with a user that doesn't match the current user running Vagrant

Error Message

The VirtualBox VM was created with a user that doesn't match the current user running Vagrant. VirtualBox requires that the same user be used to manage the VM that was created. Please re-run Vagrant with that user. This is not a Vagrant issue.

The UID used to create the VM was: 504
Your UID is: 502

Cause

You created this Vagrant instance under a different user account, or you copied the instance over from another system.

Solution

Edit .vagrant/machines/default/virtualbox/creator_uid in your project directory and change 504 to 502 (your current UID).
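The same fix can be scripted. The following is a minimal sketch, assuming the machine is named default and uses the virtualbox provider, as in the path above. The function name update_creator_uid and its parameters are hypothetical, and os.getuid() is only available on POSIX systems.

```python
import os
from pathlib import Path


def update_creator_uid(project_dir, new_uid=None):
    """Overwrite creator_uid with new_uid (default: the current user's UID).

    Returns a tuple of (old_uid, new_uid) as strings.
    """
    uid_file = (
        Path(project_dir)
        / ".vagrant" / "machines" / "default" / "virtualbox" / "creator_uid"
    )
    old_uid = uid_file.read_text().strip()
    if new_uid is None:
        new_uid = os.getuid()  # POSIX only; assumption: you run this as the target user
    uid_file.write_text(str(new_uid))
    return old_uid, str(new_uid)


# Example call, run from the directory that contains the Vagrantfile:
# update_creator_uid(".")
```

Make a copy of creator_uid before changing it if you want to be able to revert.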

Python3 File Management

This page shows how to work with text files using Python3 code. The following sample text file is used throughout. Let's call it stocks.txt.

bce.to
hnd.to
mtl.to

Reading from File

You can use the read(), readline(), or readlines() methods to read from a file. This example uses read().

stocks = open("data/stocks.txt","r")
stocks.read()
stocks.close()

output:

'bce.to\nhnd.to\nmtl.to'

Using readline():
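As a rough sketch of how readline() and readlines() behave, the example below recreates the sample file in a temporary directory (an assumption made so the code is self-contained) and reads it line by line. The variable names are illustrative only.

```python
import tempfile
from pathlib import Path

# Recreate the sample file so this sketch runs on its own.
sample_path = Path(tempfile.mkdtemp()) / "stocks.txt"
sample_path.write_text("bce.to\nhnd.to\nmtl.to")

# readline() returns one line per call, including the trailing newline.
# readlines() returns all remaining lines as a list.
with open(sample_path, "r") as stocks:
    first_line = stocks.readline()
    second_line = stocks.readline()
    remaining = stocks.readlines()

# Iterating over the file object is a common way to process every line.
with open(sample_path, "r") as stocks:
    symbols = [line.strip() for line in stocks]

print(repr(first_line))  # 'bce.to\n'
print(remaining)         # ['mtl.to']
print(symbols)           # ['bce.to', 'hnd.to', 'mtl.to']
```

Using a with block closes the file automatically, so no explicit close() call is needed.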

Python3 Code Snippets

Get Current Date

import datetime as dt
now = dt.datetime.now()
print(now.year)
print(now.month)
print(now.day)

Download File from Internet

import urllib.request
url = 'http://molecularsciences.org'
response = urllib.request.urlopen(url)
mydata = response.read()
mytext = mydata.decode('utf-8')
print(mytext)
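The snippet above prints the page contents. To save the response to disk instead, one possible approach is sketched below. The function name download_to_file, its parameters, and the 30-second timeout are assumptions for illustration.

```python
import shutil
import urllib.request


def download_to_file(url, dest_path, timeout=30):
    """Stream the response body at url into dest_path and return dest_path."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out_file:
        # Copy in chunks so large files are not held in memory all at once.
        shutil.copyfileobj(response, out_file)
    return dest_path


# Example call (requires network access):
# download_to_file('http://molecularsciences.org', 'index.html')
```

The file is opened in binary mode ("wb") so the bytes are written exactly as received, without any decoding step.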

Hive

Hive provides an SQL-like language (HiveQL) for processing and analyzing data in Hadoop. It does not require knowledge of any other programming language. Hive is not suitable for OLTP; it is designed for analyzing big data.

Hadoop is not a database. It is an ecosystem of tools that provides the features we need when dealing with big data. Hadoop stores data in HDFS and its native processing model is MapReduce. Hive converts your SQL commands to MapReduce jobs. Hive also supports workflow integration with other tools such as Excel or Cognos.

Apache Spark

Spark is a distributed computing platform that can run on top of Hadoop. It extends the MapReduce model, making it easier to code and faster to execute.

Spark provides APIs in Java, Scala, and Python. Any of these languages can be used to create Spark applications.

Spark supports Map, Reduce, SQL queries, streaming data, machine learning algorithms, and graph algorithms.
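To give a feel for the Map and Reduce operations mentioned above, here is a plain-Python word count. It is not Spark code and uses no Spark API. It only illustrates the programming model with the standard library, and the input lines are made-up sample data.

```python
from functools import reduce

# Made-up sample input: each string stands in for one line of a large file.
lines = ["bce.to hnd.to", "mtl.to bce.to"]

# Map step: turn every line into a list of (word, 1) pairs.
mapped = map(lambda line: [(word, 1) for word in line.split()], lines)


def merge_counts(totals, pairs):
    """Reduce step: fold one line's (word, 1) pairs into the running totals."""
    for word, count in pairs:
        totals[word] = totals.get(word, 0) + count
    return totals


word_counts = reduce(merge_counts, mapped, {})
print(word_counts)  # {'bce.to': 2, 'hnd.to': 1, 'mtl.to': 1}
```

In a distributed engine the map step runs in parallel across partitions of the data and the partial results are then combined, but the shape of the computation is the same.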

Spark stack contains:

SQL Joins

The join clause allows us to combine SQL tables. Tables can be combined in different ways. Think of an inner join as the intersection of two tables and a full outer join as their union. This table explains different types of joins.
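The intersection and union analogy can be tried out with Python's built-in sqlite3 module. This is a small sketch: the employees and bonuses tables and their rows are hypothetical and are not the tables used in the examples on this page. Because older SQLite versions lack FULL OUTER JOIN, the outer join is emulated here with two LEFT JOINs combined by UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT);
    CREATE TABLE bonuses (employee_id INTEGER, amount INTEGER);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO bonuses VALUES (2, 500), (3, 700), (4, 900);
""")

# Inner join: only rows whose key appears in both tables.
inner_rows = conn.execute("""
    SELECT e.id, e.name, b.amount
    FROM employees e
    INNER JOIN bonuses b ON e.id = b.employee_id
    ORDER BY e.id
""").fetchall()

# Full outer join (emulated): every row from either table, matched where possible.
outer_rows = conn.execute("""
    SELECT e.id, e.name, b.amount
    FROM employees e
    LEFT JOIN bonuses b ON e.id = b.employee_id
    UNION
    SELECT b.employee_id, e.name, b.amount
    FROM bonuses b
    LEFT JOIN employees e ON e.id = b.employee_id
    ORDER BY 1
""").fetchall()
conn.close()

print(inner_rows)  # [(2, 'Bob', 500), (3, 'Cy', 700)]
print(outer_rows)  # [(1, 'Ann', None), (2, 'Bob', 500), (3, 'Cy', 700), (4, None, 900)]
```

Rows without a match on the other side are kept in the outer join, with None (SQL NULL) filling the missing columns.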

We will be using the following two tables in the examples:

Planning a data science project

When planning a data science project, you need to spend time seeking clarification about the project. The important questions are:

  • What are the goals of the project?
  • What resources are required for this project?
    • data
    • software
    • personnel
  • Are there any important deadlines?
  • Who are the stakeholders?

Once you have sufficiently gained clarity on your project:

NumPy

NumPy is a Python library that makes it easy to perform mathematical and logical operations on arrays.

What is my numpy version

import numpy as np
print(np.__version__)

output

1.9.3

Creating numpy arrays

There are many ways to create numpy arrays
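A few common ways are sketched below, assuming NumPy is installed. The variable names are illustrative and the list is not exhaustive.

```python
import numpy as np

from_list = np.array([1, 2, 3])     # from a Python list
zeros = np.zeros((2, 3))            # 2x3 array filled with 0.0
evens = np.arange(0, 10, 2)         # start, stop (exclusive), step
spaced = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values, endpoints included

print(from_list)    # [1 2 3]
print(zeros.shape)  # (2, 3)
print(evens)        # [0 2 4 6 8]
```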
