Skills used in Data Science

Not all data scientists are alike. Some do general work while others focus on domains such as health, technology, or business. They can work in predictive analysis, big data, spatial data, or any combination of these. Depending on the work they do, they need different technical skills.

Technologies

Following are some technology trends that I have observed.

Data Science

A chemist studies chemical properties of objects. A biologist studies living beings. A data scientist studies data. Data is real. It has real properties. Study of data leads to information and knowledge. Answering questions with data leads to revelation, understanding, and wisdom.

ArrayList NullPointerException

Java's ArrayList is a great Collection that frees us from many of the IndexOutOfBounds, one-type limitation, and NullPointerException issues of plain arrays while providing several value-added methods. Naturally, you would be shocked to encounter a NullPointerException on an ArrayList. The most likely time you would encounter this problem is when you fail to initialize the ArrayList.

Copying a large database from one server to another

Recently, I had to copy a large database from one server to another. I had 100G free on the source server. Unfortunately, the server would run out of disk space before the mysqldump could complete. So I had to find a way to make the dumps smaller. This can be done by creating a separate dump for each table. I also had to compress the files to conserve space. Following is the script I wrote to accomplish this.
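What follows is only a minimal sketch of the per-table approach described above, not necessarily the original script. The function name dump_tables and its two arguments are hypothetical, and the sketch assumes that the mysql and mysqldump clients can read their credentials from an option file such as ~/.my.cnf.

```shell
# dump_tables DB OUTDIR: write one gzip-compressed dump file per table.
# Hypothetical helper; assumes credentials come from an option file.
dump_tables() {
    db=$1
    outdir=$2
    mkdir -p "$outdir" || return 1

    # -N drops the column header and -B gives plain batch output,
    # so the loop receives one table name per line.
    mysql -N -B -e 'SHOW TABLES' "$db" | while IFS= read -r table; do
        # Pipe straight into gzip so the uncompressed dump never lands on disk.
        mysqldump "$db" "$table" | gzip > "$outdir/$table.sql.gz"
    done
}

# Example call (names are placeholders):
#   dump_tables mydatabase /tmp/dumps
```

On the target server, each file can then be loaded with something like gzip -dc lines.sql.gz | mysql mydatabase.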

split command

In Linux you can use the split and cat commands to split large files into smaller files or join many smaller files into one large file. These kinds of operations are often necessary when you are dealing with large quantities of data.

split

Following is the default functionality of split. It splits a large file every thousand lines and creates new files.

$ split largefile.txt

$ ls
largefile.txt  xaa  xab  xac  xad

$ wc -l *
3285  largefile.txt
1000  xaa
1000  xab
1000  xac
 285  xad
6570  total

You can also define the number of lines you want in each file with the -l option.
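As a small sketch of choosing the line count yourself, the commands below build a sample file in a scratch directory, split it into 500-line pieces, and then put the pieces back together with cat. The part_ prefix, the scratch directory, and the generated sample file are assumptions made for illustration only.

```shell
# Use a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)

# Sample input: 3285 numbered lines, standing in for largefile.txt.
seq 1 3285 > "$workdir/largefile.txt"

# -l sets the lines per output file; the last argument replaces the default "x" prefix.
split -l 500 "$workdir/largefile.txt" "$workdir/part_"

# Count the pieces: 3285 lines at 500 per file gives 7 files, the last with 285 lines.
set -- "$workdir"/part_*
piece_count=$#

# The glob expands in alphabetical order, so cat rebuilds the original line order.
cat "$workdir"/part_* > "$workdir/rejoined.txt"
```

Because split names its pieces with alphabetically increasing suffixes (part_aa, part_ab, and so on), the plain glob is enough to restore the original order.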

Listing all tables and their table counts in a MySQL database

The following SQL query will list all tables in a MySQL database along with the row count for each.

SELECT TABLE_NAME, TABLE_ROWS 
FROM `information_schema`.`tables` 
WHERE `table_schema` = 'mydatabase';

Here mydatabase is the name of your database. For InnoDB tables, TABLE_ROWS is only an estimate, so use COUNT(*) when you need an exact figure. The output will look something like the following:

+------------+------------+
| table_name | table_rows |
+------------+------------+
| lines      |       2271 |
| links      |        484 |
| word       |      25004 |
+------------+------------+
