Java's split function packs a powerful punch. You can use it to split a string on characters, symbols, substrings, a set of symbols, or even regular expressions.
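A short sketch of each case (the sample strings are invented for illustration; note that split's argument is always treated as a regular expression):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // Split on a single character
        String csv = "red,green,blue";
        String[] colors = csv.split(",");
        System.out.println(colors.length);        // 3

        // Split on a substring
        String path = "usr::local::bin";
        String[] parts = path.split("::");
        System.out.println(parts[1]);             // local

        // Because the argument is a regex, metacharacters
        // such as '.' must be escaped
        String ip = "192.168.0.1";
        String[] octets = ip.split("\\.");
        System.out.println(octets[0]);            // 192

        // A character class splits on any symbol in a set
        String mixed = "a,b;c d";
        String[] tokens = mixed.split("[,; ]");
        System.out.println(tokens.length);        // 4
    }
}
```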
Not all data scientists are alike. Some do general work while others focus on domains such as health, technology, or business. They can work in predictive analysis, big data, spatial data, or any combination of these. Depending on their focus, they need different technical skills.
Following are some technology trends that I have observed.
Java's String class provides a handy family of replace methods for string and substring modifications: replace, replaceFirst, and replaceAll.
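A quick sketch of the three methods (the sample sentence is made up for illustration). Note that replace works on literal characters and substrings, while replaceFirst and replaceAll take regular expressions:

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String s = "the cat sat on the mat";

        // Literal replacement of a character or substring
        System.out.println(s.replace('t', 'T'));          // The caT saT on The maT
        System.out.println(s.replace("cat", "dog"));      // the dog sat on the mat

        // Regex replacement: first match only, or all matches
        System.out.println(s.replaceFirst("[aeiou]", "*")); // th* cat sat on the mat
        System.out.println(s.replaceAll("[aeiou]", "*"));   // th* c*t s*t *n th* m*t
    }
}
```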
A chemist studies the chemical properties of substances. A biologist studies living beings. A data scientist studies data. Data is real. It has real properties. The study of data leads to information and knowledge. Answering questions with data leads to revelation, understanding, and wisdom.
Java allows programmers to create their own exceptions. To create an exception, you should inherit from the existing exception class closest to what you wish to create. Following is a generic example to show how to create and use a custom exception.
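A minimal sketch (the exception name and the account scenario are invented for illustration; Exception is the closest base class for a generic checked exception):

```java
// A custom checked exception: extend the closest existing exception type
class InsufficientFundsException extends Exception {
    public InsufficientFundsException(String message) {
        super(message);
    }
}

public class Account {
    private double balance = 100.0;

    // Declares the custom exception so callers must handle it
    public void withdraw(double amount) throws InsufficientFundsException {
        if (amount > balance) {
            throw new InsufficientFundsException(
                "Tried to withdraw " + amount + " with balance " + balance);
        }
        balance -= amount;
    }

    public static void main(String[] args) {
        Account account = new Account();
        try {
            account.withdraw(500.0);
        } catch (InsufficientFundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```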
Java's ArrayList is a great Collection that frees us from the fixed-size limitation of arrays while providing type safety through generics and several value-added methods. Naturally, you would be shocked to encounter a NullPointerException on an ArrayList. The usual cause of this problem is failing to initialize the ArrayList before using it.
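A short sketch of the failure mode and the fix (the variable name is arbitrary):

```java
import java.util.ArrayList;
import java.util.List;

public class ListDemo {
    // Declared but never initialized: any method call on it
    // will throw a NullPointerException
    private static List<String> names;

    public static void main(String[] args) {
        try {
            names.add("alice");
        } catch (NullPointerException e) {
            System.out.println("names was never initialized");
        }

        // The fix: initialize the list before use
        names = new ArrayList<>();
        names.add("alice");
        System.out.println(names.size());   // 1
    }
}
```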
The following code loops over the files in a directory called sql. It runs the SQL commands in each file against the database, then deletes the file from the directory.
#!/bin/bash
for file in sql/*
do
    echo "Processing $file"
    mysql -u root -pdonut node2 < "$file"
    rm -f "$file"
done
Recently, I had to copy a large database from one server to another. I had 100G free on the source server. Unfortunately, the server would run out of disk space before the mysqldump could complete. So I had to find a way to make the mysqldumps smaller. This can be done by creating a separate dump for each table. Then I had to compress the files to conserve space. Following is the script I wrote to accomplish this.
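A sketch of such a per-table dump script. The credentials and the node2 database name mirror the earlier script and are illustrative; adjust them for your setup:

```shell
#!/bin/bash
# Dump each table of the database to its own compressed file
DB=node2

# -N suppresses the column header so only table names are printed
for table in $(mysql -u root -pdonut -N -e "SHOW TABLES" "$DB")
do
    echo "Dumping $table"
    # Pipe each table's dump straight through gzip to save disk space
    mysqldump -u root -pdonut "$DB" "$table" | gzip > "$table.sql.gz"
done
```

Because each dump is piped directly into gzip, the uncompressed SQL never touches the disk, and each table can be copied and restored independently on the destination server.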
In Linux you can use the split command to break a large file into smaller files, and cat to join the smaller files back into one large file. These kinds of operations are often necessary when you are dealing with large quantities of data.
Following is the default behavior of split. It breaks a large file into new files of one thousand lines each.
$ split largefile.txt
$ ls
largefile.txt  xaa  xab  xac  xad
$ wc -l *
 3285 largefile.txt
 1000 xaa
 1000 xab
 1000 xac
  285 xad
You can also define the number of lines you want in each file.
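For example, using the -l option to put 500 lines in each output file:

```shell
$ split -l 500 largefile.txt
$ wc -l xaa
500 xaa
```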
The following SQL query will list all tables in a MySQL database along with the row count for each.
SELECT TABLE_NAME, TABLE_ROWS
FROM `information_schema`.`tables`
WHERE `table_schema` = 'mydatabase';
where mydatabase is the name of your database. Note that for InnoDB tables, TABLE_ROWS is only an approximate count. The output will look something like the following:
+------------+------------+
| table_name | table_rows |
+------------+------------+
| lines      |       2271 |
| links      |        484 |
| word       |      25004 |
+------------+------------+