Data Science

A chemist studies the chemical properties of substances. A biologist studies living beings. A data scientist studies data. Data is real, and it has real properties. The study of data leads to information and knowledge. Answering questions with data leads to revelation, understanding, and wisdom.

Data Science is the science of extracting knowledge from data and presenting the findings in a format the audience can readily understand. Governments and corporations are sitting on massive quantities of data. Hidden in this data is information that can revolutionize the way they serve their clients and earn revenue, along with information of great scientific interest. For businesses, that hidden knowledge can provide the competitive advantage they need.

Data science involves the application of mathematics (linear algebra, calculus, graph theory, statistics), expert technical skills, and the ability to ask insightful questions. A data scientist does not need to be an expert in everything, but the more they know the better. The most important characteristics of a data scientist are curiosity and the ability to ask insightful questions. Without these, technical and math skills cannot be used resourcefully.

Why now

There is suddenly a high demand for data scientists. Why is there a new field called Data Science, and why is it in such high demand? The following are some of the more important reasons:

  1. Storage is very cheap and it is getting cheaper
  2. Software for data science is improving and getting cheaper
  3. Almost all medium to large organizations have generated large collections of data, and massive amounts of data are available online
  4. There are many algorithms for data science and the number is growing
  5. The market is finally beginning to appreciate the value of data science to improve service and profitability

In terms of technology, the cloud has been the game changer. All of a sudden, everyone can access practically unlimited storage space and processing power at very little cost. In addition, most providers, such as Amazon, offer numerous value-added tools to facilitate data science work. The cloud allows data scientists to easily bypass their local technical limitations.

Profile of a successful data scientist

A successful data scientist is curious and passionate about their data. Data scientists make assumptions about data, learn from the data, and then modify their hypotheses over several iterations until they reach a conclusion strongly supported by the data. They are comfortable with analytics platforms, programming, and mathematics. Once data scientists complete their study, they present their findings in easy-to-understand narratives.

Not all data scientists are alike. Some do general work while others focus on domains such as health, technology, or business. They can work in predictive analysis, big data, spatial data, or any combination of these. Depending on their focus, they need different technical skills. The following are some trends that I have observed:

  • global skills: SQL
  • unstructured data: Python is better than R
  • structured data: R or Python
  • predictive analysis: R or Python
  • big data: Hadoop, Stats, SPSS, Pig, Spark
  • spatial data: Maptitude, MapInfo
  • data visualization: Tableau, D3.js (JavaScript), Qlik

Different mathematical tools are required depending on the data and the analysis. Regression is used frequently.
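
To make the idea of regression concrete, here is a minimal sketch that fits a straight line to a few points using ordinary least squares. The function name `fit_line` and the numbers are invented for illustration. In real work you would more likely reach for a statistics package in R or Python than write this by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for this example: spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(spend, sales)
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
```

The slope says how much the outcome changes for each unit of the input, which is often the question a business actually wants answered.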

When you have good assumptions, many tools are available to test those assumptions. When you don't have good assumptions and just need to explore the data, use machine learning.
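
As one illustration of exploring data without a prior assumption, the sketch below groups numbers into clusters with a simple one-dimensional k-means. Clustering is my choice of example here; it is only one of many machine learning techniques. The function name `kmeans_1d`, the starting-point rule, and the order values are all assumptions made up for the example.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters. Returns (centroids, labels)."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Simple deterministic start: spread the centroids across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical order values, invented for this example.
order_values = [12, 15, 14, 90, 95, 11, 88]
centroids, labels = kmeans_1d(order_values, k=2)
print(centroids, labels)
```

Nothing told the code that there are small orders and large orders; the two groups come out of the data itself, and the analyst then has to decide whether they mean anything.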

R and Python are the preferred skills for predictive analysis. Python is better suited for working with unstructured data. Expert knowledge of SQL is required for most work. Hadoop, Cassandra, and Spark are often used for big data.

Career prospects

What are the career prospects for someone going into data science? Demand for data scientists is high because more and more organizations are realizing the benefits of data science, so there is an urgent need for people with these skills. Currently, the average salary of a data scientist is higher than that of programmers or mathematicians.