Data science workflow

Data scientists use many methodologies. Whatever steps and terms each one uses, most of them follow this general outline:

  1. Plan your project
    • define goals
    • organize and coordinate resources
    • start your project
  2. Prepare data for analysis (iterative process)
    • acquire data
    • clean data
    • explore and refine data
  3. Model your problem (iterative process)
    • create model
    • validate model
    • evaluate model
    • refine model
  4. Wrap up
    • present your findings
    • revisit your model
    • archive and document

Each of these steps will be discussed in detail.
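
As a rough illustration of how steps 2 and 3 of the outline fit together, here is a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the function names (acquire_data, clean_data, split_data, create_model, evaluate_model, run_workflow) are hypothetical, the data is synthetic, and a simple least-squares line stands in for whatever model a real project would use.

```python
import random


def acquire_data(n=200, seed=0):
    """Step 2, acquire: a stand-in for loading real data (synthetic x, y pairs)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)  # assumed linear signal plus noise
        rows.append((x, y))
    # Inject two missing values so the cleaning step has something to do.
    rows[5] = (None, rows[5][1])
    rows[17] = (rows[17][0], None)
    return rows


def clean_data(rows):
    """Step 2, clean: drop rows that contain a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split_data(rows, test_fraction=0.25, seed=0):
    """Hold out part of the data so the model can be validated on unseen rows."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def create_model(train):
    """Step 3, create: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate_model(model, rows):
    """Step 3, validate and evaluate: root-mean-square error on the given rows."""
    slope, intercept = model
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def run_workflow():
    """Run the data preparation and modeling steps once, end to end."""
    raw = acquire_data()
    cleaned = clean_data(raw)
    train, test = split_data(cleaned)
    model = create_model(train)
    rmse = evaluate_model(model, test)
    return cleaned, model, rmse


if __name__ == "__main__":
    cleaned, (slope, intercept), rmse = run_workflow()
    print(f"rows after cleaning: {len(cleaned)}")
    print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
    print(f"hold-out RMSE: {rmse:.2f}")
```

In practice both stages are iterative: if the hold-out error is unsatisfactory, you would typically go back to refine the data or the model and run the steps again.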