Once the data preparation phase is complete, it’s time to move on to the fun part of the CRISP-DM framework: modeling. Here you’ll choose which modeling technique to use, create some tests to assess the accuracy of your model, build the model, and then assess the model using the tests you created.
After developing business understanding and data understanding, the next big objective in the CRISP-DM methodology is to prepare the data for modeling and analysis. This involves selecting, cleaning and transforming the data that will be used for the project. While this isn’t flashy work, it typically accounts for 60% to 80% of the effort for a project.
Corporate reporting is a prime candidate for automation if you can clearly explain the process to produce it, and the process remains consistent over time. Automating your reports has many potential benefits: it can save time, reduce errors, and alleviate the boredom caused by performing repetitive tasks.
Having developed business understanding and a deep knowledge of the problem you are trying to solve, the next step in the CRISP-DM framework is to develop that same level of understanding around the data itself. This step isn’t analysis, but rather looking at the structure and shape of the data in order to determine what information is available and how to go about building your analysis.
Buzzwords have the unfortunate tendency to be often used but seldom clearly defined. Today we are going to tackle the popular phrase “big data” and strip it down to a clear definition. Overall the term is fairly self-explanatory: it refers to large data sets. But there are 5 defining characteristics specific to big data that differentiate it from the data sets of yesterday. These 5 characteristics are known as the 5 V’s of big data.
As big data transforms our businesses, governments and society, it also presents us with new moral and ethical dilemmas that we need to consider. As is typical with new technology, we often tend to implement first, and consider the ethical issues later. Cathy O’Neil’s book Weapons of Math Destruction is an introduction to the ethical issues raised by the widespread use of data to drive decisions in our lives.
When using the CRISP-DM framework, the first step in the data mining process is to develop your business understanding. This stage of the process is about gaining knowledge of the business (the issues it faces, its opportunities for improvement, its objectives and its constraints) and then creating your project plan.
Talking about the rate of change in our society has transcended being a statement of fact to being something of a cliché. Nevertheless, technical and societal changes are forcing us to regularly ask deep questions about how to move forward in the midst of rapid change. Joi Ito and Jeff Howe of the MIT Media Lab tackle these questions and propose new guiding principles in their book Whiplash.
While analysis tools and algorithms have evolved at a rapid pace, the overall business process for analytics has remained remarkably stable. One seminal work on the analytic process is IBM’s Cross-Industry Standard Process for Data Mining (CRISP-DM). At over 20 years old, it remains a relevant and useful tool for describing the overall data science workflow.