When using the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework, the first step in the data mining process is to develop your business understanding. This stage is about learning about the organization: the issues it faces, its opportunities for improvement, its objectives and its constraints. It also covers creating your project plan.
Determine business objectives
The first step of any project is understanding the goals and objectives from a strategic perspective.
To begin, you need a broad overview of the organization, its environment and its history. Interviews and conversations with stakeholders will give you a basic picture; for a more rigorous approach, a SWOT analysis (strengths, weaknesses, opportunities and threats) can provide a more comprehensive one.
The business objectives are the organization’s primary goals for the project, described in terms that are meaningful to stakeholders. Sometimes organizations start by defining their success criteria, or propose objectives that are really success criteria in disguise. In that case, work backwards with them to place those success criteria within a larger context. Doing so helps surface unspoken goals and encourages innovative solutions to problems.
Business success criteria
Once the objectives have been set, you need to determine how to measure whether they have been met. Defining success criteria at the start of the project provides measurable indicators of whether the solution meets the business objectives. Getting agreement on these criteria early also sets expectations and keeps everyone aligned.
Once the strategic direction of the project has been set and agreed upon, it’s time to focus on more tactical measures.
Inventory of resources
Before work begins, identify the resources available to you. These could include personnel (for their business knowledge or technical skills), data sets, hardware, and software and licenses. Understanding the available resources will inform the cost-benefit analysis and the initial project plan.
Requirements, assumptions and constraints
Here you create three lists: one for requirements, one for assumptions, and one for known constraints. These lists are a useful way to check your more fundamental assumptions with the organization and to make sure it is aware of your resource needs. Make sure the lists cover data validity, project schedules, output quality, technical constraints, security, and legal considerations.
Risks and contingencies
If you know at the outset of factors that could have a material impact on the project schedule, quality, or validity of results, identify those factors clearly from the start. Where possible, also identify contingencies and remedial actions that can be taken if those risks materialize.
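Identified risks and their contingencies are often recorded in a risk register. The sketch below shows one minimal way to keep such a register in Python, assuming a common likelihood × impact scoring heuristic on 1 to 5 scales. The article does not prescribe a scoring scheme; the `Risk` class, the scales and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical risk register."""
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe) on schedule, quality or validity
    contingency: str

    def __post_init__(self) -> None:
        # Reject values outside the assumed 1-5 scales.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Heuristic exposure score: likelihood x impact (range 1 to 25).
        return self.likelihood * self.impact


# Hypothetical register entries for illustration only.
REGISTER = [
    Risk("Key data set arrives late", 3, 4, "Start modeling on a historical extract"),
    Risk("Data quality worse than assumed", 4, 3, "Budget extra data preparation time"),
    Risk("Subject matter expert unavailable", 2, 3, "Schedule knowledge-transfer sessions early"),
]

# List risks from highest to lowest exposure, each with its contingency.
for risk in sorted(REGISTER, key=lambda r: r.score, reverse=True):
    print(f"{risk.score:>2}  {risk.description}  ->  {risk.contingency}")
```

Sorting by score puts the highest-exposure risks first, which is where prepared contingencies matter most.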
Terminology
To reduce confusion, prepare a glossary of business and data terminology that both the organization and the data scientists can understand. This builds your background knowledge of the business and reduces the chance of misunderstandings between technical and non-technical personnel.
Costs and benefits
To help decide whether the project is worthwhile, compare its cost with the benefit the organization expects to derive from it. Consider both financial and non-financial costs and benefits in your analysis.
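As a rough illustration of the financial side of this comparison, the sketch below computes net benefit and return on investment (ROI) from lists of estimated costs and benefits. The `LineItem` class, line items and figures are hypothetical placeholders; non-financial costs and benefits still have to be weighed qualitatively alongside these numbers.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    """A single estimated cost or benefit over the evaluation period."""
    description: str
    amount: float  # in the organization's currency


def net_benefit(costs: list[LineItem], benefits: list[LineItem]) -> float:
    """Total estimated benefit minus total estimated cost."""
    return sum(b.amount for b in benefits) - sum(c.amount for c in costs)


def roi(costs: list[LineItem], benefits: list[LineItem]) -> float:
    """Return on investment: net benefit as a fraction of total cost."""
    total_cost = sum(c.amount for c in costs)
    if total_cost == 0:
        raise ValueError("total cost must be non-zero to compute ROI")
    return net_benefit(costs, benefits) / total_cost


# Hypothetical figures for illustration only.
costs = [
    LineItem("Data scientist time", 60_000),
    LineItem("Cloud compute and licenses", 15_000),
]
benefits = [
    LineItem("Reduced customer churn", 90_000),
    LineItem("Analyst hours saved", 20_000),
]

print(f"Net benefit: {net_benefit(costs, benefits):,.0f}")
print(f"ROI: {roi(costs, benefits):.0%}")
```

A positive ROI on its own does not settle the decision; it is one input alongside the non-financial factors and the risks identified earlier.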
Determine data mining goals
You’ve already defined your success criteria in business terms; now you need to define them in technical terms.
Data mining goals
A data mining goal is expressed in technical terms: a variable you are trying to measure or quantify, a prediction you are trying to make, or an insight you are trying to glean. Data mining goals are more specific and more numerous than business goals, because they support the achievement of the business goals. Ultimately, the data mining goals define the outputs of the overall project.
Data mining success criteria
Because a given data problem may have multiple solutions with varying degrees of accuracy, it is important to define the success criteria clearly. These might include the predictive accuracy or goodness of fit required from a model, an acceptable error rate, and any other criteria that matter to the project. Success criteria can be objective or subjective; where they are subjective, identify the stakeholders who will make the final decision about success.
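Objective data mining success criteria can be written down as explicit thresholds and checked against evaluation results. A minimal sketch, assuming a regression problem where goodness of fit is measured by R² and error by mean absolute error; the `Criterion` class, metric names, thresholds and result values are hypothetical. Subjective criteria are not captured here and still need sign-off from the responsible stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One objective success criterion agreed with stakeholders."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value: float) -> bool:
        # For "higher is better" metrics the value must reach the threshold;
        # otherwise it must not exceed it.
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate(criteria: list[Criterion], results: dict[str, float]) -> dict[str, bool]:
    """Map each criterion's metric to whether the measured result satisfies it."""
    outcome = {}
    for c in criteria:
        if c.metric not in results:
            raise KeyError(f"no measured value for metric {c.metric!r}")
        outcome[c.metric] = c.is_met(results[c.metric])
    return outcome


# Hypothetical criteria and model results for illustration only.
criteria = [
    Criterion("r_squared", 0.70),                                    # goodness of fit
    Criterion("mean_absolute_error", 12.0, higher_is_better=False),  # error
]
results = {"r_squared": 0.74, "mean_absolute_error": 13.5}

outcome = evaluate(criteria, results)
print(outcome)
print("All criteria met:", all(outcome.values()))
```

Recording the thresholds before modeling starts makes it harder to move the goalposts once results come in.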
Produce project plan
By this point in the planning process, you should have a good idea of the project’s scope and the effort it will require. To keep expectations aligned, codify this knowledge in a project plan and share it.
The project plan should list the stages of the work, the tools and resources required, the duration of each task, and its inputs, outputs and dependencies. Identify and communicate dependencies clearly, since they represent areas of greater risk in the project. Preparing mitigation strategies for important dependencies in advance helps you respond quickly if things go wrong.
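Task durations and dependencies can be captured as a simple task graph. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to order tasks so that each comes after its dependencies, then computes each task's earliest finish day and the chain of dependent tasks that determines the overall duration (the critical path, in project-management terms). The article does not prescribe critical path analysis; it is one common way to see which dependencies carry the most schedule risk. The `PLAN` tasks, durations and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: task -> (duration in days, tasks it depends on).
PLAN: dict[str, tuple[int, set[str]]] = {
    "business understanding": (5, set()),
    "data understanding": (10, {"business understanding"}),
    "tool assessment": (3, {"business understanding"}),
    "data preparation": (15, {"data understanding"}),
    "modeling": (12, {"data preparation", "tool assessment"}),
    "evaluation": (5, {"modeling"}),
}


def schedule(plan: dict[str, tuple[int, set[str]]]) -> tuple[dict[str, int], list[str]]:
    """Forward pass: earliest finish day per task, plus the critical path."""
    # graphlib expects a mapping of node -> its predecessors (dependencies).
    graph = {task: deps for task, (_, deps) in plan.items()}
    finish: dict[str, int] = {}
    predecessor: dict[str, str | None] = {}
    # static_order() yields each task after all of its dependencies
    # and raises graphlib.CycleError if the dependencies are circular.
    for task in TopologicalSorter(graph).static_order():
        duration, deps = plan[task]
        # A task can start only once its slowest dependency has finished.
        start, prev = max(((finish[d], d) for d in deps), default=(0, None))
        finish[task] = start + duration
        predecessor[task] = prev
    # Walk back from the task that finishes last to recover the critical path.
    last: str | None = max(finish, key=finish.get)
    path = []
    while last is not None:
        path.append(last)
        last = predecessor[last]
    return finish, path[::-1]


finish, critical_path = schedule(PLAN)
print("Estimated duration (days):", max(finish.values()))
print("Critical path:", " -> ".join(critical_path))
```

Tasks off the critical path (here, the tool assessment) have slack, so a delay there is less likely to move the end date than a delay in data preparation.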
Unlike most construction and development projects, data mining is iterative by design. Because this can be a source of friction and misunderstanding between business and technical staff, clearly identify and explain in the project plan any iterations or repetitions the work might require.
The project plan should be a living document. Revisit it regularly, both to check that the work is following it and to identify and quantify issues as they arise during execution so that the plan can be realigned. Review the plan at major milestones, and build those reviews into the project timeline.
Initial assessment of tools and techniques
You will probably already have ideas about which tools and techniques to use, based on your discussions and planning. At this point, perform a formal assessment of tools and techniques to determine which best fit the task at hand. Not all tools are equal, and the tools you choose may influence your results, so it is worth dedicating time to careful evaluation.
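One common way to make this assessment explicit is a weighted scoring (decision) matrix: agree on evaluation criteria and their weights with stakeholders, score each candidate tool against each criterion, and rank candidates by the weighted total. The article does not prescribe this method; the criteria, weights, tool names and scores below are hypothetical.

```python
# Hypothetical evaluation criteria and weights (must sum to 1).
WEIGHTS = {
    "fit for task": 0.40,
    "team familiarity": 0.25,
    "cost": 0.20,         # a higher score means more affordable
    "integration": 0.15,
}

# Hypothetical scores from 1 (poor) to 5 (excellent) for each candidate tool.
SCORES = {
    "Tool A": {"fit for task": 4, "team familiarity": 5, "cost": 3, "integration": 4},
    "Tool B": {"fit for task": 5, "team familiarity": 2, "cost": 4, "integration": 3},
}


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of weight x score across all criteria."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank(candidates: dict[str, dict[str, int]],
         weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (tool, weighted score) pairs, best first."""
    # Allow for floating-point rounding when checking the weights.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    totals = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


for name, total in rank(SCORES, WEIGHTS):
    print(f"{name}: {total:.2f}")
```

The ranking is only as good as the weights, so it helps to agree on them before anyone scores the tools.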
Having gone through this process you will have made the journey from background research to project planning and have a clear idea of the work required and what your next steps are.