Before you can cook, you must go to the grocery store, but you will also use items from your pantry. Data warehousing is the process of taking all those new groceries and organizing them alongside what is already in your pantry, before you even know what you will cook. Once your ingredients are prepared in the data warehouse, you can begin to cook, or start your data mining. These two processes, data warehousing and data mining, work together to build a warehouse of data and extract valuable insight from it. The trouble occurs when the step between warehousing and mining, structuring and cleaning the data, is skipped and analysts jump straight to processing it.
With an incomplete, messy, or outdated pantry, you might not have the baking powder for perfect biscuits, and the same holds for data warehousing and data mining. A great cook needs a well-organized pantry, and a great data analyst needs well-organized data, structured in a way that allows for efficient insight. Without it, they will, like most analysts, spend 80% of their time organizing the pantry instead of focusing on their cooking technique: data mining. Structuring and cleaning the data is therefore a crucial step before analysts move on to data mining techniques. With a well-organized database, or “pantry,” in place after the data warehousing stage, analysts are better able to extract valuable information.