Why AI & machine learning need good data
There is an endless conversation taking place right now about artificial intelligence (AI) and machine learning. These have rightly been called the technologies of the future, and they will continue to innovate and disrupt for decades to come. What often gets lost in this enthusiasm, however, is the importance of good data.
Think of AI and machine learning like boats and data like water. If you don’t have any water, you don’t need a boat. If you only have a little bit of water, you can probably just swim. Finally, if the water is polluted, you probably want to avoid it no matter what kind of boat you have. The point is that AI and machine learning are not assets on their own. Their value is based entirely on the data that they’re built (or floating) on.
This insight is crucial, yet it’s often overlooked or discounted because cleaning up data is difficult. Difficult or not, good data is essential for AI and machine learning to work. Without the cleanest and most complete set of data possible, data-driven insights are never reliable. Follow these steps to ensure your data continues to be an asset:
- Specify Your Objectives – Define in the clearest terms possible why you want to use AI/machine learning and what you hope to achieve. Once your priorities are clear, it’s much easier to define the data and data quality your agenda requires.
- Plan for Data Improvements – Cleaning up your data is going to take time and labor. There is no way around it. Acknowledge this fact, then build the data improvement process into your plans and timeline. It may set your agenda back by weeks or months, but it ensures your effort is not wasted.
- Record Your Efforts – Analytics is a systematic effort, and data management must follow suit. To avoid gaps or duplicated work, keep a careful audit trail of your actions. The goal is to ensure that all data in all locations receives the attention it requires.
- Assign a Manager – It’s easy for data management to fall down the to-do list. Task someone with monitoring and managing the effort so that it stays on track. Ask this person to create timelines/benchmarks, then report regularly about progress or setbacks.
- Get Independent Assurance – Institutional bias can creep into the data management process without warning. As a result, data that might seem “high-quality” could be exactly the opposite. The final step in any improvement initiative is to have the data independently assessed. It’s easy (and tempting) to skip this step, but it’s risky as well.
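To make "clean and complete" a little more concrete, here is a minimal sketch of how a team might measure data quality and keep an audit trail while working through the steps above. It is an illustration under stated assumptions, not a prescribed method: the function names (`assess_quality`, `log_action`), the sample records, and the definitions used (a duplicate is an exactly identical row, a value is missing if it is `None` or an empty string) are all hypothetical choices.

```python
from datetime import datetime, timezone


def assess_quality(records, required_fields):
    """Count duplicate rows and missing required values in a list of dicts.

    Assumptions (hypothetical, adjust to your data):
    - a "duplicate" is a row identical to an earlier row in every field
    - a value is "missing" if it is None or an empty string
    """
    seen = set()
    duplicates = 0
    missing = {name: 0 for name in required_fields}

    for row in records:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
        for name in required_fields:
            if row.get(name) in (None, ""):
                missing[name] += 1

    total_cells = len(records) * len(required_fields)
    filled = total_cells - sum(missing.values())
    completeness = filled / total_cells if total_cells else 1.0

    return {
        "rows": len(records),
        "duplicates": duplicates,
        "missing": missing,
        "completeness": completeness,
    }


def log_action(audit_log, action, report):
    """Append a timestamped entry so every check or cleanup is on record."""
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "report": report,
    })


# Made-up sample data, for illustration only.
records = [
    {"id": 1, "email": "a@example.com", "region": "EU"},
    {"id": 2, "email": "", "region": "US"},
    {"id": 2, "email": "", "region": "US"},
    {"id": 3, "email": "c@example.com", "region": None},
]

audit_log = []
report = assess_quality(records, required_fields=["email", "region"])
log_action(audit_log, "baseline assessment", report)
print(report)
```

Running the same assessment before and after each cleanup pass, and logging both results, gives the person managing the effort a simple benchmark to report against. It does not replace an independent assessment.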
Data management is an ongoing effort. The good news is that an early investment pays long-term dividends. The sooner companies get their data in order, the sooner they can roll out AI and machine learning. More importantly, because they took the time to improve their data up front, the new technologies pay off sooner. In this case, responsibility and restraint are the sound strategies.