When to consider machine learning

For many applications, the idea of investigating ML methods should come from domain experts. However, these opportunities can be hard to identify without background knowledge in ML. We have therefore compiled a short list of questions that we use at NGI to assess a project's ML potential before discussing it with people with ML competency.

Indicative questions	Description
Does the project include large quantities of labelled data?	ML is highly effective for managing and analysing large datasets. "Large" is relative and depends on the project context. This could mean many data points from a single source, multiple data sources, or continuously growing datasets from ongoing data collection efforts.
Does your project involve repetitive data analysis or interpretation tasks?	ML is particularly useful for automating repetitive, data-driven analyses, especially those involving pattern recognition and interpretation. It reduces the time spent on manual data reviews and trend identification. Applications include analysing drilling data, material property assessments, and geotechnical site characterisations.
Could automated decisions improve the consistency and productivity of your project?	In many projects, repeated decisions are made by individual experts, leading to variability and potential inefficiencies. ML can act as a decision-support tool by systematically analysing historical data to provide consistent recommendations. While ML does not replace expert judgment, it helps standardise decision-making processes in areas such as environmental and geological characterisation and construction project planning.

Although we view ML as a powerful tool with a vast range of applications, it is essential to evaluate whether it is the right tool for the job. As with any capable emerging technology, information about its limitations tends to spread less widely than the success stories and hype, so people may be tempted to use ML for problems it does not suit. To mitigate this, we advise evaluating potential ML usage against the following criteria.

1. The task should address a well-defined, existing problem.
2. There should be sufficient data, in terms of both quality and amount.
3. Based on the preceding points, the chosen model should be suited to the problem.

These criteria should be applied in sequence, and all of them should be met before any venture into ML.
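
The sequential screening described above can be sketched as a small checklist. This is a minimal illustration only, not a tool used at NGI: the class `MLScreening`, its field names, and the function `first_failed_criterion` are hypothetical, and the yes/no answers are assumed to come from a discussion between domain experts and people with ML competency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MLScreening:
    """Yes/no answers to the three criteria (hypothetical field names)."""
    well_defined_problem: bool
    sufficient_data: bool
    model_suited: bool


# Criteria listed in the order in which they should be checked.
CRITERIA = [
    ("well_defined_problem", "The task addresses a well-defined existing problem"),
    ("sufficient_data", "There is sufficient data in both quality and amount"),
    ("model_suited", "The chosen model is suited to the problem"),
]


def first_failed_criterion(screening: MLScreening) -> Optional[str]:
    """Return the first criterion that is not met, or None if all are met."""
    for field_name, description in CRITERIA:
        if not getattr(screening, field_name):
            # Stop at the first failure: later criteria build on earlier ones.
            return description
    return None
```

Stopping at the first unmet criterion mirrors the idea that the later criteria depend on the earlier ones: judging whether a model suits the problem presupposes a well-defined problem and adequate data.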