Project: Data Generation and Active Learning in Machine Learning for Simulations

Edgar Torres, M.Sc.
Email | Personal Website
Project status: running

Problem Statement

The data labeling and generation process for both mathematical and data-driven models can be expensive and time-consuming in numerous application domains, such as manufacturing, science, and engineering.

For instance, in materials science, product quality testing involves subjecting the product to strong forces that ultimately destroy it.

Even advanced data generation techniques such as simulations demand significant computational resources. Some simulations take weeks to complete, which makes them impractical for generating data in real-time or resource-constrained settings.


This PhD project aims to address these challenges by developing novel machine learning methods for adaptive data generation and labeling.

Rather than conducting random and inefficient tests, the proposed methods will quantify the uncertainty of a given machine learning model and use these uncertainty measures to guide the data and label generation process.

When a model exhibits high uncertainty in a region of its input space, this indicates that additional labeled data from that region is needed to refine the model’s parameters. Instead of running costly simulations for a large number of randomly generated conditions, the methods will select new conditions where the model is most uncertain.
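
The uncertainty-driven idea described above can be illustrated with a minimal sketch. It is not the project's actual method: it assumes a pool of candidate inputs, uses the spread of a small bootstrap ensemble of ridge-regularized polynomial models as a stand-in uncertainty measure, and replaces the costly simulation with a cheap hypothetical function. All names (`expensive_simulation`, `ensemble_predict`, `active_learning_loop`) and parameter values are illustrative assumptions.

```python
import numpy as np


def expensive_simulation(x):
    # Hypothetical stand-in for a costly simulation or destructive test.
    return np.sin(3.0 * x) + 0.5 * x


def ensemble_predict(x_train, y_train, x_query, rng,
                     n_members=20, degree=3, ridge=1e-3):
    # Fit polynomial ridge models on bootstrap resamples of the labeled data.
    # The standard deviation across members serves as the uncertainty estimate.
    f_query = np.vander(x_query, degree + 1)
    preds = np.empty((n_members, x_query.size))
    for m in range(n_members):
        idx = rng.integers(0, x_train.size, size=x_train.size)
        f_train = np.vander(x_train[idx], degree + 1)
        gram = f_train.T @ f_train + ridge * np.eye(degree + 1)
        weights = np.linalg.solve(gram, f_train.T @ y_train[idx])
        preds[m] = f_query @ weights
    return preds.mean(axis=0), preds.std(axis=0)


def active_learning_loop(n_initial=5, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    pool = np.linspace(-2.0, 2.0, 200)  # candidate simulation conditions

    # Start from a few randomly chosen labeled points.
    labeled = [int(i) for i in rng.choice(pool.size, size=n_initial, replace=False)]
    y_labeled = [float(expensive_simulation(pool[i])) for i in labeled]

    for _ in range(n_rounds):
        _, std = ensemble_predict(pool[labeled], np.array(y_labeled), pool, rng)
        std[labeled] = -np.inf           # never re-query a labeled point
        query = int(np.argmax(std))      # most uncertain candidate
        labeled.append(query)
        y_labeled.append(float(expensive_simulation(pool[query])))

    return pool[labeled], np.array(y_labeled), labeled


x_labeled, y_labeled, labeled_indices = active_learning_loop()
print(f"Labeled {x_labeled.size} of 200 candidate conditions")
```

In each round, only the single candidate with the largest ensemble disagreement is sent to the (here simulated) expensive process. In practice, other uncertainty measures or acquisition criteria could be substituted without changing the overall loop.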

In addition, the project will explore integrating existing simulation techniques with machine learning to reduce the amount of data that machine learning models require and to improve model performance.

Furthermore, methods for leveraging previously trained machine learning models as building blocks for solving similar problems will be investigated.

Ultimately, these approaches aim to make applications with scarce or expensive data more accessible and cost-effective.