We aim to design and develop new methods for attacking machine learning models and to use these adversarial attacks to define a measure of reliability. Models often perform poorly when the data sets are not representative or when the training process is flawed. This is a common issue in machine learning, and it leads to misclassification and unfair model behavior. We will develop a framework that identifies adversarial regions of the data space, that is, regions in which models are prone to fail. The framework will not only identify these regions and the data within them, but also provide tools to improve the model and return a score that reflects the model's reliability. This score can be used to certify models without access to the training process and to estimate how applicable a model is to a specific use case.
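To make the idea of an attack-based reliability score concrete, here is a minimal sketch. It is an illustration only, not the framework this project will develop. It assumes a linear binary classifier with labels in {-1, +1} and an L-infinity perturbation budget `eps`; the function name `reliability_score` and the toy data are hypothetical. For a linear model the worst-case attack has a closed form, so the score is simply the fraction of points that remain correctly classified under any perturbation within the budget.

```python
import numpy as np

def reliability_score(w, b, X, y, eps):
    """Illustrative score for a linear classifier sign(w.x + b).

    Returns the fraction of points that stay correctly classified under
    every L-infinity perturbation of size at most eps, plus a boolean
    mask marking the fragile points (those an attack can flip, or that
    are already misclassified).
    """
    margins = y * (X @ w + b)                      # signed margin, y in {-1, +1}
    # The worst-case L-infinity attack lowers the margin by eps * ||w||_1.
    worst_case = margins - eps * np.abs(w).sum()
    robust = worst_case > 0
    return robust.mean(), ~robust

# Hypothetical toy model and data.
w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.1, 0.0], [-1.5, 0.5], [0.0, 0.3]])
y = np.array([1, 1, -1, -1])

score, fragile = reliability_score(w, b, X, y, eps=0.2)
print(score)    # 0.5
print(fragile)  # [False  True False  True]
```

In this toy case the second and fourth points lie close to the decision boundary, so they fall in what the text calls an adversarial region. For non-linear models no such closed form exists in general, and the worst case would have to be approximated by an actual attack.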
Recommended skills: Basic knowledge of machine learning and Python