Data Filtering for a Sustainable Model Training

Scala F.; Pontieri L.
2024

Abstract

The remarkable capabilities of deep neural networks (DNNs) in addressing intricate problems come with a notable environmental toll. Training these networks demands immense energy, owing to the vast volumes of data required, the sizeable models employed, and the prolonged training durations. In light of the principles of Green-AI, which emphasize reducing the ecological footprint of AI technologies, this is a pressing concern. In response, we introduce DFSMT, an approach for selecting a subset of the labeled data used for training, thereby aligning with Green-AI objectives. Our methodology leverages Active Learning (AL) techniques, which systematically identify and select batches of the most informative data instances for model training. Through an iterative application of diverse AL strategies, we curate a labeled data subset that preserves enough information to maintain model quality. Empirical results underscore the effectiveness of our approach, demonstrating substantial reductions in labeled data requirements without significantly compromising model performance. This result is particularly significant in the context of Green-AI, as it offers a pathway to mitigating the environmental impact of AI training processes.
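
To illustrate the kind of batch selection the abstract refers to, the following is a minimal Python sketch of a generic uncertainty-based active-learning loop. It is not the DFSMT implementation described in the paper: the synthetic dataset, the entropy scoring, the batch size, and the number of rounds are all illustrative assumptions.

    # Hypothetical sketch of iterative, uncertainty-based data selection.
    # Not the authors' DFSMT method; dataset, scores, and sizes are assumed.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic labeled pool standing in for the full training set.
    X_pool, y_pool = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Start from a small random seed set of labeled instances.
    selected = list(rng.choice(len(X_pool), size=50, replace=False))

    model = LogisticRegression(max_iter=1000)
    for _ in range(10):  # iterative selection rounds
        model.fit(X_pool[selected], y_pool[selected])

        remaining = np.setdiff1d(np.arange(len(X_pool)), selected)
        probs = model.predict_proba(X_pool[remaining])

        # Entropy as an uncertainty score: higher means more informative.
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

        # Add the batch of most uncertain instances to the training subset.
        batch = remaining[np.argsort(entropy)[-50:]]
        selected.extend(batch.tolist())

    print(f"Selected {len(selected)} of {len(X_pool)} labeled instances for training.")

In such a loop, the batch size and the number of rounds trade training cost against the information retained in the selected subset; the paper's actual selection strategies and stopping criteria may differ.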
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Active Learning
Data Selection
Energy Efficiency
Green-AI
Sustainability
Files in this item:
paper26.pdf: open access, Creative Commons license, 1.39 MB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/532247
Citations:
  • Scopus: 0