Federated conditional generative adversarial nets imputation method for air quality missing data

Xu Zhou, Xiaofeng Liu*, Gongjin Lan, Jian Wu

*Corresponding author for this work

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

Air quality is a matter of serious concern worldwide. Many intelligent air quality monitoring networks have been deployed, especially in big cities. The data these networks collect contain missing values for various reasons, which poses an obstacle to publishing and studying air quality. Generative adversarial nets (GAN) methods have achieved state-of-the-art performance in missing data imputation. However, GAN-based imputation needs sufficient training data, whereas a single monitoring network has only a small amount of poor-quality monitoring data, and these data sets do not satisfy the independent and identically distributed (IID) condition. A monitoring network therefore needs to draw on as much monitoring data from other networks as possible. In the real world, however, these air quality monitoring networks are owned by different organizations: companies, governments, and even some secret units. Many of them cannot share detailed monitoring data because of security, privacy, and industrial competition. In this paper, we propose, for the first time, a conditional GAN imputation method under a federated learning framework to handle data sets that come from diverse data owners without sharing the data. Furthermore, we improve the performance of the vanilla conditional GAN with the Wasserstein distance and a “Hint mask” trick. The experimental results show that our GAN-based imputation methods achieve the best performance. Moreover, our federated GAN imputation method outperforms the GAN imputation method trained locally by each participant, which indicates that the federated imputation model is effective. The proposed federated GAN method can improve model quality by increasing access to air quality data through private multi-institutional collaborations. We further investigate the effects of the geographical distribution of data across collaborating participants on model quality and, interestingly, find that GAN training under a federated learning framework is more stable.
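As a rough illustration of two ingredients named in the abstract, the sketch below shows a sample-size-weighted average of participants' model parameters and a hint vector built from a missingness mask. It is not the authors' implementation. The function names, the FedAvg-style weighting, and the hint rule (reveal a mask entry with probability `hint_rate`, otherwise set it to 0.5) are assumptions made here for illustration; the abstract does not specify how aggregation or the "Hint mask" trick is actually done.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local sample count.

    Assumption: FedAvg-style aggregation. Only parameters are exchanged,
    never the raw monitoring data held by each participant.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_layers)
    ]


def make_hint(mask, hint_rate, rng):
    """Build a hint from a 0/1 missingness mask (1 = observed).

    Assumption: each mask entry is revealed with probability hint_rate;
    hidden entries are replaced by the uninformative value 0.5.
    """
    reveal = rng.random(mask.shape) < hint_rate
    return np.where(reveal, mask, 0.5)


# Hypothetical usage: two participants with one parameter array each.
rng = np.random.default_rng(0)
global_params = federated_average(
    [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]],
    client_sizes=[1, 3],
)
hint = make_hint(np.array([[1.0, 0.0], [0.0, 1.0]]), hint_rate=0.9, rng=rng)
```

In a full training loop, each participant would update its local generator and discriminator on its own data, and only the resulting parameters would be passed to `federated_average` in each round.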

Original language: English
Article number: 107261
Pages (from-to): 1-12
Number of pages: 12
Journal: Knowledge-Based Systems
Volume: 228
Early online date: 26 Jun 2021
Publication status: Published - 27 Sep 2021

Bibliographical note

Funding Information:
This work was supported in part by the National Key R&D Program under grant 2018AAA0100800, the Key Research and Development Program of Jiangsu under grants BK20192004B and BE2018004, the Guangdong Forestry Science and Technology Innovation Project under grant 2020KJCX005, and the International Cooperation and Exchanges of Changzhou under grant CZ20200035.

Publisher Copyright:
© 2021 Elsevier B.V.

Keywords

  • Air pollutants
  • Conditional GAN imputation
  • Federated learning
  • Privacy-preserving machine learning
