Incremental Map-Reduce on Repository History

Johannes Härtel, Ralf Lämmel

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review


Work on Mining Software Repositories (MSR) typically involves processing abstractions of resources in individual revisions. Processing abstractions of resource changes, in turn, often requires working with all revisions of the repository history to guarantee a high resolution of the measured changes. Abstractions of resources and abstractions of resource changes are often so closely related that they can be used interchangeably in the processing. In practice, approaches that process such abstractions over high revision counts face a scalability challenge. In this work, we address this challenge by incrementalizing the processing of repository resources and the corresponding abstractions. Our work is inspired by incrementalization theory, including insights on Abelian groups, group homomorphisms, and indexing. We provide a map-reduce interface that enables calls to foreign functionality and offers convenient operations for processing abstractions, such as mapping, filtering, group-wise aggregation, and joining. Apache Spark is used for distribution. We compare the scalability of our approach with available MSR approaches, namely LISA, which reduces redundancy, and DJ-Rex, which migrates an analysis to a distributed map-reduce framework.
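The role of Abelian groups and group homomorphisms can be illustrated with a small single-machine sketch. This is not the authors' implementation and does not use Apache Spark; every name in it (`ZSet`, `add`, `zmap`, `zfilter`, `group_sum`, `incremental`, `recompute`, `loc_per_extension`) is hypothetical. It assumes each revision's resources are a signed multiset (element to integer multiplicity) and each commit is a delta of the same shape, so a removed or modified file contributes multiplicity -1 for its old version. Because mapping, filtering, and a sum-based group-wise aggregation are group homomorphisms, each can be applied to the per-commit deltas alone, and adding up the results yields the same answer as recomputing over the full state of every revision.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator

# Hypothetical representation: element -> integer multiplicity. Under pointwise
# addition these dicts form an Abelian group (identity {}, inverse = negation).
ZSet = Dict[Hashable, int]


def add(a: ZSet, b: ZSet) -> ZSet:
    """Group operation: pointwise sum, dropping entries that cancel to 0."""
    out = dict(a)
    for k, m in b.items():
        out[k] = out.get(k, 0) + m
        if out[k] == 0:
            del out[k]
    return out


def zmap(f: Callable[[Hashable], Hashable], z: ZSet) -> ZSet:
    """Mapping; a homomorphism: zmap(f, add(a, b)) == add(zmap(f, a), zmap(f, b))."""
    out: ZSet = {}
    for k, m in z.items():
        y = f(k)
        out[y] = out.get(y, 0) + m
    return {y: m for y, m in out.items() if m != 0}


def zfilter(p: Callable[[Hashable], bool], z: ZSet) -> ZSet:
    """Filtering; also a homomorphism."""
    return {k: m for k, m in z.items() if p(k)}


def group_sum(z: ZSet) -> ZSet:
    """Group-wise aggregation of (key, value) elements: per key, the sum of
    value * multiplicity. Linear in z, so it commutes with add."""
    out: ZSet = {}
    for (g, v), m in z.items():
        out[g] = out.get(g, 0) + v * m
    return {g: s for g, s in out.items() if s != 0}


def incremental(deltas: Iterable[ZSet], query: Callable[[ZSet], ZSet]) -> Iterator[ZSet]:
    """Query result per revision, computed by applying the query to deltas only."""
    result: ZSet = {}
    for d in deltas:
        result = add(result, query(d))
        yield result


def recompute(deltas: Iterable[ZSet], query: Callable[[ZSet], ZSet]) -> Iterator[ZSet]:
    """Baseline: rebuild each revision's full state and query it from scratch."""
    state: ZSet = {}
    for d in deltas:
        state = add(state, d)
        yield query(state)


# Hypothetical history; elements are (path, lines_of_code) pairs.
history = [
    {("a.py", 10): 1, ("b.java", 20): 1, ("test_a.py", 7): 1},  # rev 1: add files
    {("a.py", 10): -1, ("a.py", 15): 1},                        # rev 2: a.py grows to 15 LOC
    {("b.java", 20): -1, ("c.py", 5): 1},                       # rev 3: delete b.java, add c.py
]


def loc_per_extension(z: ZSet) -> ZSet:
    """Filter out test files, map paths to extensions, then sum LOC per extension."""
    non_test = zfilter(lambda e: not e[0].startswith("test_"), z)
    by_ext = zmap(lambda e: (e[0].rsplit(".", 1)[-1], e[1]), non_test)
    return group_sum(by_ext)


for rev, (inc, full) in enumerate(
    zip(incremental(history, loc_per_extension), recompute(history, loc_per_extension)),
    start=1,
):
    assert inc == full  # incremental and from-scratch results agree
    print(rev, inc)
```

Only operations that are homomorphisms can be updated this simply; an aggregation such as a maximum is not one, since a deletion cannot be undone from the previous maximum alone. Joins, which the abstract also lists, are bilinear rather than linear (the delta of a join involves the changed side joined with the other side's full state), which is where indexed state would come in; that part is omitted from this sketch.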
Original language: English
Title of host publication: SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
Editors: K. Kontogiannis, F. Khomh, A. Chatzigeorgiou, M.-E. Fokaefs, M. Zhou
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728151434
Publication status: Published - 1 Feb 2020
Externally published: Yes
Event: 27th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2020 - London, Canada
Duration: 18 Feb 2020 to 21 Feb 2020



