LastPyMile: Identifying the discrepancy between sources and packages

Duc Ly Vu, Fabio Massacci, Ivan Pashchenko, Henrik Plate, Antonino Sabetta

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Open source packages have their source code available on repositories for inspection (e.g. on GitHub), but developers use pre-built packages directly from package repositories (such as npm for JavaScript, PyPI for Python, or RubyGems for Ruby). This convenient practice assumes that there are no discrepancies between the source code and the packages. Such discrepancies pose both operational risks (e.g. making dependent projects unable to compile) and security risks (e.g. deploying malicious code during package installation) in the software supply chain. Our empirical assessment of 2438 popular packages in PyPI, with an analysis of around 10M lines of code, shows several differences in the wild: modifications cannot be attributed solely to malicious injections. Yet re-scanning every 'most likely good but modified' package in full is hard to manage for FOSS downstream users. We propose a methodology, LastPyMile, for identifying the differences between the build artifacts of software packages and their respective source code repositories. We show how it can be used to extend current package scanning practices for malware injection, which cover less than 1% of the code of deployed packages.
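The core idea of comparing a build artifact against its source repository can be illustrated with a minimal file-level sketch. This is not the authors' LastPyMile implementation. It assumes the package has already been downloaded as a wheel (a zip archive) and the repository checked out locally. Files are matched by base name only, a hypothetical simplification because package layouts often differ from repository layouts (e.g. a `src/` prefix). The function names `repo_index` and `diff_package` and the report labels are illustrative.

```python
import hashlib
import zipfile
from pathlib import Path, PurePosixPath


def sha256(data: bytes) -> str:
    """Hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def repo_index(repo_dir: Path) -> dict[str, set[str]]:
    """Map each file's base name in the source checkout to the hashes seen for it.

    Keying by base name is a hypothetical simplification: the directory layout
    of a built package rarely matches the repository layout exactly.
    """
    index: dict[str, set[str]] = {}
    for path in repo_dir.rglob("*"):
        if path.is_file() and ".git" not in path.relative_to(repo_dir).parts:
            index.setdefault(path.name, set()).add(sha256(path.read_bytes()))
    return index


def diff_package(wheel: Path, repo_dir: Path) -> dict[str, list[str]]:
    """Classify each file in a wheel as identical, modified, or absent from the repo."""
    index = repo_index(repo_dir)
    report: dict[str, list[str]] = {"identical": [], "modified": [], "phantom": []}
    with zipfile.ZipFile(wheel) as zf:
        for name in zf.namelist():
            # Skip directory entries and wheel metadata generated at build time.
            if name.endswith("/") or ".dist-info/" in name:
                continue
            digest = sha256(zf.read(name))
            candidates = index.get(PurePosixPath(name).name)
            if candidates is None:
                report["phantom"].append(name)    # no file with this name in the repo
            elif digest in candidates:
                report["identical"].append(name)  # byte-identical to a repo file
            else:
                report["modified"].append(name)   # same name, different content
    return report
```

Under these assumptions, only the files reported as `modified` or `phantom` would need to be passed to a malware scanner. This narrowing is the motivation the abstract gives for not re-scanning whole packages. A real tool would also need to handle source distributions, generated files, and finer-grained (e.g. line-level) comparison.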

Original language: English
Title of host publication: ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Diomidis Spinellis
Publisher: Association for Computing Machinery, Inc
Pages: 780-792
Number of pages: 13
ISBN (Electronic): 9781450385626
Publication status: Published - Aug 2021
Event: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: 23 Aug 2021 to 28 Aug 2021

Conference

Conference: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory: Greece
City: Virtual, Online
Period: 23/08/21 to 28/08/21

Bibliographical note

Funding Information:
This research has been partly funded by the EU H2020 Programs H2020-EU.2.1.1-CyberSec4Europe (Grant No. 830929), AssureMoss (Grant No. 952647) and SPARTA project (Grant No. 830892).

Publisher Copyright:
© 2021 Owner/Author.

Keywords

  • Open source software
  • PyPI
  • Python
  • software supply chain
