Looking Inside the Black-Box: Capturing Data Provenance using Dynamic Instrumentation

Research output: Chapter in Book / Report / Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black-box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse engineering fields. Our system is able to identify data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures high-fidelity and accurate data provenance.
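The abstract's core idea is that taint tracking derives provenance from what a program actually does with its inputs. A toy model can illustrate this. The following is not DataTracker, which instruments unmodified binaries at run time. It is a hypothetical pure-Python simulation under simplifying assumptions: every byte carries a set of (file, offset) taint labels, every operation propagates the union of its operands' labels, and every write turns those labels into file-level "derived from" relations. The names `Tainted`, `read_file` and `write_file` are illustrative inventions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical taint label: (input file name, byte offset) that a value came from.
Label = tuple[str, int]


@dataclass(frozen=True)
class Tainted:
    """A byte value paired with the set of input locations it depends on."""
    value: int
    labels: frozenset[Label] = frozenset()

    def combine(self, other: "Tainted", op: Callable[[int, int], int]) -> "Tainted":
        # Union propagation: the result depends on both operands, so it
        # carries the taint labels of both.
        return Tainted(op(self.value, other.value) & 0xFF, self.labels | other.labels)


def read_file(name: str, data: bytes) -> list[Tainted]:
    """Taint source: label every byte read with its origin."""
    return [Tainted(b, frozenset({(name, i)})) for i, b in enumerate(data)]


def write_file(name: str, values: list[Tainted],
               relations: set[tuple[str, str]]) -> bytes:
    """Taint sink: record (output, input) "derived from" pairs, return the bytes."""
    for v in values:
        for source_file, _offset in v.labels:
            relations.add((name, source_file))
    return bytes(v.value for v in values)


# Simulated program: out.bin[i] = a.txt[i] XOR b.txt[i].
# key.bin is read but never used to compute the output.
relations: set[tuple[str, str]] = set()
a = read_file("a.txt", b"abc")
b = read_file("b.txt", b"xyz")
key = read_file("key.bin", b"\x01\x02\x03")
mixed = [x.combine(y, lambda p, q: p ^ q) for x, y in zip(a, b)]
out = write_file("out.bin", mixed, relations)

print(sorted(relations))  # [('out.bin', 'a.txt'), ('out.bin', 'b.txt')]
```

Because key.bin is opened but never flows into the output, it yields no relation. A view that only sees which files a process reads and writes could not rule it out. The union-only policy here ignores control-flow and address dependencies, which a real taint tracker has to handle explicitly.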
Original language: English
Title of host publication: 5th International Provenance and Annotation Workshop (IPAW'14)
Pages: 155-167
DOI: 10.1007/978-3-319-16462-5_12
Publication status: Published, 2014

Cite this

Stamatogiannakis, M.; Groth, P.T.; Bos, H.J. / Looking Inside the Black-Box: Capturing Data Provenance using Dynamic Instrumentation. 5th International Provenance and Annotation Workshop (IPAW'14). 2014. pp. 155-167
@inproceedings{5c1ad4b4c36746458edf839da877dd84,
title = "Looking Inside the Black-Box: Capturing Data Provenance using Dynamic Instrumentation",
abstract = "Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black-box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse engineering fields. Our system is able to identify data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures high-fidelity and accurate data provenance.",
author = "M. Stamatogiannakis and P.T. Groth and H.J. Bos",
year = "2014",
doi = "10.1007/978-3-319-16462-5_12",
language = "English",
isbn = "9783319164618",
pages = "155--167",
booktitle = "5th International Provenance and Annotation Workshop (IPAW'14)",
}

Looking Inside the Black-Box: Capturing Data Provenance using Dynamic Instrumentation. / Stamatogiannakis, M.; Groth, P.T.; Bos, H.J.

5th International Provenance and Annotation Workshop (IPAW'14). 2014. p. 155-167.

TY  - GEN
T1  - Looking Inside the Black-Box: Capturing Data Provenance using Dynamic Instrumentation
AU  - Stamatogiannakis, M.
AU  - Groth, P.T.
AU  - Bos, H.J.
PY  - 2014
Y1  - 2014
N2  - Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black-box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse engineering fields. Our system is able to identify data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures high-fidelity and accurate data provenance.
AB  - Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black-box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse engineering fields. Our system is able to identify data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures high-fidelity and accurate data provenance.
U2  - 10.1007/978-3-319-16462-5_12
DO  - 10.1007/978-3-319-16462-5_12
M3  - Conference contribution
SN  - 9783319164618
SP  - 155
EP  - 167
BT  - 5th International Provenance and Annotation Workshop (IPAW'14)
ER  - 