Semi-automatic extraction of cross-table data from a set of spreadsheets

Alaaeddin Swidan*, Felienne Hermans

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Spreadsheets are widely used in companies. End-users often value the high degree of flexibility and freedom spreadsheets provide. However, these features lead to a variety of data forms developing inside spreadsheets. One of these forms is the cross-table, defined as a rectangular form of data that expresses the relations between a set of objects and a set of attributes. Cross-tables are common in spreadsheets: our exploratory analysis found that more than 3.42% of spreadsheets in an industrial open dataset include at least one cross-table. However, current software tools provide no support for analyzing data in cross-tables. To address this, we present a semi-automatic approach to extract cross-table data from a set of spreadsheets and transform it into a relational table form. We evaluate our approach in a case study on a set of 333 spreadsheets with 2,801 worksheets. The results show that the approach successfully extracts over 92% of the data inside the targeted cross-tables. Further, we interviewed two users of the spreadsheets working in the company; they confirmed that the approach is beneficial and provides correct results.
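To make the cross-table to relational-table transformation concrete, the following is a minimal sketch and not the authors' extraction approach. It assumes a cross-table already located and read into a list of rows, with attribute names in the first row and object names in the first column; the function name `cross_table_to_rows` and the sample data are hypothetical.

```python
def cross_table_to_rows(grid):
    """Unpivot a cross-table into (object, attribute, value) rows.

    Assumes grid[0] holds the attribute names (after a corner cell)
    and the first cell of every other row holds the object name.
    """
    header, *body = grid
    attributes = header[1:]  # skip the corner cell
    rows = []
    for line in body:
        obj, cells = line[0], line[1:]
        for attribute, value in zip(attributes, cells):
            # An empty cell means no relation is recorded for this pair.
            if value not in (None, ""):
                rows.append((obj, attribute, value))
    return rows


# Hypothetical cross-table: objects down the side, attributes across the top.
example = [
    ["", "Color", "Size"],
    ["Widget A", "x", ""],
    ["Widget B", "x", "x"],
]
relational = cross_table_to_rows(example)
```

Each non-empty cell becomes one row in the relational form, which is the shape that standard query and analysis tools expect.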

Original language: English
Title of host publication: End-User Development - 6th International Symposium, IS-EUD 2017, Proceedings
Editors: Fabio Paterno, Simone Stumpf, Stefano Valtolina, Simone Barbosa, Panos Markopoulos
Publisher: Springer Verlag
Pages: 84-99
Number of pages: 16
ISBN (Print): 9783319587349
DOIs
Publication status: Published - 1 Jan 2017
Externally published: Yes
Event: 6th International Symposium on End-User Development, IS-EUD 2017 - Eindhoven, Netherlands
Duration: 13 Jun 2017 → 15 Jun 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10303 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Symposium on End-User Development, IS-EUD 2017
Country/Territory: Netherlands
City: Eindhoven
Period: 13/06/17 → 15/06/17
