TY - GEN
T1 - Semi-automatic extraction of cross-table data from a set of spreadsheets
AU - Swidan, Alaaeddin
AU - Hermans, Felienne
PY - 2017/1/1
Y1 - 2017/1/1
N2 - Spreadsheets are widely used in companies. End-users often value the high degree of flexibility and freedom spreadsheets provide. However, these features lead to the development of a variety of data forms inside spreadsheets. A cross-table is one of these forms of data. A cross-table is defined as a rectangular form of data, which expresses the relations between a set of objects and a set of attributes. Cross-tables are common in spreadsheets: our exploratory analysis found that more than 3.42% of spreadsheets in an industrial open dataset include at least one cross-table. However, current software tools provide no support to analyze data in cross-tables. To address this, we present a semi-automatic approach to extract cross-table data from a set of spreadsheets and transform it into a relational table form. We evaluate our approach in a case study on a set of 333 spreadsheets with 2,801 worksheets. The results show that the approach successfully extracts over 92% of the data inside the targeted cross-tables. Further, we interviewed two users of the spreadsheets working in the company; they confirmed that the approach is beneficial and provides correct results.
AB - Spreadsheets are widely used in companies. End-users often value the high degree of flexibility and freedom spreadsheets provide. However, these features lead to the development of a variety of data forms inside spreadsheets. A cross-table is one of these forms of data. A cross-table is defined as a rectangular form of data, which expresses the relations between a set of objects and a set of attributes. Cross-tables are common in spreadsheets: our exploratory analysis found that more than 3.42% of spreadsheets in an industrial open dataset include at least one cross-table. However, current software tools provide no support to analyze data in cross-tables. To address this, we present a semi-automatic approach to extract cross-table data from a set of spreadsheets and transform it into a relational table form. We evaluate our approach in a case study on a set of 333 spreadsheets with 2,801 worksheets. The results show that the approach successfully extracts over 92% of the data inside the targeted cross-tables. Further, we interviewed two users of the spreadsheets working in the company; they confirmed that the approach is beneficial and provides correct results.
UR - http://www.scopus.com/inward/record.url?scp=85021197876&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021197876&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-58735-6_6
DO - 10.1007/978-3-319-58735-6_6
M3 - Conference contribution
AN - SCOPUS:85021197876
SN - 9783319587349
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 84
EP - 99
BT - End-User Development - 6th International Symposium, IS-EUD 2017, Proceedings
A2 - Paterno, Fabio
A2 - Stumpf, Simone
A2 - Valtolina, Stefano
A2 - Barbosa, Simone
A2 - Markopoulos, Panos
PB - Springer Verlag
T2 - 6th International Symposium on End-User Development, IS-EUD 2017
Y2 - 13 June 2017 through 15 June 2017
ER -