List.MID: A MIDI-Based Benchmark for Evaluating RDF Lists

Linked lists represent a countable number of ordered values, and are among the most important abstract data types in computer science. With the advent of RDF as a highly expressive knowledge representation language for the Web, various implementations for RDF lists have been proposed. Yet, there is so far no benchmark dedicated to evaluating the performance of triple stores and SPARQL query engines when dealing with ordered linked data. Moreover, essential tasks for evaluating RDF lists, like generating datasets containing RDF lists of various sizes, or generating the same RDF list using different modelling choices, are cumbersome and unprincipled. In this paper, we propose List.MID, a systematic benchmark for evaluating systems serving RDF lists. List.MID consists of a dataset generator, which creates RDF list data in various models and of different sizes; and a set of SPARQL queries. The RDF list data is coherently generated from a large, community-curated base collection of Web MIDI files, rich in lists of musical events of arbitrary length. We describe the List.MID benchmark, and discuss its impact and adoption, reusability, design, and availability.


Introduction
Linked lists are data structures that represent a countable number of ordered values, and are one of the fundamental abstract data types in computer science [15]. They have at least basic support, in a variety of implementations, in the core libraries of all major programming languages [20].
With the advent of the Semantic Web [4], the Resource Description Framework [23] (RDF) has become the standard for knowledge representation on the Web. Because RDF is an expressive data format designed for enabling semantic interoperability, data integration, and data modeling in all sorts of domains, many use cases demand standard ways of representing classic data structures; linked lists are among them. Consequently, Semantic Web standards such as RDF itself [23], RDF Schema [6], and more recently JSON-LD [24] propose various implementations for RDF lists: rdf:Seq, based on list ordering properties; rdf:List, based on LISP-like rdf:first and rdf:rest pointers; or the "@list": [] JSON-LD attribute. Moreover, the community itself has developed its own ontology design patterns [10] to implement list-like ontological structures.
With this variety of alternatives, many questions arise on practical and performance issues with respect to RDF lists. For example, it is hard to choose one such implementation for large-scale, list-based RDF datasets [18] without knowing the impact of that choice on query performance. Other users, by contrast, may favor list readability over performance. To address this, some users have reported ways to query such RDF lists. However, no standard benchmark has so far been proposed in the Semantic Web to generate RDF list data, in all its possible modeling alternatives, in a systematic and principled way. Such a benchmark could help clarify many of the open questions about RDF list modeling and publishing on the Web, such as query performance, list readability, and reproducible triplestore evaluations.
In this paper, we introduce the List.MID benchmark, an RDF list data generator and query template set specifically designed for the evaluation of RDF lists. The benchmark has two focus points: (a) to cover as many RDF list implementations as possible, following a systematic study that surveys and summarizes different RDF list modeling practices into 6 different RDF list modeling templates [8]; and (b) to create such multi-model RDF lists out of real-world data, through the large-scale, list-rich symbolic music notation dataset of the MIDI Linked Data cloud [18]. Specifically, the contributions of the paper are:
- We list and describe 6 abstract RDF list modeling patterns recently surveyed [8] (Sect. 3.1)
- We describe the List.MID data generator (Sect. 3.2), which generates RDF list data according to these patterns from the MIDI Linked Data cloud dataset [18]; and a set of SPARQL query templates for retrieval (Sect. 3.3)
- We show evidence of use and potential adoption for our proposed benchmark (Sect. 4)
The rest of the paper is organized as follows: Sect. 2 covers the related work; Sect. 3 describes the List.MID benchmark, data generator, and queries; Sect. 4 shows evidence of use and potential adoption for the benchmark; and Sect. 5 draws our conclusions.

Related Work
Multiple ways of modelling RDF lists have been proposed. The RDF Schema (RDFS) recommendation [6] defines several container classes to represent collections: rdf:Bag to contain unordered elements; rdf:Alt for "alternative" containers whose typical processing will be to select one of its members; and rdf:Seq to contain elements ordered by the numerical order of the container membership properties. [6] also defines a collection vocabulary to describe closed collections that can have no more members, through the class rdf:List and the properties rdf:first, rdf:rest, and rdf:nil. In the more recent JSON-LD [24], ordered lists like "@list": [ "bob", "alice", "carol" ] have equivalent representations as rdf:List. Similarly, the RDF 1.1 Turtle [2] syntax allows for the specification of rdf:List instances, e.g. :a :b ( "bob" "alice" "carol" ). Besides W3C standards, various ontology design patterns [10], like the Sequence Ontology Pattern (SOP), address the task of representing RDF lists.
Regarding previous work on benchmarks, the Semantic Web community has developed a number of them for evaluating the performance of SPARQL engines. The Berlin SPARQL Benchmark (BSBM) [5] generates benchmark data about exploring products and analyzing consumer reviews. The Lehigh University Benchmark (LUBM) [13] does so on data about universities, departments, professors and students. SP2Bench [22] enables comparison of SPARQL optimization strategies, an estimation of their generality, and the prediction of their benefits in real-world scenarios; it includes a benchmark data generator based on the DBLP bibliographic database [16]. Similarly, the DBpedia SPARQL benchmark [19] proposes human-written queries that execute against non-relational schemas. The Waterloo SPARQL Diversity Test Suite (WatDiv) focuses on measuring "how an RDF data management system performs across a wide spectrum of SPARQL queries with varying structural characteristics and selectivity classes" [1]. Other approaches like Linked SPARQL queries (LSQ) [21] focus specifically on benchmark queries from SPARQL query logs, but typically do not generate data to run these queries on. More recently, frameworks to integrate and compare various benchmarks, such as IGUANA [7], have emerged. Other, more pragmatic approaches propose ad-hoc benchmarks supporting specific applications [25] or SPARQL features, like federation [12]. To the best of our knowledge, none of these benchmarks specifically addresses the evaluation of RDF lists.

The List.MID Benchmark
In this section we describe the List.MID benchmark. First, we summarize the various modeling alternatives for lists in RDF (Sect. 3.1; for a complete survey, see [8]). Second, we implement these modeling alternatives in a benchmark data generator that creates RDF datasets rich in lists from a large MIDI data collection (Sect. 3.2). Finally, we propose a set of SPARQL queries to retrieve RDF list data according to the different modeling alternatives (Sect. 3.3).
All the List.MID benchmark resources are available online in a GitHub repository at https://github.com/midi-ld/List.MID. The benchmark is licensed under […]. The open availability of the benchmark on these platforms allows for fast and frictionless contributions from other parties. All relevant URLs and the canonical citation are shown in Table 1.

Modeling Lists in RDF
There are various models for representing a sequence, a finite collection of ordered elements, in RDF. In this section we offer a summary of such models and their properties, recalling the research in [8]. These models were surveyed by selecting them from the following sources: W3C standards, ontology design patterns [10], resource track papers in ISWC (e.g. [3,18]), and lookups of relevant terms in Linked Open Vocabularies [28]. For further detail and a description of the surveying methodology, see [8].
RDF Sequences. The RDF Schema (RDFS) recommendation [6] defines the container classes rdf:Bag, rdf:Alt, and rdf:Seq to represent collections. Since rdf:Bag is intended for unordered elements, and rdf:Alt for "alternative" containers whose typical processing will be to select one of its members, these two models do not fit our sequence definition, and thus we do not include them among our candidates. Conversely, we do consider RDF Sequences: collections represented by rdf:Seq and ordered by the properties rdf:_1, rdf:_2, rdf:_3, ..., instances of the class rdfs:ContainerMembershipProperty (see Fig. 1).
Properties. RDF Sequences indicate membership through various properties, which are used in triples in predicate position. Ordering of elements is absolute in such predicates, given by an integer index after an underscore ("_").
Properties. RDF Lists indicate membership through the use of a unique property rdf:first in predicate position. Ordering of elements is relative to the use of the rdf:rest property, and given by the sequential forward traversal of the list.
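To make the contrast between absolute and relative ordering concrete, the following Python sketch emits plain (subject, predicate, object) tuples for the same three elements, once as an rdf:Seq and once as an rdf:List. It is only an illustration under stated assumptions: the identifiers ex:track, ex:event0, etc. and the blank-node labels are hypothetical, and this is not the List.MID generator itself.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def as_rdf_seq(subject, items):
    """rdf:Seq: absolute order, one membership property rdf:_i per element."""
    triples = [(subject, RDF + "type", RDF + "Seq")]
    for i, item in enumerate(items, start=1):  # membership properties start at rdf:_1
        triples.append((subject, f"{RDF}_{i}", item))
    return triples

def as_rdf_list(items):
    """rdf:List: relative order, a chain of rdf:first / rdf:rest cells ending in rdf:nil."""
    # Hypothetical blank-node labels, one list cell per element.
    cells = [f"_:cell{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        nxt = cells[i + 1] if i + 1 < len(cells) else RDF + "nil"
        triples.append((cells[i], RDF + "first", item))
        triples.append((cells[i], RDF + "rest", nxt))
    return triples

def traverse_rdf_list(head, triples):
    """Recover the order by following rdf:rest from the head cell until rdf:nil."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    out, cell = [], head
    while cell != RDF + "nil":
        out.append(first[cell])
        cell = rest[cell]
    return out

events = ["ex:event0", "ex:event1", "ex:event2"]
seq = as_rdf_seq("ex:track", events)
lst = as_rdf_list(events)
```

In the rdf:Seq variant the position of an element can be read directly from the predicate, whereas in the rdf:List variant reaching the i-th element requires following rdf:rest i times from the head cell.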
Properties. URI-based lists indicate membership through the use of class membership or through properties. Order is absolute and given by URI-embedded sequential identifiers.
midi:hasEvent <http://purl.org/midi-ld/piece/8cf9897/track00/event0006> indicates that the object belongs to a list of events; and the additional triple <http://purl.org/midi-ld/piece/8cf9897/track00/event0006> midi:absoluteTick 6 indicates that the event has index 6 (see Fig. 4).
Properties. Number-based lists indicate membership through the use of class membership or through properties. Order is absolute and given by an integer index in a literal as an object of an additional property.
Timestamp-Based Lists. Similarly to number-based lists, other lists, modeled by e.g. the Simple Event Model (SEM) [27], use timestamp markers instead of integer indexes to indicate the time at which the element of the list occurs. This is particularly useful in event-based applications, in which order clashes in the list are of lesser importance, as long as the timestamp order is preserved. For instance, the triple <http://purl.org/midi-ld/piece/8cf9897535d79e68c33a3076aa06d073/track00/event0006> midi:absoluteTime 0e+00 indicates that the 7th event occurs at the start of the list, possibly simultaneously with other events (see Fig. 5).
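The difference between URI-based and number-based ordering can be sketched in a few lines of Python. This is a minimal illustration, assuming events are given as plain strings and triples as (subject, predicate, object) tuples; the ex: identifiers are hypothetical, while midi:absoluteTick is the property used in the example above.

```python
import re

def order_by_uri(event_uris):
    """URI-based lists: the index is the trailing integer embedded in the URI."""
    def index(uri):
        match = re.search(r"(\d+)$", uri)
        if match is None:
            raise ValueError(f"no sequential identifier in {uri!r}")
        # Compare as integers so that unpadded identifiers also sort correctly.
        return int(match.group(1))
    return sorted(event_uris, key=index)

def order_by_number(triples, index_property="midi:absoluteTick"):
    """Number-based lists: the index is an integer literal on an extra property."""
    indexed = [(int(o), s) for s, p, o in triples if p == index_property]
    return [s for _, s in sorted(indexed)]

uris = ["ex:track00/event0010", "ex:track00/event0002", "ex:track00/event0006"]
by_uri = order_by_uri(uris)

triples = [
    ("ex:b", "midi:absoluteTick", "6"),
    ("ex:c", "midi:absoluteTick", "10"),
    ("ex:a", "midi:absoluteTick", "2"),
]
by_number = order_by_number(triples)
```

In the URI-based case the order has to be parsed out of the identifier itself, while in the number-based case it is available as data and can be sorted on directly.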
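The point about order clashes can be illustrated with a short Python sketch: sorting by timestamp yields groups of simultaneous events rather than a strict total order. The event identifiers and timestamps below are hypothetical, and events are assumed to be given as (event, timestamp) pairs.

```python
from itertools import groupby

def order_by_timestamp(timed_events):
    """Group events by timestamp, earliest first; simultaneous events share a group."""
    # sorted() is stable, so events with equal timestamps keep their input order.
    ordered = sorted(timed_events, key=lambda pair: pair[1])
    return [
        (timestamp, [event for event, _ in group])
        for timestamp, group in groupby(ordered, key=lambda pair: pair[1])
    ]

events = [("ex:event0007", 0.5), ("ex:event0006", 0.0), ("ex:event0005", 0.0)]
groups = order_by_timestamp(events)
```

Here the two events at time 0.0 end up in the same group: their relative order is not determined by the timestamp alone, which is acceptable in applications where only the timestamp order matters.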

Sequence Ontology Pattern.
A number of models use RDF, RDFS and OWL to model sequences in domain-specific ways. For example, the Time Ontology [14] and the Timeline Ontology offer a number of classes and properties to model temporality and order, including timestamps (see Sect. 3.1), but importantly also before/after relations. The Sequence Ontology Pattern (SOP) is an ontology design pattern [10] that "represents the 'path' cognitive schema, which underlies many different conceptualizations: spatial paths, time lines, event sequences, organizational hierarchies, graph paths, etc.". We select SOP as an abstract model representing this group of list models (see Fig. 6).
Properties. SOP lists indicate list membership through properties. Order is relative and given by the sequential forward or backward traversal of the sequence.
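A rough Python sketch of this relative, bidirectional ordering follows. It assumes that the pattern links neighbouring elements with directlyPrecedes and directlyFollows properties in the namespace shown; both the namespace and the property names are assumptions made here for illustration, as are the ex: identifiers.

```python
# Assumed namespace of the Sequence Ontology Pattern (not stated in the text).
SEQ = "http://www.ontologydesignpatterns.org/cp/owl/sequence.owl#"

def as_sop(items):
    """Link each pair of neighbours in both directions."""
    triples = []
    for prev, nxt in zip(items, items[1:]):
        triples.append((prev, SEQ + "directlyPrecedes", nxt))
        triples.append((nxt, SEQ + "directlyFollows", prev))
    return triples

def traverse(triples, prop):
    """Walk the chain along `prop`: directlyPrecedes goes forward, directlyFollows backward."""
    step = {s: o for s, p, o in triples if p == SEQ + prop}
    if not step:
        return []
    # The starting element is the only one that is never the target of a step.
    starts = set(step) - set(step.values())
    if len(starts) != 1:
        raise ValueError("expected exactly one starting element")
    order = [starts.pop()]
    while order[-1] in step:
        order.append(step[order[-1]])
    return order

notes = ["ex:e0", "ex:e1", "ex:e2"]
sop = as_sop(notes)
forward = traverse(sop, "directlyPrecedes")
backward = traverse(sop, "directlyFollows")
```

Unlike rdf:List, no dedicated head cell or terminator is needed: the ends of the sequence are simply the elements that lack a predecessor or a successor.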

Data Generator
The first component of the List.MID benchmark is an algorithm to generate RDF datasets with lists according to the modeling patterns discussed above. The source code and all documentation are available on GitHub at https://github.com/midi-ld/List.MID.
In order to root our benchmark within real-world data, we propose to generate data using MIDI files [26], a symbolic music encoding, as a basis. The reason for this is that MIDI files, and symbolic music notations in general, must encode musical events (the start of a note, the end of a note, the switching of one instrument for another, etc.) in strict sequential order to preserve musical coherence. Consequently, we use the midi2rdf algorithm proposed in [17] to generate RDF graphs from MIDI files; and we extend this algorithm here in order to encode RDF lists of musical events supporting the list data models discussed in Sect. 3.1.
Figure 7 shows an excerpt of the MIDI ontology used by the original midi2rdf algorithm. The relevant elements here are midi:Track, each containing a sequence of related musical events (e.g. notes played by one single instrument); and midi:Event, each representing a musical event that happens in a strict order within the track (e.g. the start of a note, the end of a note). For more details on MIDI event encoding see [17,18,26].
The original midi2rdf algorithm generates implicit lists of events by encoding their order in the URI of the event (e.g. ex:track00/event02 happens immediately before ex:track00/event03 and immediately after ex:track00/event01), thus adhering to the URI-based Lists pattern discussed in Sect. 3.1. We extend this generation to the remaining patterns.
Usage. The first step is to find a MIDI file with the desired list size. The MIDI Linked Data cloud API incorporates a query to retrieve all track sizes, in number of events and in descending order, from the dataset [18]. Since this query is expensive, we include a resulting dump in the benchmark. An inspection of this result allows users to select a MIDI identifier of the chosen size; this identifier can be used in a second query to download an RDF dump for the MIDI file. This dump can be transformed into an input MIDI file with the included rdf2midi command [17].
Once the chosen input MIDI file has been generated, the midi2rdf CLI tool of the List.MID benchmark can be used to generate its RDF graph according to the requested list pattern. The syntax is: […]
The relevant introduced argument is order, which lets the user select the RDF list modeling to use for data generation. The mapping of the values of this argument to the patterns of Sect. 3.1 is: RDF Sequences → seq, RDF Lists → list, URI-based Lists → uri, Number-based Lists → prop_number, Timestamp-based Lists → prop_time, Sequence Ontology Pattern → sop. For example, to generate benchmark data of a preselected MIDI file, http://purl.org/midi-ld/pattern/bc7d9c25f81a4d90c000c30b6efc887d, with 16,638 list elements using the RDF List pattern, we run:
    midi2rdf --format turtle --order list bc7d9c25f81a4d90c000c30b6efc887d.mid benchmark.ttl
The output benchmark.ttl file is ready to be used in a standards-compliant RDF store. As shown in the syntax above, the benchmark is agnostic with respect to serialization formats, and the most frequent ones (including JSON-LD) are supported.
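When the same MIDI file has to be generated under every pattern, the invocation can be scripted. The following Python sketch only assembles the argument lists and does not run anything. It assumes the --format and --order options exactly as they appear in the example above, and the underscore spelling of prop_number and prop_time is likewise an assumption; the helper name midi2rdf_command is hypothetical.

```python
# Pattern name (Sect. 3.1) -> value of the order argument.
# The underscore spelling of the two "prop" values is an assumption.
ORDER_VALUE = {
    "RDF Sequences": "seq",
    "RDF Lists": "list",
    "URI-based Lists": "uri",
    "Number-based Lists": "prop_number",
    "Timestamp-based Lists": "prop_time",
    "Sequence Ontology Pattern": "sop",
}

def midi2rdf_command(pattern, midi_file, output_file, rdf_format="turtle"):
    """Build the argument list for one run of the generator (hypothetical helper)."""
    return [
        "midi2rdf",
        "--format", rdf_format,
        "--order", ORDER_VALUE[pattern],
        midi_file,
        output_file,
    ]

midi_file = "bc7d9c25f81a4d90c000c30b6efc887d.mid"
commands = {
    pattern: midi2rdf_command(pattern, midi_file, f"benchmark-{value}.ttl")
    for pattern, value in ORDER_VALUE.items()
}
```

Each resulting list could then be passed to subprocess.run(cmd, check=True), provided the midi2rdf tool is installed and accepts these options.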

Queries
In this section we propose a set of SPARQL query templates for retrieval of elements of lists, according to the patterns described in Sect. 3.1. Since the full coverage of list operations in SPARQL is cumbersome, here we restrict ourselves to typical data publishing functionality. Therefore, we consider minimal and atomic read operations; and we do not consider management operations (edit, merge, split of lists, etc.). The implementation of management operations is possible, but depends on implementations of read operations; thus, we focus here on read operations, and leave management operations for future work.
Therefore, the currently supported operations in List.MID consist of (a) retrieving all elements of the list in order; and (b) accessing the n-th element of the list.
In order to do this systematically in datasets following one of the RDF list modeling patterns (Sect. 3.1), we include corresponding SPARQL query templates in the benchmark. The queries can be found online in the GitHub repository of the benchmark, and are summarized in Table 2.
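To give an idea of what such templates can look like, the Python sketch below builds SPARQL query strings for three cases: n-th element access in an rdf:Seq, n-th element access in an rdf:List, and ordered retrieval from a number-based list. These are illustrative sketches and not necessarily the templates shipped with the benchmark; the midi: namespace IRI, the example.org IRIs, and the function names are assumptions, while midi:hasEvent and midi:absoluteTick are the properties used in Sect. 3.1.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX midi: <http://purl.org/midi-ld/midi#>\n"  # assumed namespace IRI
)

def nth_in_seq(container_iri, n):
    """rdf:Seq: n is 1-based and maps directly onto the property rdf:_n."""
    if n < 1:
        raise ValueError("rdf:Seq indexes start at 1")
    return f"{PREFIXES}SELECT ?element WHERE {{ <{container_iri}> rdf:_{n} ?element }}"

def nth_in_list(head_iri, n):
    """rdf:List: n is 0-based; follow rdf:rest n times, then read rdf:first."""
    if n < 0:
        raise ValueError("index must be non-negative")
    path = "/".join(["rdf:rest"] * n + ["rdf:first"])
    return f"{PREFIXES}SELECT ?element WHERE {{ <{head_iri}> {path} ?element }}"

def all_by_number(track_iri):
    """Number-based lists: retrieve all events, sorted by their integer index."""
    return (
        f"{PREFIXES}SELECT ?event WHERE {{ <{track_iri}> midi:hasEvent ?event . "
        f"?event midi:absoluteTick ?index }} ORDER BY ?index"
    )

seq_query = nth_in_seq("http://example.org/track", 3)
list_query = nth_in_list("http://example.org/list", 2)
number_query = all_by_number("http://example.org/track")
```

The rdf:List query grows with n because the property path has to spell out every rdf:rest step, whereas the rdf:Seq and number-based queries have constant size regardless of the position requested.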

Experiments and Reuse
In this Section we discuss current use and potential for reuse of our proposed benchmark in research.

First Experiment
The List.MID benchmark has been used in a first Semantic Web research experiment [8]. The purpose of this work is to understand the impact of different RDF list modeling patterns (see Sect. 3.1) on the performance and availability of sequential retrieval of Linked Data. This crucially includes basic list operations such as getting all elements of the list in order; randomly accessing one element of the list; and randomly accessing a sublist contained in a list. The most important findings quantify the impact of different list modeling choices on retrieval; and show that this impact is triplestore-invariant to a great degree. For a full report on such experiments, see [8]. These experiments demonstrate the applicability and usefulness of the benchmark, and can be easily reproduced with List.MID and the supplementary materials at https://www.dropbox.com/sh/m98115y7ah2nqcv/AAAxkGsWuiPaLf6X7c uM0yWa.

Online Survey
Since the List.MID benchmark is a new resource for the Semantic Web community, we discuss here evidence for potential adoption. To gather such evidence, we performed an online survey in which we directly asked the community of potential adopters 8 questions regarding their background, relevance, and interest in benchmarking RDF lists. The online survey was distributed in the semantic-web and public-lod public mailing lists of the W3C; and in the internal mailing lists of the affiliation labs of the authors. In total we gathered N = 24 responses. The survey can be found online.
Fig. 8 shows the results. Except for question 3 (Fig. 8c), all questions ask the respondents to quantify their agreement with the statement made from 1 (absolutely disagree) to 5 (absolutely agree), with 3 being a neutral response (neither agree nor disagree). In the first two questions (Fig. 8a, 8b) we assess the background of the respondents, finding that 75% of them have experience in modeling and publishing RDF, and 54.2% have experience or interest in RDF benchmarking, which supports the adequacy of the population sample. Among the various RDF list modeling practices (Fig. 8c), rdf:List is the most popular, known by 2/3 of the respondents. Other practices like rdf:Seq (37.5%), implicit RDF elements as proxies (URIs, properties, etc.; 25%) and ontology design patterns (20.8%) are also familiar. Some respondents mention here other, less known approaches that could fit the broader categories (e.g. using a xyz:nextitem). Figure 8d shows that the community is divided on whether expressing lists in RDF is a real need; conversely, Fig. 8e shows that the impact of list modeling choices on query performance is a real concern (0% disagree; 83.3% agree or strongly agree). Figure 8f signals that current benchmarks might be missing coverage for RDF lists (only 8.3% find them somewhat covered). Most importantly, the community feels the need for new benchmarks specifically designed for the evaluation of RDF lists (Fig. 8g, 70.9%). When asked directly about their interest as potential users of a new RDF list benchmark, the community seems divided (Fig. 8h), although this could be attributed to different research interests. 29.1% of the respondents would be interested in reusing an RDF list benchmark like the one proposed here.

Conclusions
Lists are fundamental data structures in computer science, and various models implementing them in the Semantic Web (using RDF, standards, and community best practices) have been proposed. So far, the differences and trade-offs in features and performance of these RDF list models have been studied only in a superficial and exploratory manner. To address this, in this paper we make two contributions. First, we show evidence that the Semantic Web community feels the need for a benchmark specifically designed for the evaluation of RDF lists; and that a number of researchers would be interested in reusing such a benchmark. Second, we propose a benchmark to address precisely this issue, enabling a systematic and principled way of generating and querying RDF list data from real-world datasets according to dominant RDF list models in the Semantic Web. We feel that, by adopting this benchmark, researchers will be able to better understand the implications of different list-modeling practices; and developers will find a first building block to construct more varied and performant solutions for RDF lists. We expect both researchers and developers to contribute, through their research and software, to making the List.MID benchmark better.
This room for improvement can be viewed from various angles. First, in the next iterations we will include more real-world use cases and base datasets from which to generate the benchmark data. Similarly, we will include additional list operations regarding list management, such as inserting a new element and swapping two elements, taking inspiration from array operations in programming [11]. If additional models for RDF lists become a need for our users, we will support them too. Finally, we will continue working to deploy a more automated and usable infrastructure and tools for RDF list benchmarking.

Fig. 7. Excerpt of the MIDI ontology. Tracks contain lists of sequential MIDI events.

Fig. 8. Results of the online survey

Table 2. SPARQL query templates of the benchmark.