Description
Evaluation data for the paper "Tab2Know: Building a Knowledge Base from Tables in Scientific Papers", published at ISWC 2020. For code, see https://github.com/karmaresearch/tab2know

This resource contains the following files:

- `venues.txt`: The venues used to select PDFs published in the last 5 years from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/).
- `extracted-tables.tar.gz`: All tables that we extracted from these PDFs using [Tabula](https://github.com/tabulapdf/tabula).
- `sample-400.tar.gz`: A sample of these tables, which we used for annotation.
- `ontology.ttl`: The annotation ontology in Turtle format.
- `all_metadata.jsonl`: Annotations for this sample in the JSON format described below.
- `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`).
- `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files.
- `rules.txt`: Datalog rules that we used for entity resolution.
- `tab2know-graph.nt.gz`: The final RDF graph, which contains all extracted table structures, predicted table and column classes, and resolved entity links.
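
To illustrate how a label query template with `{{slot}}` placeholders might be turned into a concrete SPARQL query, here is a minimal Python sketch. It is not the Tab2Know implementation: the function name `fill_template`, the example template text, and the slot name `pattern` are all hypothetical, and since the encoding of the `slots` column in `labelqueries.csv` is not specified here, the sketch simply takes the slot values as a plain mapping.

```python
import re
from typing import Mapping

# Matches placeholders of the form {{name}}, as described for the templates.
SLOT_PATTERN = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, slots: Mapping[str, str]) -> str:
    """Replace every {{name}} placeholder in `template` with slots[name].

    Raises KeyError if the template uses a slot that has no value.
    """
    def substitute(match):
        name = match.group(1)
        if name not in slots:
            raise KeyError(f"no value supplied for slot '{name}'")
        return slots[name]

    return SLOT_PATTERN.sub(substitute, template)


# Hypothetical template text; the real templates ship in
# labelqueries-sparql-templates.zip and may look quite different.
example_template = (
    "SELECT ?table WHERE { ?table ?p ?caption . "
    'FILTER regex(?caption, "{{pattern}}", "i") }'
)

query = fill_template(example_template, {"pattern": "accuracy"})
print(query)
```

Failing loudly on a missing slot is a deliberate choice in this sketch, so that a half-filled query is never sent to a SPARQL endpoint.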
Date made available | 2020 |
---|---|
Publisher | Zenodo |