Abstract
From the perspective of machine learning and data mining applications, expressing data in RDF rather than a domain-specific format can add complexity and obfuscate the internal structure. We investigate and illustrate this issue with an example where bio-molecular graph datasets are expressed in RDF. We use this example to inspire preprocessing techniques which reverse some of the complications of adding semantic annotations, exposing those patterns in the data that are most relevant to machine learning. We test these methods in a number of classification experiments and show that they can improve performance both for our example datasets and real-world RDF datasets.
Original language | English |
---|---|
Title of host publication | KNOW@LOD |
Publication status | Published - 2014 |