The Knowledge Graph as the Default Data Model for Machine Learning

Research output: Scientific - peer-review › Article

Abstract

In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details they thought salient, they now prefer the data as raw as possible. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. In some areas, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes and internal relations. The Semantic Web community has invested decades of work on just this problem: how to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. This work has led to the Linked Open Data Cloud, a vast and distributed knowledge graph. If we can develop methods that operate on this raw form of data, the knowledge graph, we can dispense with a great deal of ad-hoc feature engineering and train deep models end-to-end in many more domains. In this position paper, we describe current research in this area and discuss some of the promises and challenges of this approach.
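The heterogeneous knowledge the abstract describes, entities with attributes and internal relations, is conventionally represented as a set of subject-predicate-object triples. The toy graph below is a hypothetical sketch to illustrate that representation; the `ex:` identifiers are invented for this example and do not come from the paper.

```python
# A toy knowledge graph as a set of (subject, predicate, object) triples,
# the "raw" form of data the abstract argues models should consume directly.
triples = {
    ("ex:Amsterdam", "rdf:type", "ex:City"),          # relation to a class
    ("ex:Amsterdam", "ex:locatedIn", "ex:Netherlands"),  # relation between entities
    ("ex:Amsterdam", "rdfs:label", '"Amsterdam"'),    # literal attribute
    ("ex:Netherlands", "rdf:type", "ex:Country"),
}

# Entities are every subject plus every non-literal object
# (literals are marked here by a leading quote character).
entities = {s for s, _, _ in triples} | {
    o for _, _, o in triples if not o.startswith('"')
}
print(sorted(entities))
# ['ex:Amsterdam', 'ex:City', 'ex:Country', 'ex:Netherlands']
```

A model that operates on this form end-to-end sees the full graph, relations and attributes alike, rather than hand-picked feature vectors derived from it.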
Original language: English
Pages (from-to): 1-19
Number of pages: 19
Journal: Data Science
DOI: 10.3233/DS-170007
State: Published - 17 Oct 2017

Cite this

Wilcke, Xander; Bloem, Peter; De Boer, Victor / The Knowledge Graph as the Default Data Model for Machine Learning.

In: Data Science, 17.10.2017, p. 1-19.


@article{d589859477984c67919e8ae0d5586e4c,
title = "The Knowledge Graph as the Default Data Model for Machine Learning",
abstract = "In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details they thought salient, they now prefer the data as raw as possible. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. In some areas, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes and internal relations. The Semantic Web community has invested decades of work on just this problem: how to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. This work has led to the Linked Open Data Cloud, a vast and distributed knowledge graph. If we can develop methods that operate on this raw form of data, the knowledge graph, we can dispense with a great deal of ad-hoc feature engineering and train deep models end-to-end in many more domains. In this position paper, we describe current research in this area and discuss some of the promises and challenges of this approach.",
keywords = "End-to-End Learning, Knowledge Graphs, Machine Learning, Position paper, Semantic Web",
author = "Wilcke, Xander and Bloem, Peter and {De Boer}, Victor",
year = "2017",
month = "10",
doi = "10.3233/DS-170007",
pages = "1--19",
journal = "Data Science",
issn = "2451-8484",
publisher = "IOS Press",

}
