Balanced large scale knowledge matching using LSH forest

Michael Cochez*, Vagan Terziyan, Vadim Ermolayev

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Evolving Knowledge Ecosystems were recently proposed as an approach to the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Consequently, the inner workings of a knowledge ecosystem can be inferred by analogy with natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, which, in turn, originate from the streams of information flowing into the ecosystem. In this article we investigate the use of LSH Forest (a self-tuning indexing schema based on locality-sensitive hashing) to solve the problem of placing new knowledge tokens in the right contexts of the environment. We argue, and show experimentally, that LSH Forest possesses the required properties and could be used in large distributed set-ups.
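The abstract does not describe how LSH Forest finds similar items. As a rough, hedged illustration (not the authors' implementation), the Python sketch below assumes that both contexts and knowledge tokens can be reduced to sets of terms compared by Jaccard similarity, and it uses MinHash as the locality-sensitive hash family. The names `LSHForest` and `minhash`, the parameters `num_trees`, `max_depth` and `m`, and the example context ids are all hypothetical. Each tree indexes variable-length prefixes of an item's MinHash label, so a query uses the longest prefix the data supports instead of a fixed, hand-tuned number of hash functions; this is the self-tuning property the abstract refers to.

```python
import hashlib
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def minhash(tokens: FrozenSet[str], seed: int) -> int:
    """One MinHash value: the minimum of a seeded 64-bit hash over the set's elements."""
    return min(
        int.from_bytes(hashlib.blake2b(f"{seed}:{t}".encode("utf-8"), digest_size=8).digest(), "big")
        for t in tokens
    )


def _as_token_set(tokens: Iterable[str]) -> FrozenSet[str]:
    token_set = frozenset(tokens)
    if not token_set:
        raise ValueError("a token set must be non-empty")
    return token_set


class LSHForest:
    """Simplified LSH Forest over sets of terms (hypothetical API, not the paper's code).

    Each of `num_trees` trees labels an item with up to `max_depth` MinHash values.
    A tree is flattened into a dict mapping every label prefix to the ids stored below it.
    """

    def __init__(self, num_trees: int = 4, max_depth: int = 8) -> None:
        self.num_trees = num_trees
        self.max_depth = max_depth
        self.trees: List[Dict[Tuple[int, ...], Set[str]]] = [{} for _ in range(num_trees)]
        self.items: Dict[str, FrozenSet[str]] = {}

    def _label(self, tokens: FrozenSet[str], tree: int) -> Tuple[int, ...]:
        # A distinct seed per (tree, position) gives a distinct hash function.
        return tuple(minhash(tokens, tree * self.max_depth + j) for j in range(self.max_depth))

    def insert(self, item_id: str, tokens: Iterable[str]) -> None:
        token_set = _as_token_set(tokens)
        self.items[item_id] = token_set
        for i, tree in enumerate(self.trees):
            label = self._label(token_set, i)
            for depth in range(1, self.max_depth + 1):
                tree.setdefault(label[:depth], set()).add(item_id)

    def query(self, tokens: Iterable[str], m: int = 3) -> List[str]:
        """Return up to m item ids, ranked by exact Jaccard similarity to `tokens`."""
        token_set = _as_token_set(tokens)
        labels = [self._label(token_set, i) for i in range(self.num_trees)]
        candidates: Set[str] = set()
        # Start from the longest prefixes and shorten them in all trees at once
        # (moving up toward the roots) until at least m candidates are collected.
        for depth in range(self.max_depth, 0, -1):
            for tree, label in zip(self.trees, labels):
                candidates |= tree.get(label[:depth], set())
            if len(candidates) >= m:
                break

        def jaccard(item_id: str) -> float:
            other = self.items[item_id]
            return len(token_set & other) / len(token_set | other)

        return sorted(candidates, key=lambda c: (-jaccard(c), c))[:m]


if __name__ == "__main__":
    forest = LSHForest(num_trees=8, max_depth=8)
    forest.insert("ctx-vehicles", {"car", "engine", "wheel", "driver", "road"})
    forest.insert("ctx-biology", {"cell", "gene", "protein", "organism", "evolution"})
    forest.insert("ctx-finance", {"bank", "loan", "interest", "market", "stock"})
    # A new knowledge token, reduced to a set of terms (hypothetical data).
    print(forest.query({"car", "wheel", "road", "tyre"}, m=1))
```

The example query has Jaccard similarity 0.5 with `ctx-vehicles`, so each tree's first MinHash value matches with probability 0.5 and the call usually returns `['ctx-vehicles']`; with 8 trees, roughly 1 draw in 256 finds no shared prefix and returns an empty list, which is the usual recall trade-off of LSH. Storing every prefix in a dict keeps the sketch short at the cost of `max_depth` entries per item per tree; a real prefix tree, possibly distributed as the abstract suggests, would be more economical.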

Original language: English
Title of host publication: Semantic Keyword-Based Search on Structured Data Sources: First COST Action IC1302 – International KEYSTONE Conference, IKC 2015, Revised Selected Papers
Editors: Yannis Velegrakis, Jorge Cardoso, Alexandre Miguel Pinto, Francesco Guerra, Geert-Jan Houben
Publisher: Springer Verlag
Pages: 36-50
Number of pages: 15
ISBN (Print): 9783319279312
Publication status: Published, 1 Jan 2015
Externally published: Yes
Event: 1st COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2015, Coimbra, Portugal
Duration: 8 Sept 2015 to 9 Sept 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9398
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2015
Country/Territory: Portugal
City: Coimbra
Period: 8/09/15 to 9/09/15

Funding

The authors would like to thank the Department of Mathematical Information Technology of the University of Jyväskylä for financially supporting this research. This research is also in part financed by the N4S SHOK organized by Digile Oy and financially supported by TEKES. The authors would further like to thank Steeri Oy for supporting the research and the members of the Industrial Ontologies Group (IOG) of the University of Jyväskylä for their support in the research. Further, it has to be mentioned that the implementation of the software was greatly simplified by the Guava library by Google, the Apache Commons Math™ library, and the Rabin hash library by Bill Dwyer and Ian Brandt.

Keywords

  • Big data
  • Evolving knowledge ecosystems
  • Locality-sensitive hashing
  • LSH forest
