Relating language and sound: two distributional models.

C.W.J. van Miltenburg, A. Lopopolo

Research output: Contribution to Conference › Poster › Other research output


We present preliminary results in the domain of sound labeling and sound representation. Our work is based on data from the Freesound database, which contains thousands of sounds complete with tags and descriptions, under a Creative Commons license. We want to investigate how people represent and categorize different sounds, and how language reflects this categorization. Moreover, following recent developments in multimodal distributional semantics (Bruni et al. 2012), we want to assess whether acoustic information can improve the semantic representation of lexemes.

We have built two different distributional models on the basis of a subset of the Freesound database, containing all sounds that were manually classified as SoundFX (e.g. footsteps, opening and closing doors, animal sounds). The first model is based on tag co-occurrence. On the basis of this model, we created a network of tags that we partitioned using cluster analysis. The resulting clusters intuitively seem to correspond to different types of scenes. We see this partitioning as a first step towards linking particular sounds with relevant frames in FrameNet. The second model is built using a bag-of-auditory-words approach. To assess the quality of the semantic representations, the two models are compared against human similarity judgments from the WordSim353 and MEN datasets.
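The tag co-occurrence model described above can be sketched as follows: two tags co-occur when they label the same sound, and each tag is represented by its row in the resulting co-occurrence matrix. This is a minimal illustration with invented toy tag sets, not data from the actual Freesound subset; the helper names are ours.

```python
import numpy as np
from itertools import combinations

# Toy tag sets for three hypothetical SoundFX clips (illustrative only).
sound_tags = [
    {"footsteps", "gravel", "walking"},
    {"footsteps", "walking", "shoes"},
    {"door", "creak", "opening"},
]

# Build the vocabulary and a tag-to-index mapping.
vocab = sorted(set.union(*sound_tags))
idx = {t: i for i, t in enumerate(vocab)}

# Symmetric co-occurrence matrix: count how often two tags label the same sound.
C = np.zeros((len(vocab), len(vocab)))
for tags in sound_tags:
    for a, b in combinations(sorted(tags), 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

def cosine(a, b):
    """Cosine similarity between the co-occurrence vectors of two tags."""
    u, v = C[idx[a]], C[idx[b]]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Tags from the same kind of scene end up closer than tags from different scenes;
# pairwise similarities like these are what the cluster analysis partitions,
# and what gets correlated with WordSim353/MEN judgments in the evaluation.
print(cosine("footsteps", "walking") > cosine("footsteps", "door"))  # True
```

On the toy data, "footsteps" and "walking" share co-occurrence contexts ("gravel", "shoes") while "footsteps" and "door" share none, so the first pair scores higher.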
Original language: English
Publication status: Published - 2015
Event: CLIN 25
Duration: 6 Feb 2015 – 6 Feb 2015


Conference: CLIN 25


