Predicting Sense of Community and Participation by Applying Machine Learning to Open Government Data

Alessandro Piscopo, Ronald Siebes, Lynda Hardman

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

Community capacity is used to monitor socioeconomic development. It comprises several dimensions that can be measured to understand issues that may arise when implementing a policy or project targeting a community. Measuring these dimensions is thus highly valuable for policymakers and local administrators, though expensive and time-consuming. To address this issue, we evaluated their estimation through a machine learning technique, Random Forests, applied to secondary open government data, and determined the most important variables for prediction. We focused on two dimensions: sense of community and participation. The variables included in the data sets used to train the predictive models complied with two criteria: nationwide availability and a sufficiently fine-grained geographic breakdown, that is, neighborhood level. Our resultant models are more accurate than others based on traditional statistics found in the literature, showing the feasibility of the approach. The most determinant variables in our models were only partially in agreement with the most influential factors for sense of community and participation according to the social science literature consulted, providing a starting point for future investigation from a social science perspective. Moreover, due to the lack of geographic detail in the outcome measures available, further research is required to apply the predictive models at the neighborhood level.
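As a rough illustration of the workflow the abstract describes (fit a forest-style ensemble on tabular indicators, then rank the variables by importance), the following is a minimal sketch and not the authors' implementation. It assumes synthetic data in place of real open government indicators, uses one-split trees (stumps) with a random feature subset per tree rather than fully grown trees with per-split feature sampling, and uses permutation importance on the training data as the ranking measure; the abstract does not say which importance measure was used. All function and variable names are hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Find the single split (feature, threshold) minimising squared error."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:  # degenerate sample: no valid split, predict the mean
        m = sum(y) / len(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees, max_features, rng):
    """Bagged stumps: each tree sees a bootstrap sample and a feature subset."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Average the predictions of all stumps."""
    return sum(ml if row[f] <= t else mr for f, t, ml, mr in forest) / len(forest)

def mse(forest, X, y):
    return sum((predict(forest, r) - yi) ** 2 for r, yi in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, rng):
    """Increase in error when one feature column is shuffled."""
    base = mse(forest, X, y)
    scores = []
    for f in range(len(X[0])):
        col = [r[f] for r in X]
        rng.shuffle(col)
        Xp = [r[:f] + [v] + r[f + 1:] for r, v in zip(X, col)]
        scores.append(mse(forest, Xp, y) - base)
    return scores

# Synthetic stand-in data (assumption): only feature 0 drives the outcome.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(150)]
y = [3.0 * r[0] + rng.gauss(0, 0.1) for r in X]

forest = fit_forest(X, y, n_trees=25, max_features=2, rng=rng)
importances = permutation_importance(forest, X, y, rng)
ranking = sorted(range(3), key=lambda f: importances[f], reverse=True)
print("features ranked by importance:", ranking)
```

In practice one would more likely use an established library implementation and evaluate importance on held-out data; the sketch only makes the two steps named in the abstract concrete.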

Original language: English
Pages (from-to): 55-75
Number of pages: 21
Journal: Policy and Internet
Volume: 9
Issue number: 1
DOI: 10.1002/poi3.145
Publication status: Published - 1 Mar 2017

Keywords

  • civic participation
  • e-Government
  • machine learning
  • open data
  • sense of community

Cite this

@article{31f94f97629640188baaf8e125b92190,
title = "Predicting Sense of Community and Participation by Applying Machine Learning to Open Government Data",
abstract = "Community capacity is used to monitor socioeconomic development. It comprises several dimensions that can be measured to understand issues that may arise when implementing a policy or project targeting a community. Measuring these dimensions is thus highly valuable for policymakers and local administrators, though expensive and time-consuming. To address this issue, we evaluated their estimation through a machine learning technique, Random Forests, applied to secondary open government data, and determined the most important variables for prediction. We focused on two dimensions: sense of community and participation. The variables included in the data sets used to train the predictive models complied with two criteria: nationwide availability and a sufficiently fine-grained geographic breakdown, that is, neighborhood level. Our resultant models are more accurate than others based on traditional statistics found in the literature, showing the feasibility of the approach. The most determinant variables in our models were only partially in agreement with the most influential factors for sense of community and participation according to the social science literature consulted, providing a starting point for future investigation from a social science perspective. Moreover, due to the lack of geographic detail in the outcome measures available, further research is required to apply the predictive models at the neighborhood level.",
keywords = "civic participation, e-Government, machine learning, open data, sense of community",
author = "Alessandro Piscopo and Ronald Siebes and Lynda Hardman",
year = "2017",
month = "3",
day = "1",
doi = "10.1002/poi3.145",
language = "English",
volume = "9",
pages = "55--75",
journal = "Policy and Internet",
issn = "1944-2866",
publisher = "Wiley",
number = "1",

}

Predicting Sense of Community and Participation by Applying Machine Learning to Open Government Data. / Piscopo, Alessandro; Siebes, Ronald; Hardman, Lynda.

In: Policy and Internet, Vol. 9, No. 1, 01.03.2017, p. 55-75.

TY - JOUR

T1 - Predicting Sense of Community and Participation by Applying Machine Learning to Open Government Data

AU - Piscopo, Alessandro

AU - Siebes, Ronald

AU - Hardman, Lynda

PY - 2017/3/1

Y1 - 2017/3/1

AB - Community capacity is used to monitor socioeconomic development. It comprises several dimensions that can be measured to understand issues that may arise when implementing a policy or project targeting a community. Measuring these dimensions is thus highly valuable for policymakers and local administrators, though expensive and time-consuming. To address this issue, we evaluated their estimation through a machine learning technique, Random Forests, applied to secondary open government data, and determined the most important variables for prediction. We focused on two dimensions: sense of community and participation. The variables included in the data sets used to train the predictive models complied with two criteria: nationwide availability and a sufficiently fine-grained geographic breakdown, that is, neighborhood level. Our resultant models are more accurate than others based on traditional statistics found in the literature, showing the feasibility of the approach. The most determinant variables in our models were only partially in agreement with the most influential factors for sense of community and participation according to the social science literature consulted, providing a starting point for future investigation from a social science perspective. Moreover, due to the lack of geographic detail in the outcome measures available, further research is required to apply the predictive models at the neighborhood level.

KW - civic participation

KW - e-Government

KW - machine learning

KW - open data

KW - sense of community

UR - http://www.scopus.com/inward/record.url?scp=85015694512&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015694512&partnerID=8YFLogxK

U2 - 10.1002/poi3.145

DO - 10.1002/poi3.145

M3 - Article

VL - 9

SP - 55

EP - 75

JO - Policy and Internet

JF - Policy and Internet

SN - 1944-2866

IS - 1

ER -