TUHH Open Research
Do we need real data? Testing and training algorithms with artificial geolocation data

Citation Link: https://doi.org/10.15480/882.2671
Publication Type
Conference Paper
Date Issued
2019
Language
English
Author(s)
Kaiser, Jan  
Bavendiek, Kai  
Schupp, Sibylle  
Institute
Softwaresysteme E-16  
TORE-DOI
10.15480/882.2671
TORE-URI
http://hdl.handle.net/11420/4344
First published in
GI-Edition  
Number in series
294
Citation
In: David, K., Geihs, K., Lange, M. & Stumme, G. (Eds.), INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft. Bonn: Gesellschaft für Informatik e.V. (pp. 205-218).
Contribution to Conference
50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft, conference held 23-26 September 2019 in Kassel
Publisher DOI
10.18420/inf2019_25
Scopus ID
2-s2.0-85090835039
Publisher
Gesellschaft für Informatik
As big data becomes increasingly important, so do algorithms that operate on geolocation data. Privacy requirements and the cost of collecting large sets of geolocation data, however, make it difficult to test those algorithms with real data. Artificially generated data sets therefore present an appealing alternative. This paper explores the use of two types of neural networks as generators of geolocation data and introduces a method based on the Turing Test to determine whether generated geolocation data is indistinguishable from real data. In an extensive evaluation, we apply the method to data generated by our own implementation of neural networks and by the widely used BerlinMOD generator on the one hand, and to the four most prominent data sets of real geolocation data, covering a total of 65 million records, on the other. The experiments show that in eleven of twelve cases artificial data sets can be distinguished from real ones. We conclude that, at present, the generators we tested provide no safe replacement for real data.
Subjects
geolocation data
artificial data
data generation
neural network generators
data quality
DDC Class
004: Informatik
License
https://creativecommons.org/licenses/by-sa/4.0/
Name
paper3_02.pdf
Size
1.3 MB
Format
Adobe PDF