NTRS - NASA Technical Reports Server

Querying Semi-Structured Data

The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces, including Web browsers, database query languages, application-specific interfaces, and data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure, even if the structure is often implicit and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes it exists but we prefer to ignore it for certain purposes, such as browsing. We call semi-structured data here the data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later, when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent work at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.
Document ID
20050061322
Acquisition Source
Goddard Space Flight Center
Document Type
Other
Authors
Abiteboul, Serge
Date Acquired
September 7, 2013
Publication Date
January 1, 1997
Subject Category
Documentation And Information Science
Report/Patent Number
AD-A428473
Funding Number(s)
CONTRACT_GRANT: F306-95-C-0119
CONTRACT_GRANT: F33615-93-1-1339
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.