

Please use this identifier to cite or link to this item: http://hdl.handle.net/1860/3543

Title: Data use and access behavior in eScience —exploring data practices in the new data-intensive science paradigm
Authors: Zhang, Jian
Keywords: Information science;Information visualization;Communication in science--Data processing
Issue Date: 11-Jul-2011
Abstract: Science is entering its fourth paradigm of "data-intensive science". Relatively little attention has been paid to the users of scientific data, particularly their data practices. This dissertation endeavors to advance our knowledge of data behavior in the new paradigm. Within the scope of the Sloan Digital Sky Survey (SDSS) project, I conducted two major lines of research: a content analysis of SDSS-related scientific publications to investigate astronomers' data use behavior, and a visual exploration analysis (VEA) of SDSS SQL query logs with the design of a visualization tool, SDSS Log Viewer. By integrating results from VEA and statistics, I conducted three case studies of SDSS log data to investigate users' data seeking behavior. For astronomers' data usage behavior, I found that: 1) while a large volume of scientific data is produced in SDSS, researchers that rely on SDSS only intended to leverage the large number and use more data; 2) studies that leveraged a large volume of data from multiple data sources are relatively rare in the SDSS research domain; 3) using data collected by others, both data collection projects and other researchers, is a common data behavior in the SDSS research community; and 4) the results on the possibility of data reconstruction suggest that scientific publications themselves are insufficient for linking scientific data with the data sources. For users' data seeking behavior, I found that: 1) a small number of automatic query generators accounted for most of the query traffic (in terms of the number of queries) to the SDSS data archive, and six common categories of queries were identified; the number of query templates used by automatic query generators is small; 2) academic researchers, who are the target users of the SDSS data archive, issued a relatively large number of queries manually. Compared to the queries generated by automatic data requestors, the query templates used by this type of user are rather diverse in terms of both sophistication of condition strings and complexity of query structures. A possible learning hierarchy is observed in this user group; and 3) occasional passing-by users are large in number, but their behavior is still unclear. As possibly the first empirical study of users' data use and access behavior, this work's findings lay the foundation for a wide range of future studies. The method used in this study is also generic, and much of it is applicable to other fields because its specific steps are independent of application domains.
URI: http://hdl.handle.net/1860/3543
Appears in Collections: Drexel Theses and Dissertations

Files in This Item:

File            Size     Format
Zhang_Jian.pdf  6.46 MB  Adobe PDF

Items in iDEA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
