

Comprehensive Extensible Data Documentation and Access Repository (CED²AR)

CED²AR is designed to improve the documentation and discoverability of both public and restricted data from the federal statistical system. CED²AR is based upon leading metadata standards (the Data Documentation Initiative, DDI) and will interconnect with a variety of other metadata sources, such as those expressed in Statistical Data and Metadata eXchange (SDMX).

To support the goals set out in Abowd, Vilhuber, and Block (2012), we developed extensions to DDI that allowed us to implement new features in real-world examples. The relevant extensions to the DDI-C (DDI 2.5) branch are listed below. Some of these features are, or will be, available in DDI-L (DDI 3.x) and later versions; however, we believe that the substantial installed base of DDI-C implementations can benefit from these small extensions to the schema. Interested users will find more information at the links below.

The CED²AR web application was developed to expose and edit these new features, although not all of them are visible to the end user. In the course of developing and implementing our extensions, we also became the sole host of a number of DDI codebooks that do not exist elsewhere. The main CED²AR instance both hosts these unique codebooks and showcases the new features of our DDI extensions. The production server can be found at http://www2.ncrn.cornell.edu/ced2ar-web/.

Proposed DDI-C enhancements

Proposed DDI 2.5.1-NCRN enhancement:

  • This enhancement allows access rights to be assigned at a more granular level than standard DDI 2.5.1 permits.
  • Users can link directly to the schema by adding the following attribute to the <codeBook> definition in their DDI XML:
     xsi:schemaLocation="ddi:codebook:2_5 http://www.ncrn.cornell.edu/docs/ddi/2.5.NCRN/schemas/codebook.xsd"
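The attribute can only be used if the xsi prefix is bound to the W3C XML Schema instance namespace, so a complete root element needs a little more context. The sketch below assumes the standard DDI-C 2.5 namespace ddi:codebook:2_5 (the first half of the schemaLocation value above) as the default namespace; the version value "2.5" is also an assumption. The schemaLocation value is a whitespace-separated pair: the namespace being described, followed by the URL of the schema that validates it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the default namespace, the xmlns:xsi binding and the version value are assumptions. -->
<codeBook xmlns="ddi:codebook:2_5"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="ddi:codebook:2_5 http://www.ncrn.cornell.edu/docs/ddi/2.5.NCRN/schemas/codebook.xsd"
          version="2.5">
  <!-- The usual codebook content (for example stdyDscr, fileDscr, dataDscr) goes here; omitted in this sketch. -->
</codeBook>
```

Pointing the same root element at the PROV-enabled variant should only require replacing the URL with http://www.ncrn.cornell.edu/docs/ddi/2.5.NCRN.P/schemas/codebook.xsd. Whether a given validator actually fetches the remote schema named in xsi:schemaLocation depends on the tool.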

DDI+PROV

  • Proposed DDI 2.5.1-NCRN enhancement with PROV integration: allows PROV to be embedded in DDI and PROV bundles to be linked from it, improving the tracking of provenance in metadata.
  • Users can link directly to the schema by adding the following attribute to the <codeBook> definition in their DDI XML:
     xsi:schemaLocation="ddi:codebook:2_5 http://www.ncrn.cornell.edu/docs/ddi/2.5.NCRN.P/schemas/codebook.xsd"

Software

Selected bibliography

  1. John M Abowd, Lars Vilhuber and William Block. A Proposed Solution to the Archiving and Curation of Confidential Scientific Inputs. In Josep Domingo-Ferrer and Ilenia Tinnirello (eds.). Privacy in Statistical Databases. Lecture Notes in Computer Science series, volume 7556, Springer Berlin Heidelberg, 2012, pages 216-225.

    @incollection{raey,
    	title = "A Proposed Solution to the Archiving and Curation of Confidential Scientific Inputs",
    	author = "Abowd, John M. and Vilhuber, Lars and Block, William",
    	booktitle = "Privacy in Statistical Databases",
    	publisher = "Springer Berlin Heidelberg",
    	year = 2012,
    	editor = "Domingo-Ferrer, Josep and Tinnirello, Ilenia",
    	pages = "216--225",
    	series = "Lecture Notes in Computer Science",
    	volume = 7556,
    	doi = "10.1007/978-3-642-33627-0_17",
    	isbn = "978-3-642-33626-3",
    	keywords = "Data Archive; Data Curation; Statistical Disclosure Limitation; Privacy-preserving Datamining",
    	url = "http://dx.doi.org/10.1007/978-3-642-33627-0_17"
    }
    
  2. Carl Lagoze, William C Block, Jeremy Williams, John M Abowd and Lars Vilhuber. Data Management of Confidential Data. International Journal of Digital Curation 8(1):265-278, 2013.
    Presented at the 8th International Digital Curation Conference 2013, Amsterdam. URL points to a pre-publication copy.

    @article{DBLP:journals/ijdc/LagozeBWAV13,
    	title = "Data Management of Confidential Data",
    	author = "Carl Lagoze and William C. Block and Jeremy Williams and John M. Abowd and Lars Vilhuber",
    	journal = "International Journal of Digital Curation",
    	year = 2013,
    	note = "Presented at 8th International Digital Curation Conference 2013, Amsterdam. URL points to pre-publication copy.",
    	number = 1,
    	pages = "265--278",
    	volume = 8,
    	abstract = "Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data.",
    	bibsource = "DBLP, http://dblp.uni-trier.de",
    	doi = "10.2218/ijdc.v8i1.259",
    	owner = "vilhuber",
    	timestamp = "2013.10.09",
    	url = "http://hdl.handle.net/1813/30924"
    }
    
  3. Carl Lagoze, William C Block, Jeremy Williams and Lars Vilhuber. Encoding Provenance of Social Science Data: Integrating PROV with DDI. In 5th Annual European DDI User Conference. 2013.

    @inproceedings{LagozeEtAl2013,
    	title = "Encoding Provenance of Social Science Data: Integrating {PROV} with {DDI}",
    	author = "Carl Lagoze and William C. Block and Jeremy Williams and Lars Vilhuber",
    	booktitle = "5th Annual European DDI User Conference",
    	year = 2013,
    	abstract = "Provenance is a key component of evaluating the integrity and reusability of data for scholarship. While recording and providing access provenance has always been important, it is even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. The PROV model, developed under the auspices of the W3C, is a foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We report on the results of our experimentation with integrating the PROV model into the DDI metadata for a complex, but characteristic, example social science data. We also present some preliminary thinking on how to visualize those graphs in the user interface.",
    	keywords = "Metadata, Provenance, DDI, eSocial Science",
    	owner = "vilhuber",
    	timestamp = "2013.10.09"
    }
    
  4. Carl Lagoze, Lars Vilhuber, Jeremy Williams, Benjamin Perry and William C Block. CED²AR: The Comprehensive Extensible Data Documentation and Access Repository. In ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014). 2014.
    Presented at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014).

    @inproceedings{LagozeJCDL2014,
    	title = "{CED$^2$AR}: The Comprehensive Extensible Data Documentation and Access Repository",
    	author = "Carl Lagoze and Lars Vilhuber and Jeremy Williams and Benjamin Perry and William C. Block",
    	booktitle = "ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014)",
    	year = 2014,
    	address = "London, United Kingdom",
    	month = "8th-12th September 2014",
    	note = "Presented at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014)",
    	organization = "ACM/IEEE",
    	abstract = "Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data.",
    	owner = "vilhuber",
    	timestamp = "2014.07.09"
    }
    
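  5. Carl Lagoze, Jeremy Williams and Lars Vilhuber. Encoding Provenance Metadata for Social Science Datasets. In Emmanouel Garoufallou and Jane Greenberg (eds.). Metadata and Semantics Research. Communications in Computer and Information Science series, volume 390, Springer International Publishing, 2013, pages 123-134.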

    @inproceedings{LagozeEtAl2013b,
    	author = "Lagoze, Carl and Williams, Jeremy and Vilhuber, Lars",
    	title = "Encoding Provenance Metadata for Social Science Datasets",
    	booktitle = "Metadata and Semantics Research",
    	year = 2013,
    	editor = "Garoufallou, Emmanouel and Greenberg, Jane",
    	volume = 390,
    	series = "Communications in Computer and Information Science",
    	pages = "123--134",
    	publisher = "Springer International Publishing",
    	doi = "10.1007/978-3-319-03437-9_13",
    	isbn = "978-3-319-03436-2",
    	keywords = "Metadata; Provenance; DDI; eSocial Science",
    	owner = "vilhuber",
    	timestamp = "2013.11.05",
    	url = "http://dx.doi.org/10.1007/978-3-319-03437-9_13"
    }
    

Additional resources