Palmer Station LTER Information Management Site Byte - September 2010
Karen Baker

Information Management Committee Inquiry

Introduction

Site Bytes are intended as a general update on what has happened at your site during the last year. Please highlight new developments, ideas, and issues. Site Bytes help us to stay informed about what is going on at other sites. Please also be forward thinking and include your thoughts on the following subjects with regard to the developing Network Information System. We will discuss these questions further at the annual meeting:

Given limited resources, how do you prioritize data contributions to the NIS?
1) Do you give a lot of attention to a few datasets or limited attention to a lot?
2) Do you deliver derived or raw data?
3) Do you communicate with your PIs regarding prioritization of data?

PAL Responses

1) Do you give a lot of attention to a few datasets or limited attention to a lot?
Between these two extremes, we strike a balance. Our strategy has been to create the technical capability to rapidly create and update datasets, together with a management interface for submitting to community collections. Last year we focused on improving our information system in ways that put us in a position to work with the NIS development team. We also improved our metadata approach while working with several local groups with complex datasets who had not yet submitted data. In addition, site work focused on streamlining weather data handling to support monthly updates for ClimDB as well as summary plots posted in DataZoo. To date, quality control has been left to the data submitter, but we plan to develop quality-control capabilities in the next year as part of the information management services and hence to begin more routine submissions of more assuredly high-quality data to the NIS.

2) Do you deliver derived or raw data?
The raw-derived question seems misleading when stated as a binary.
We see multiple stages as part of a processing continuum that varies by dataset depending upon sampling, instrumentation, calibration, and analysis, so we consider the decision as to which dataset stage to deliver to be dataset dependent. For many researchers, the question of what to submit is still under investigation: they may submit abundances from counts obtained via microscope analysis and later submit carbon-converted counts as they work with the data. We find that over time and with experience, our understanding of the data and the data audiences, as well as shared projects such as EcoTrends, influences our data delivery.

3) Do you communicate with your PIs regarding prioritization of data?
Yes. A strength of an LTER site is that IM is guided in practice by local data needs and by the research scientists who collect, analyze, submit, and use the data. Data inventories are created and discussed periodically. The significant improvement in technical capacity this year, coupled with our planned focus on metadata and data quality reviews next year, puts us in a position to include submission of data to Metacat as part of a more routine data process next year.

PAL Site Byte

1) Local Development
Focus this year was on technical development of DataZoo, a site-level information system now used by four oceanographic sites: two LTER sites (PAL, CCE) and two CalCOFI sites (SIO, SWFSC). The aim was to better meet local needs while developing web services as a foundational element for upcoming site-network development. Redesign of the study-project schema for DataZoo added flexibility in response to metadata needs identified in practice and by the IMC working group effort with project collections. Significant updates were made to the DataZoo management interface, which allows creation and editing of organizations, studies, datasets, and their related metadata.
New approaches to web delivery, through development of middleware and standardized, asynchronous access to databases via a web service layer, successfully addressed issues relating to the increasing size and diversity of data. Expanding DataZoo to a three-component system accommodated the development of FileFinder, an interface to very large collections of hierarchically structured datasets. The prototype of FileFinder was populated with oceanographic CTD data.

Additional work relating to DataZoo included adding support for taxonomic identifiers to the attribute qualifier system, adoption of the Integrated Taxonomic Information System (ITIS) as a taxonomic authority, which prompted conversion of NODC taxonomic codes, and redesign of query selection so that users can select studies within a designated time interval. Plot performance was improved by replacement of the plot library, and options were added for contour and bubble plots. Handling of external datasets such as weather was streamlined to facilitate monthly updates for ClimDB as well as summary plots posted in DataZoo. Improvements were made to dataset documentation as datasets were submitted for publication in EcoTrends. An Information Management media gallery was created in order to preserve posters as a historical record of development. Finally, backups are now done via the Computational Infrastructure Service at IOD in conjunction with the San Diego Supercomputer Center. In terms of physical infrastructure, a new server with virtual machine capability (vSurf) was purchased.

2) Partner Activities and Leadership
Our team (Mason Kortz, Lynn Yarmey, James Conners) is leading a network-level LTER unit registry and unit best practices effort that received support from an LTER post-All Scientist Meeting proposal award. Two cross-site visits (KBS, JRN) and a visit to the Network Office were carried out as part of this project. This work has informed redesign of the DataZoo architecture to a web-services orientation.
The new unit registry, together with its management interface, replaces a static, isolated unit dictionary, thus enacting one type of data comparability as well as demonstrating a new type of site-network model that enables site participation in NIS development through web services. Mason Kortz and James Conners are also leading a joint IMC-LNO web services working group, and James Conners is serving as a NIS tiger team member. Karen Baker is co-leading the IMC Governance Working Group (GWG). The GWG was created in order to facilitate the conduct of LTER business in an arena of growing complexity and responsibility and to support the development of Terms of Reference or by-laws for committees and working groups.

3) Professional Development and Publication
Professional development of information managers was provided through exposure to site-level and network-level development activities with the Unit Registry project. In addition, a weekly summer informatics reading group stimulated discussion and joint learning. A sociology graduate student studying the Design Studio stimulated reflection on our work as well as on our work arena. Lynn Yarmey received a fellowship to pursue a two-year master's degree in Library and Information Science with a concentration in Data Curation through the University of Illinois, one of the national iSchools.

At the LTER Information Management Committee (IMC) annual meeting, CCE/PAL members contributed as co-chairs of the IM Governance Working Group and Unit Working Groups. Lynn Yarmey reported on a Unit Dictionary Best Practices Guideline. Karen Baker was elected a member of the Network Information System Advisory Committee (NISAC). Mason Kortz and James Conners prototyped the LTER Unit Registry and contributed to formation of a new Web Services Working Group.
At the LTER All Scientist Meeting, contributions included a series of posters, participation on a panel about network-level efforts, and a workshop focusing on curriculum for information managers.

Collaborating on two chapters about the EcoTrends Project contributed to description and understanding of network-level information management efforts as well as development of a set of recommendations for future work (Laney et al., in press). Another book chapter explores the topic of digital infrastructure growth (G.C. Bowker, K.S. Baker, F. Millerand, and D. Ribes, 2010. Towards Information Infrastructure Studies: Ways of Knowing in a Networked Environment. In Int. Handbook of Internet Research). Collaboration with Science Studies partners led to publications in peer-reviewed journals. An interdisciplinary investigation into work with data spanning decades provides insight into 'Big Data' efforts and the LTER network model (Aronova, Baker, and Oreskes, 2010. From the International Geophysical Year to the International Biological Program: Big Science and Big Data in Biology, 1957-present. Historical Studies in the Natural Sciences 40(2):183-224). A study of design provides insight into the complexity of the development processes at an LTER site (Millerand and Baker, 2010. Who are the users? From a single users to a web of users in the design of a working standard. Information Systems Journal 20:137-161).

4) In Summary
We plan to focus next year on metadata population, quality control, and data delivery as well as on documentation of the DataZoo system. Technical work with the unit registry and web services will continue at a network level. Technical issues that remain to be addressed include integration of the personnel module with DataZoo, geolocation based on the earlier gazetteer effort, and management of dataset use logging. Our website content will be updated, but migration to a platform such as Drupal or WordPress is not yet scheduled.
The application module MediaZoo was redesigned as an API. The move from a photo gallery to a plot gallery and eventually to the media gallery module has reached a level of abstraction at which it is time to make the five installed instances currently in use uniform.

Despite significant experience with information management practices and issues at sites, innovation at sites is limited by lack of resources and incentives. The 2010 information management supplement stabilizes our work on DataZoo and our collaboration on the LTER NIS this year, but it highlights the need either to continue such support or to collaborate locally with additional projects in order to maintain a team with technical expertise at the site level over the long term.