Thursday, January 28, 2016

2016-01-28: January 2016 Federal Cloud Computing Summit


As I have mentioned previously, I am the MITRE chair of the Federal Cloud Computing Summit. The Summits are designed to allow representatives from government agencies that would not otherwise cross paths to collaborate and learn from one another about best practices, challenges, and recommendations for adopting emerging technologies in the federal government. The MITRE-ATARC Collaboration Symposium is a working group-style session in which academics and representatives from industry, government, and FFRDCs discuss potential solutions and ways forward for the top challenges of emerging technology adoption in government. MITRE helps select the challenge areas by polling government practitioners on their top challenges, and the participants break into groups to discuss each challenge area. The Collaboration Symposium allows this heterogeneous group of cloud practitioners to collaborate across all levels, from end users to researchers to practitioners to policy makers (at the officer level).





The Summit series includes mobile, Internet of Everything, big data, and cyber security summits along with the cloud summit, each of which occurs twice a year. MITRE produces a white paper that summarizes the MITRE-ATARC Collaboration Symposium. The white paper is shared with industry, to communicate the top challenges and current needs of the federal government and guide product development; with academia, to identify the skillsets the government needs and to influence curricula and research topics; and with government, to communicate the best practices and current challenges of peer agencies.

The Summit takes place in Washington, D.C. and is a full-day event. The day begins at 7:30 AM with registration and an industry trade show that allows industry representatives to discuss with government representatives the government's challenges and the solutions industry has to offer. At 9:00, a series of panel discussions by academic researchers and government practitioners begins, giving audience members the chance to ask questions of the top implementers of cloud computing in government and academia.

At 1:15, after lunch, the MITRE-ATARC Collaboration Symposium begins and runs until 3:45. A final out-briefing from each collaboration session at the end of the day communicates the major findings from each session to the summit participants.

Common threads from the summit included the importance of cloud security; the importance of incorporating other emerging technologies (e.g., mobile, big data, Internet of Things) in cloud computing and the ways each emerging technology enables or enhances the others; and the importance of agile processes in cloud migration planning. More details on the outcomes will be included in the white paper, which should be released in 6-8 weeks. Prior white papers are available at the ATARC website.

The results of the Summit have implications for web archivists. With the increasing emphasis on mobile, IoT, and cloud services, particularly within the government, archiving these representations, and how this material is used, becomes increasingly important. As Julie Brill mentioned in her CNI talk, the government is interested in understanding how these services and technologies are being used regardless of whether there is a UI or other interface with which humans can interact.

Archiving data endpoints over HTTP is comparatively trivial (although challenges still exist with archiving at high fidelity, particularly when considering JavaScript and deferred representations), but archiving a data service that might exchange data through non-HTTP or even push (as opposed to pull) transactions may change the paradigm used for web archiving.
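To make the pull-based case concrete, the sketch below captures one observation of a data endpoint along with its response headers and a capture timestamp, roughly the evidence a WARC record would preserve. It is a minimal illustration, and the endpoint URI is hypothetical; a push-based service would invert the flow, requiring the archive to run a subscriber and timestamp messages as they arrive.

```python
import json
from datetime import datetime, timezone

import requests

# Hypothetical data endpoint; any HTTP-accessible service would do.
URI = "http://api.example.com/readings/latest"

def capture(uri):
    """Capture one pull-based observation: status, headers, and body."""
    response = requests.get(uri, timeout=30)
    return {
        "uri": uri,
        # When *we* observed the resource -- its Memento-Datetime, in effect.
        "capture_datetime": datetime.now(timezone.utc).isoformat(),
        "status": response.status_code,
        "headers": dict(response.headers),  # keeps Last-Modified, ETag, etc.
        "body": response.text,
    }

print(json.dumps(capture(URI), indent=2))
```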

With increased adoption, archiving representations that rely on, or are designed to be consumed through, emerging technologies will only grow in importance, highlighting a potential frontier in web archiving and digital preservation.


--Justin F. Brunelle *

* APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. CASE NUMBER 15-3250
The authors’ affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the authors.

Saturday, January 2, 2016

2016-01-02: Review of WS-DL's 2015


The Web Science and Digital Libraries Research Group had a terrific 2015, marked by four new student members, one Ph.D. defense, and two large research grants.  In many ways it was even better than 2014 and 2013.

We had fewer students graduate or advance their status this year, but last year was unusually productive.  We added four new students, graduated a PhD student and an MS student, and had two other students advance their status:
Hany's Defense Luncheon
Hany's defense saw us continue the WS-DL tradition of the post-PhD luncheon.

We had 16 publications in 2015, which was about the same as 2014 (15) but down from 2013's impressive 22 publications.  This year we had:
Next year we won't have this kind of showing at JCDL 2016 because Michele is one of the program co-chairs:

JCDL 2016 Chairs

In addition to the JCDL, TPDL, and iPRES conferences listed above, we traveled to and presented at ten conferences, workshops, or professional meetings that do not have formal proceedings:
We were also fortunate to host Michael Herzog for the spring 2015 semester:

MLN, MCW, and Michael Herzog

As well as Herbert Van de Sompel for an extended colloquium / planning visit:




We also released (or updated) a number of software packages, services, and format definitions:
  • Alexander Nwala created: 
  • Sawood released:
    • CDXJ - a proposed serialization of CDX files (among other formats) in JSON format, based on his discussions with Ilya Kreymer (an illustrative record follows this list)
    • MemGator - A Go-based Memento aggregator (used by Ilya in his excellent emulation service oldweb.today).
  • Shawn, working with LANL colleagues, released the py-memento-client Python library.
  • Wes and Justin released "Mobile Mink", a Memento-enabled Android client.
  • Mat has continued to update the Mink Chrome extension (github, Chrome store). 
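For a flavor of CDXJ, here is an illustrative record (simplified, not taken verbatim from the proposal): each line pairs a SURT-formatted URI key and a 14-digit datetime with a JSON block, so the file still sorts and merges like classic CDX while the fields remain extensible:

```
com,example)/ 20160102030405 {"url": "http://example.com/", "status": "200", "mime": "text/html", "digest": "sha1:EXAMPLEDIGEST"}
```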
Our coverage in the popular press continued:
We were fortunate to receive two significant research grants this year, totaling nearly $1M:
Thanks to all who made 2015 a great year!  We are looking forward to 2016!

-- Michael


Thursday, December 24, 2015

2015-12-24: CNI Fall 2015 Membership Meeting Trip Report

The CNI Fall 2015 Membership Meeting was held in Washington, D.C., December 14-15, 2015.  Like all CNI meetings, the Fall 2015 meeting was excellent and contained many high quality presentations.  Unfortunately, the members' project briefings ran simultaneously, with 7 or 8 different presentations overlapping at any given time.  As a result I missed a great deal. 

Cliff Lynch kicked off the meeting with reflections on public access to federally funded research (e.g., CRS R42983), interoperability (e.g., OAI-ORE, ORCIDs, IIIF), linked data (e.g., Wikipedia notability guidelines for biographies), privacy & surveillance (e.g., eavesdropping Barbies, the Ashley Madison data breach, RFC 7624), understanding the personalization algorithms that go into presenting (and thus archiving) the view of the web that you experience (e.g., our 2013 D-Lib Magazine article about mobile vs. desktop & GeoIP), and much more.  I'm hesitant to try to further summarize his talk -- watching the video of his talk, as always, is time well spent.

In the next session Herbert and I presented "Achieving Meaningful Interoperability for Web-based Scholarship", which is basically a summary of our recent D-Lib Magazine paper "Reminiscing About 15 Years of Interoperability Efforts". 



2016-01-07 Edit: CNI has now posted the video of our presentation:



See also the excellent summary and commentary from David Rosenthal about the "signposting" proposal.

The next session I split between "Linked Data for Libraries and Archives: LD4L and Europeana" (see the "Linked Data for Libraries" site) and "Is Gold Open Access Sustainable? Update from the UC Pay-It-Forward Project" (slides, video).  The final session of the day included several presentations I would have liked to see but didn't.  I understand "Documenting Ferguson: Building A Community Digital Repository" (slides) was good & standing room only.

I missed the opening session on the second day (including the "Update on Funding Opportunities" presentation), but made it to the presentation from David Rosenthal about emulation.  See the transcript of his talk, as well as his 2015 Emulation and Virtualization as Preservation Strategies report for the AMF.

Unfortunately, David's talk collided with that of Martin & his UCLA colleagues.  Fortunately, CNI has posted the video of their talk, his slides are online, and he has a great interactive site to explore the data.




After lunch I attended Rob's talk "The Future of Linked Data in Libraries: Assessing BibFrame Against Best Practices" (slides).  Rob even referenced my "no free kittens" slogan (tirade?) from our time developing OAI-ORE:




The closing plenary was an excellent talk from Julie Brill, Commissioner of the Federal Trade Commission, entitled "Transparency, Trust, and Consumer Protection in a Complex World".  The transcript is worth reading, but the essence of the talk explores the role the FTC would (should?) play in making sure that consumers can be aware of the data that companies track about them and how that data is used to make decisions about the consumers. (2016-01-07 edit: the video of her presentation is now online.)

A mostly complete list of slides is available via the OSF.  CNI recorded many of the presentations and has begun uploading the videos to the CNI YouTube channel.  The CNI Spring 2016 Membership Meeting will be held in San Antonio, TX, April 4-5, 2016.

Given all the simultaneous sessions, your CNI experience was probably different than mine.  Check out these other CNI Fall 2015 trip reports: Dale Askey, Jaap Geraerts, and Tim Pyatt.

--Michael

Tuesday, December 8, 2015

2015-12-08: Evaluating the Temporal Coherence of Composite Mementos

When an archived web page is viewed using the Wayback Machine, the archival datetime is easy to determine from the URI and the Wayback Machine's display.  The archival datetime of embedded resources (images, CSS, etc.) is another story.  And what stories their archival datetimes can tell.  These stories are the topic of my recent research and Hypertext 2015 publication.  This post introduces composite mementos, describes the evaluation of their temporal (in-)coherence, and provides an overview of my research results.

 

What is a composite memento?

 

A Memento is an archived copy of a web resource (RFC 7089).  The datetime when the copy was archived is called its Memento-Datetime.  A composite memento is a root resource, such as an HTML web page, and all of the embedded resources (images, CSS, etc.) required for a complete presentation.  Composite mementos can be thought of as a tree structure.  The root resource embeds other resources, which may themselves embed resources, etc.  The figure below shows this tree structure and a composite memento of the ODU Computer Science home page as archived by the Internet Archive on 2005-05-14 01:36:08 GMT.  Or does it?
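As a rough illustration of the first level of that tree, the sketch below enumerates the resources a root memento directly embeds. It is a naive, depth-1 sketch assuming BeautifulSoup; real recomposition must also recurse into CSS, frames, and script-loaded resources.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def first_level_embeds(root_uri):
    """List resources directly embedded by a root memento (depth 1 only)."""
    html = requests.get(root_uri, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    embeds = set()
    # A naive tag/attribute map; <link> also matches non-embedded rels.
    for tag, attr in (("img", "src"), ("script", "src"),
                      ("link", "href"), ("iframe", "src")):
        for element in soup.find_all(tag):
            if element.get(attr):
                embeds.add(urljoin(root_uri, element[attr]))
    return sorted(embeds)
```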


 

Hints of Temporal Incoherence

 

Consider the following weather report, which was captured 2004-12-09 19:09:26 GMT.  The Memento-Datetime can be found in the URI, and the December 9, 2004 capture date is clearly visible near the upper right.  Look closely at the description of Current Conditions and the radar image.  How can there be no clouds on the radar when the current conditions are light drizzle?  Something is wrong here.  We have encountered temporal incoherence.  This particular incoherence is caused by inherent delays in the capture process used by Heritrix and other crawler-based web archives.  In this case, the radar image was captured much later (9 months!) than the web page itself.  However, there is no indication of this condition.



 

A Framework for Evaluating Temporal Coherence


In order to study temporal coherence of composite mementos, a framework was needed.  The framework details a series of patterns describing the relationships between root and embedded mementos and four coherence states.  The four states and sample patterns are described below.  The technical report describing the framework is available on arXiv.

 

Prima Facie Coherent

An embedded memento is prima facie coherent when evidence shows that it existed in its archived state at the time the root was captured.  The figure below illustrates the most common case.  Here the embedded memento was captured after the root but modified before the root.  The importance of Last-Modified is discussed in my previous post on header replay.


 

Possibly Coherent

An embedded memento is possibly coherent when evidence shows that it might have existed in its archived state at the time the root was captured.  The figure below illustrates this case.  Here the embedded memento was captured before the root.


 

Probably Violative

An embedded memento is probably violative when evidence shows that it might not have existed in its archived state at the time the root was captured.  The figure below illustrates this case.  Here the embedded memento was captured after the root, but its Last-Modified datetime is unknown.


Prima Facie Violative

An embedded memento is prima facie violative when evidence shows that it did not exist in its archived state at the time the root was captured.  The figure below illustrates this case.  Here the embedded memento was captured after the root and was also modified after the root.
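The four states reduce to comparisons among three datetimes: the root's Memento-Datetime, the embedded memento's Memento-Datetime, and the embedded memento's Last-Modified (when available). The sketch below is my compact reading of that decision logic, not code from the paper.

```python
from datetime import datetime
from typing import Optional

def coherence_state(root_mdt: datetime,
                    embedded_mdt: datetime,
                    embedded_last_modified: Optional[datetime]) -> str:
    """Classify an embedded memento against the root's Memento-Datetime."""
    if embedded_mdt <= root_mdt:
        # Captured before the root: it might have existed in this state.
        return "possibly coherent"
    if embedded_last_modified is None:
        # Captured after the root with no Last-Modified evidence.
        return "probably violative"
    if embedded_last_modified <= root_mdt:
        # Captured after the root but unmodified since before it: bracketed.
        return "prima facie coherent"
    # Captured after the root and modified after the root.
    return "prima facie violative"
```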

 

 

 

Only One in Five Archived Web Pages Existed as Presented


Using the framework, we evaluated the temporal coherence of 82,425 composite mementos. These contained 1,623,127 embedded URIs, of which 1,332,993 were available in a web archive.  Composite mementos were recomposed using single and multiple archives and two heuristics: minimum distance and bracket.

Single and multiple archives: Composite mementos were recomposed from single and multiple archives. For single archives, all embedded mementos were selected from the same archive as the root. For multiple archives, embedded mementos were selected from any of the 15 archives included in the study.

Heuristics:  The minimum distance (or nearest) heuristic selects between multiple captures of the same URI by choosing the memento with the Memento-Datetime nearest to the root's Memento-Datetime, which can fall either before or after it. The bracket heuristic also takes the Last-Modified datetime into account. When a memento's Last-Modified datetime and Memento-Datetime "bracket" the root's Memento-Datetime (as in Prima Facie Coherent above), it is selected even if it is not the closest.
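In code terms, the two heuristics differ only in whether Last-Modified evidence can override proximity. Below is a sketch under the definitions above, assuming each candidate memento carries a Memento-Datetime and an optional Last-Modified datetime.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence

class Candidate(NamedTuple):
    memento_datetime: datetime
    last_modified: Optional[datetime]

def minimum_distance(root_mdt: datetime,
                     candidates: Sequence[Candidate]) -> Candidate:
    """Nearest heuristic: smallest |Memento-Datetime - root|, before or after."""
    return min(candidates, key=lambda c: abs(c.memento_datetime - root_mdt))

def bracket(root_mdt: datetime,
            candidates: Sequence[Candidate]) -> Candidate:
    """Prefer mementos whose Last-Modified and Memento-Datetime bracket the root."""
    bracketing = [c for c in candidates
                  if c.last_modified is not None
                  and c.last_modified <= root_mdt <= c.memento_datetime]
    # Any bracketing memento is prima facie coherent; otherwise fall back.
    return minimum_distance(root_mdt, bracketing or candidates)
```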

We found that only 38.7% of web pages are temporally coherent and that only 17.9% (roughly 1 in 5) of web pages are temporally coherent and can be fully recomposed (i.e., they have no missing resources).

The paper can be downloaded from the ACM Digital Library or from my local copy.  The slides from the Hypertext'15 talk follow.




One last thing: I would like to thank Ted Goranson for presenting the slides at Hypertext 2015 when we could not attend.

-- Scott G. Ainsworth

Friday, November 27, 2015

2015-11-28: Two WS-DL Classes Offered for Spring 2016

https://xkcd.com/1319/

Two WS-DL classes are offered for Spring 2016:

Information Visualization is being offered both online (CRNs 29183 (HR), 29184 (VA), 29185 (US)) and on-campus (CRN 25511).  Web Science is being offered for the first time with the 432/532 numbers (CRNs 27556 and 27557, respectively), but the class will be similar to the Fall 2014 offering as 495/595.

--Michael

Tuesday, November 24, 2015

2015-11-24: Twitter Follower Analysis of Virginia University Alumni Associations

The primary goal of any alumni association is to maintain and strengthen the ties between its alumni, the community, and the mission of the university. With social media, it's easier than ever to connect with current and former graduates on Facebook, Instagram, or Twitter with a simple invitation to "like us" or "follow me." Considering just one of these social platforms, we recently analyzed the Twitter networks of twenty-three (23) Virginia colleges and universities to determine what, if any, social characteristics were shared among the institutions and whether we could gain any insight by examining the public profiles of their respective followers. The colleges of interest, ranked by number of followers in Table 1, vary in size, mission, type of institution, admissions selectivity, and perceived prestige. Each of the alumni associations has maintained a Twitter presence for an average of six (6) years. The oldest Twitter account belongs to Roanoke College (@roanokecollege), which is approaching the eight (8) year mark. The newest Twitter account was registered by Randolph-Macon College (@RMCalums) nearly two years ago.




University Followers Joined Twitter
University of Virginia 12,100 11/1/2008
Roanoke College* 9,588 3/1/2008
Regent University* 7,966 11/1/2008
James Madison University 7,865 8/1/2008
Virginia Tech 6,418 4/1/2009
College of William & Mary 4,448 1/1/2009
University of Mary Washington 3,847 10/1/2009
Liberty University 3,699 11/6/2009
University of Richmond 3,299 5/1/2009
Sweet Briar College* 2,523 8/1/2010
George Mason University 2,375 2/1/2011
Hampton University 2,372 2/15/2012
Christopher Newport University 2,191 8/1/2010
Old Dominion University 1,996 7/1/2009
Randolph College* 1,857 8/1/2008
Washington and Lee University 1,842 8/1/2011
Radford University 1,758 3/11/2011
Hampden-Sydney College 1,086 7/1/2009
Longwood University 1,035 2/28/2013
Hollins University 923 4/1/2009
Virginia Military Institute 836 3/1/2009
Norfolk State University 629 8/15/2011
Randolph-Macon College 172 3/7/2014
Table 1 - Alumni Associations Ranked by Followers

* Institution does not have an official alumni Twitter account.
The university Twitter account was used instead.

Social Graph Analysis


NodeXL is a template for Microsoft Excel which makes network analysis easy and rather intuitive. We used this tool to import the Twitter networks and to analyze the various social media interactions. The Twitter API imposes limits that regulate the amount of data any one user can collect per hour. Due to this rate limiting, NodeXL imports only the 2,000 most recent friends and followers for any Twitter account. To improve the response time of the API, we further restricted our data collection to the 200 most recent tweets for both the university and each of its follower accounts.
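NodeXL performs this collection inside Excel; for readers who prefer scripting, here is a rough Python equivalent of the same capped collection, sketched with the tweepy library against the era's v1.1 REST API. The credentials and the account name are placeholders.

```python
import tweepy

# Placeholder credentials; register an application with Twitter for real ones.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
# wait_on_rate_limit sleeps through Twitter's rate-limit windows automatically.
api = tweepy.API(auth, wait_on_rate_limit=True)

ACCOUNT = "ODUAlumni"  # one of the 23 alumni accounts

# Cap at the 2,000 most recent followers, mirroring NodeXL's import limit.
follower_ids = list(tweepy.Cursor(api.followers_ids,
                                  screen_name=ACCOUNT).items(2000))

# The 200 most recent tweets for the account itself.
tweets = api.user_timeline(screen_name=ACCOUNT, count=200)
print(len(follower_ids), "followers;", len(tweets), "recent tweets")
```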

For our first look at the alumni associations, we clustered the data based on an algorithm in NodeXL which looks at how the vertices are connected to one another. The clusters, as shown in Figure 1, are indicated by the color of the nodes. The clusters themselves revealed some interesting patterns.  The high level of inter-association connectivity, as measured in follows, tweets, and mentions, was unexpected. We would have thought that each association operated within the confines of its own Twitter space or that of its parent organization. As we examine the groupings in this network, it is not unreasonable that we would observe connections between Old Dominion University (@ODUAlumni), Norfolk State University (@nsu_alumni_1935), and Hampton University (@HamptonU_Alumni), as all three are located within close proximity of one another in the Hampton Roads area. But then we must take notice of Hollins University (@HollinsAlum), a small, private women's college in Roanoke, VA, which has connections with ten (10) other alumni associations, more than any other school. Hollins is one of the smallest universities in our group, with an enrollment of fewer than 800 students. Since Twitter is primarily about influence, in this instance we can probably assume the follows serve as a means to observe best practices and current engagement trends employed by larger institutions. While Hollins University is well connected, as are many of the other schools, at the opposite end of the spectrum we find Liberty University (@LibertyUAlum), a large school with more than 77,000 students. Liberty University remains totally isolated, with no follower connections to the other alumni associations. One might minimally expect some type of connection with either Regent University (@RegentU), since both share a similar mission as private, Christian institutions, or with universities in close physical proximity such as Randolph College (@randolphcollege).

Figure 1 - Connectivity of Alumni Associations

Twitter Followers, Enrollment, and Selectivity


We normally measure the popularity of a Twitter account based on the number of followers. Instead of simply quantifying the follower counts of each alumni association, we sought to understand whether certain factors, actions, or inherent qualities of the institution might influence the relative number of followers.  First, we considered whether more active tweeters would attract more alumni followers. As shown in Figure 2, the College of William and Mary (@wmalumni) has generated the most tweets over its lifetime, approximately 6,200 or 2.5 tweets per day. But we also observe the University of Mary Washington (@UMaryWash), which has approximately half the student enrollment, a similar Twitter life span, and 50% fewer tweets (2,800, or 1.3 per day), yet only a slight difference in the number of followers: 4,400 versus 3,800, respectively. While the graph shows that schools such as Virginia Tech (@vt_alumni) and the University of Virginia (@UVA_Alumni) have more followers with fewer lifetime tweets, the caveat is that these public institutions have the benefit of considerably larger student populations, which inherently increases the pool of potential alumni.

Figure 2 - Lifetime Tweets Versus Followers


Next, we considered whether a higher graduation rate, or alumni production, would result in more followers. We obtained the most recent (2014) overall graduation rates for each institution from the National Center for Education Statistics, with reported overall six-year graduation rates ranging from 34% to 94%. A 2015 Pew Research Center study of the Demographics of Social Media Users indicates that among all internet users, 32% in the 18 to 29 age range use Twitter. This is a key demographic, as we would expect our alumni associations to be primarily focused on attracting recent undergraduates. We also factored in selectivity, a comparative scoring of the admissions process, using the categories defined in the 2016 U.S. News Best Colleges Directory. In this directory, colleges are designated as most selective, more selective, selective, less selective, or least selective based on a formula.

As we look at Figure 3, we observe a positive correlation between admissions selectivity and the institution's overall graduation rate. Schools which were least selective during the admissions phase also produced the lowest graduation rates (less than 40%), while schools which were most selective experienced the highest graduation rates (around 90%).  It isn't surprising that improved graduation rates positively affect the expected number of alumni Twitter followers. We'll leave it as an exercise for the reader to extrapolate how closely each institution's annual undergraduate enrollment, graduation rate, and expected level of engagement on Twitter correspond to the actual number of followers when all three factors are considered.

Figure 3 - Followers Versus Graduation Rate

Potential Reach of Verified Followers


Users on Twitter want to be followed, so we looked carefully at who, besides alumni and students, was following each of the alumni associations. Specifically, we noted the number of Twitter verified followers: accounts which are usually associated with high-profile users in "music, acting, fashion, government, politics, religion, journalism, media, sports, business and other key interest areas." In addition to an abundance of local news reporters, sports anchors, regional politicians, and career sites, other notable followers included: restaurant review site Zagat (@Zagat), automaker Toyota USA (@toyota), musician and rapper DJ King Assassin (@DjKingAssassin), the Nelson Mandela Foundation (@NelsonMandela), the President of the United States Barack Obama (@BarackObama), Virginia Governor Terry McAuliffe (@GovernorVA), and artist and singer Yoko Ono (@yokoono). It's a safe assumption that some of the follower relationships with verified users were established prior to 2013, the year in which Twitter instituted new rules to kill the "auto follow", a programmatic way of following another user back after they follow you. Either way, the open question remains as to why these particular users would follow an alumni association when there are no readily apparent educational ties.

Twitter doesn't take follower count into consideration when verifying an account, but it's not unusual for a verified account to have a considerable following. Since the mission of an alumni association is essentially about networking and information dissemination, we also measured the potential reach, or level of influence, across the followers' extended networks contributed by the verified accounts. No single university had more than 70 verified accounts among its followers. However, when we look at their contribution, in Figure 4, as a percentage of the combined reach achieved by all followers of each alumni association, these select users accounted for as little as 1.6% of the institution's total potential reach (i.e., followers of followers) for Virginia Military Institute (@vmialumni) to as much as 95.8% for Longwood University (@acaptainforlife).
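The reach percentage itself is a simple ratio. A sketch of the computation, assuming each follower profile carries the followers_count and verified fields found in Twitter user objects:

```python
def verified_reach_share(followers):
    """Percentage of total potential reach contributed by verified followers.

    `followers` is an iterable of dicts with "followers_count" and "verified"
    keys, as found in Twitter user objects.
    """
    total = sum(f["followers_count"] for f in followers)
    verified = sum(f["followers_count"] for f in followers if f["verified"])
    return 100.0 * verified / total if total else 0.0

# e.g., roughly 1.6 for @vmialumni's followers, 95.8 for @acaptainforlife's
```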

Figure 4 - Potential Reach Percentage of Verified Accounts

Alumni Sentiment


Finally, we examined how followers described themselves in the description (i.e., bio) portion of their Twitter profiles by extracting the top 200 most frequently occurring terms for each alumni association. A word cloud for the alumni of each university is shown in Figure 5. If we further isolated the descriptions to the top ten most frequently occurring words, we observed a common pattern among all alumni followers. In addition to the official institution name or some derivative of it (e.g., JMU, NSU, Tech), we find the terms love, life, and some intimate description of the follower as a mom, husband, student, father, or alumni.  If the university has an athletic department, we also found mention of sports, and, in the case of our two Christian universities, Liberty and Regent, the terms God, Jesus, and Christ were prevalent. In 22 of 23 institutions, the alumni primarily described themselves using these personal terms. Conversely, the alumni followers at only one institution, the University of Richmond (@urspidernetwork), described themselves in a more business-like or academic manner, with more frequent mention of the words PhD, career, and job.
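A sketch of the term extraction behind the word clouds, using simple lowercase tokenization and a toy stopword list (the actual pipeline may differ):

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "a", "in", "to", "i", "my"}  # toy list

def top_terms(bios, n=200):
    """Most frequent words across follower bios, minus common stopwords."""
    counts = Counter()
    for bio in bios:
        for word in re.findall(r"[a-z']+", (bio or "").lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)

# top_terms(bios, 10) surfaces terms like "love", "life", "mom", "alumni"
```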



Figure 5 - Word Clouds of Twitter Follower Descriptions



-- Corren McCoy

Thursday, November 5, 2015

2015-11-06: iPRES2015 Trip Report

From November 2nd through November 5th, Dr. Nelson, Dr. Weigle, and I attended the iPRES2015 conference at the University of North Carolina Chapel Hill. This served as a return visit for Drs. Nelson and Weigle; Dr. Nelson worked at UNC through a NASA fellowship and Dr. Weigle received her PhD from UNC. We also met with Martin Klein, a WS-DL alumnus now at the UCLA Library. While the last ODU contingent to visit UNC was not so lucky, we returned to Norfolk relatively unscathed.

Cal Lee and Helen Tibbo opened the conference with a welcome on November 3rd, followed by Nancy McGovern's keynote address, delivered with Leo Konstantelos and Maureen Pennock. This was not a traditional keynote, but instead an interactive dialogue in which several challenge areas were presented to the audience, and the audience responded -- live and on Twitter -- with significant achievements or advances in those challenge areas from #lastyear. For example, Dr. Nelson identified the #iCanHazMemento utility. The responses are available on Google Docs.


I attended the Institutional Opportunities and Challenges session to open the conference. Kresimir Duretec presented "Benchmarks for Digital Preservation Tools." His presentation touched on how we can get digital preservation tools that "Just Work", including benchmarks for evaluating tools on test beds and measuring them for quality. Related to this is Mat Kelly's work on the Archival Acid Test.



Alex Thirifays presented "Towards a Common Approach for Access to Digital Archival Records in Europe." This paper touched on user access: user needs, best practices for identifying requirements for access, and a capability gap analysis of current tools versus user needs.

"Developing a Highly Automated Web Archive System Based
on IIPC Open Source Software" was presented by Zhenxin Wu. Her paper outlined a framework of open source tools to archive the web using Heritrix and a SOLR index of WARCS with an enhanced interface.

Barbara Sierman closed the session with her presentation "Best Until ... A National Infrastructure for Digital Preservation in the Netherlands" focusing on user accessibility and organizational challenges as part of a national strategy for preserving digital and cultural Dutch heritage.

After lunch, I led off the Infrastructure Opportunities and Challenges session with my paper on Archiving Deferred Representations Using a Two-Tiered Crawling Approach. We define deferred representations as those that rely on JavaScript to load embedded resources on the client. We show that archives can use PhantomJS to create a crawl frontier 1.5 times larger than Heritrix's, but that PhantomJS crawls 10.5 times slower. We recommend using a classifier to recognize deferred representations and crawling only those with PhantomJS, mitigating the crawl slow-down while still reaping the benefits of the headless crawler.
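The two-tiered idea in miniature: classify each URI cheaply, and pay the headless-browser cost only when the HTML suggests a deferred representation. The sketch below substitutes a naive external-script test for the paper's trained classifier, and capture.js stands in for whatever harness drives the headless crawl; both are illustrative assumptions.

```python
import subprocess

import requests
from bs4 import BeautifulSoup

def looks_deferred(html):
    """Naive stand-in for the classifier: external scripts hint that embedded
    resources may be dereferenced by JavaScript on the client."""
    soup = BeautifulSoup(html, "html.parser")
    return any(s.get("src") for s in soup.find_all("script"))

def crawl(uri):
    """Tier 1 (plain HTTP) for everything; tier 2 (headless) when deferred."""
    html = requests.get(uri, timeout=30).text
    if looks_deferred(html):
        # capture.js is a hypothetical PhantomJS harness that loads the page
        # and logs every resource the client-side JavaScript requests.
        subprocess.run(["phantomjs", "capture.js", uri], check=True)
    return html
```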

 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling Approach from Justin Brunelle
  
Douglas Thain followed with his presentation on "Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?" Similar to our work with deferred representations, his work focuses on scientific replay of simulations and software experiments. He presents several tools as part of a framework for preserving the context of simulations and simulation software, including dependencies and build information.

Hao Xu presented "A Method for the Systematic Generation of Audit Logs in a Digital Preservation Environment and Its Experimental Implementation In a Production Ready System". His presentation focused on the construction of a finite state machine to determine whether a repository is following compliance policies for auditing purposes.

Jessica Trelogan and Lauren Jackson presented their paper "Preserving an Evolving Collection: 'On-The-Fly' Solutions for the Chora of Metaponto Publication Series." They discussed the storage of complex artifacts from ongoing research projects in archeology, with the intent of improving the shareability of the collections.

To wrap up Day 1, we attended a panel on Preserving Born-Digital News consisting of Edward McCain, Hannah Sommers, Christie Moffatt, Abigail Potter (moderator), Stéphane Reecht, and Martin Klein. Christie Moffatt identified the challenges of archiving born-digital news material, including the difficulty of scoping a corpus, and presented their case study on the Ebola response. Stéphane Reecht presented the BnF's practice of performing massive once-a-year crawls as well as selective, targeted daily crawls. Hannah Sommers provided insight into the culture of a news producer (NPR) regarding digital preservation. Martin Klein presented SoLoGlo (social, local, and global) news preservation, including statistics about the preservation of links shortened by the LA Times. Finally, Edward McCain discussed the ephemeral nature of born-digital news media and gave examples of the sparse number of mementos for news pages in the Wayback Machine.


To kick off Day 2, Lisa Nakamura gave her opening keynote The Digital Afterlives of This Bridge Called My Back: Public Feminism and Open Access. Her talk focused on the role of Tumblr in curating and sharing a book no longer in print as a way to open the dialogue on the role of piracy and curation in the "wild" to support open access and preservation.

I attended the Dimensions of Digital Preservation session, which began with Liz Lyon's presentation on "Applying Translational Principles to Data Science Curriculum Development." Her paper outlines a study to help revise the University of Pittsburgh's data science curriculum. Nora Mattern took over the presentation to discuss job-market expectations and the skills required to be a professional data scientist.

Elizabeth Yakel presented "Educational Records of Practice: Preservation and Access Concerns." Her presentation outlined the unique challenges with preserving, curating, and making available educational data. Education researchers or educators can use these resources to further their education, reuse materials, and teach the next generation of teachers.

Emily Maemura presented "A Survey of Organizational Assessment Frameworks in Digital Preservation." She presented the results of a survey of organizational assessment frameworks, which serve a role similar to that of software maturity models for computer scientists. Further, her paper identifies trends, gaps, and models for assessment.

Matt Schultz, Katherine Skinner, and Aaron Trehub presented "Getting to the Bottom Line: 20 Digital Preservation Cost Questions." Their questions, covering storage fees, support, business plans, etc., help institutions evaluate costs and assess their approach to taking on digital preservation.

After lunch, I attended the panel on Long Term Preservation Strategies & Architecture: Views from Implementers, consisting of Mary Molinaro (moderator), Katherine Skinner, Sibyl Schaefer, Dave Pcolar, and Sam Meister. Sibyl Schaefer led off with a presentation of details on Chronopolis and ACE audit manager. Dave Pcolar followed by presenting the Digital Preservation Network (DPN) and its data replication policies for dark archives. Sam Meister discussed the BitCurator Consortium, which helps with the acquisition, appraisal, arrangement and description, and access of archived material. Finally, Katherine Skinner presented the MetaArchive Cooperative and its activities teaching institutions to perform their own archiving, along with other statistics (e.g., the minimum number of copies to keep stuff safe is 5).

Day 2 concluded with the poster session (including a poster by Martin Klein) and reception.



Pam Samuelson opened Day 3 with her keynote Mass Digitization of Cultural Heritage: Can Copyright Obstacles Be Overcome? Her keynote touched on the challenges with preserving cultural heritage introduced by copyright, along with some of the emerging techniques to overcome the challenges. She identified duration of copyright as a major contributor to the challenges of cultural preservation. She notes that most countries have exceptions for libraries and archives for preservation purposes, and explains recent U.S. evolutions in fair use through the Google Books rulings.

After Samuelson's keynote, I concluded my iPRES2015 visit and explored Chapel Hill, including a visit to the Old Well (at the top of this post) and an impromptu demo of the pit simulation. It was very scary.



Several themes emerged from iPRES2015, including an increased emphasis on web archiving and a need for improved context, provenance, and access for digitally preserved resources. I look forward to monitoring the progress in these areas.


--Justin F. Brunelle