Sunday, April 23, 2017

2017-04-23: Remembering Professor Hussein Abdel-Wahab

Hussein (blue shirt) at the post-defense feast for Dr. SalahEldeen.
As we head into exam week, I can't help but reflect that this is the first exam week at ODU since 1980 that does not involve Professor Hussein Abdel-Wahab. The department itself was established in 1979, so Hussein had been here nearly since the beginning. For comparison, in 1980 I was in middle school.

I had the privilege of knowing Hussein both as my instructor for three classes in 1996 & 1997, and as a colleague since 2002.  None who knew Hussein would dispute that he was an excellent instructor with unrivaled concern for students' learning and general well-being. It is fitting that ODU is establishing the Dr. Abdel-Wahab Memorial Scholarship (http://bit.ly/HusseinAbdelWahabODU) which will support graduate students.  As of April 11, the scholarship is 58% of the way to its goal of $25k.  I've donated, and I call on all former students and colleagues to continue Hussein's legacy and ensure this scholarship is fully funded.


--Michael

Thursday, April 20, 2017

2017-04-20: Trusted Timestamping of Mementos


The Memento Protocol provides a Memento-Datetime header indicating the datetime at which a memento was captured by a given web archive. In most cases, this metadata sufficiently informs the user of when the given web resource existed. Even though US courts have established that web archives can be used to legally demonstrate that a given web resource existed at a given time, there is still potential to doubt this timestamp because the same web archive that provides the memento also provides its Memento-Datetime. Though not a replacement for Memento-Datetime, trusted timestamping provides independent assurance of a timestamp for given content and can supply additional data to alleviate this doubt.
In this post, I examine different trusted timestamping methods. I start with some of the more traditional methods before discussing OriginStamp, a solution by Gipp, Meuschke, and Gernandt that uses the Bitcoin blockchain for timestamping.

Brief Cryptography Background


Trusted timestamping systems use some concepts from cryptography for confidentiality and integrity. I will provide a brief overview of these concepts here.

Throughout this document I will use the verb hashing to refer to the use of a one-way collision-resistant hash function. Users supply content, such as a document, as input and the hash function provides a digest as output.  Hash functions are one-way, meaning that no one can take that digest and reconstitute the document. Hash functions are collision-resistant, meaning that there is a very low probability that another input will produce the same digest, referred to as a collision. As shown in the figure below, small changes to the input of a hash function produce completely different digests. Thus, hash digests provide a way to identify content without revealing it. The output of the hash function is also referred to as a hash.

This diagram shows the digests produced with the same cryptographic hash function over several inputs. A small change in the input produces a completely different hash digest as output. Source: Wikipedia
The timestamping solutions in this post use the SHA-256 and RIPEMD-160 hash functions. SHA-256 is a member of the SHA-2 family that produces 256-bit digests. Its predecessor, SHA-1, has been under scrutiny for some time. In 2005, cryptographers showed mathematically that SHA-1 was not as collision-resistant as intended, prompting many to start moving to SHA-2. In February of 2017, Google researchers were able to create a collision with SHA-1, showing that SHA-1 is no longer reliably trustworthy. Because collision attacks are theoretically possible, though currently infeasible, for SHA-2, SHA-3 has been developed as a future replacement. I mention this to show how the world of hash functions is dynamic, resulting in continued research into better functions. For this post, however, it is most important to just understand the purpose of hash functions.
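
As a concrete illustration (my own example, using Python's standard hashlib module), a one-character change in the input produces a completely different SHA-256 digest:

import hashlib

first = b"The quick brown fox jumps over the lazy dog"
second = b"The quick brown fox jumps over the lazy cog"

# A single changed character yields an unrelated 256-bit digest.
print(hashlib.sha256(first).hexdigest())
print(hashlib.sha256(second).hexdigest())

# RIPEMD-160, used later for Bitcoin addresses, is available through
# hashlib.new() when the underlying OpenSSL build supports it.
print(hashlib.new("ripemd160", first).hexdigest())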

In addition to hash functions, this post discusses solutions that utilize public-key cryptography, consisting of private keys and public keys. Users typically generate a private key from random data produced by their computer and then derive the corresponding public key using an algorithm such as RSA or ECC. Users are expected to secure their private key, but share the public key.

A diagram showing an example of encryption using public and private keys. Source: Wikipedia
Users use public keys to encrypt content and private keys to decrypt it. In the figure above, everyone has access to Alice's public key. Bob encrypts a message using Alice's public key, but only Alice can decrypt it because she is the only one with access to her private key.

This process can be used in reverse to digitally sign content. The private key can be used to encrypt content and the public key can be used to decrypt it. This digital signature allows anyone with access to the public key to verify that the content was signed by the owner of the private key, because only that owner should have access to the private key.
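
To make the sign-and-verify flow concrete, here is a minimal sketch using the third-party Python cryptography package; the RSA key pair and message are placeholders, and in practice the library signs a hash of the content rather than the content itself:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Generate an RSA key pair; the private key stays secret, the public key is shared.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"content whose origin we want to prove"
signature = private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

# Anyone holding the public key can check the signature; verify() raises
# InvalidSignature if the message or signature has been altered.
public_key.verify(signature, message, padding.PKCS1v15(), hashes.SHA256())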

Certificates are documents containing a public key and a digital signature. A user typically requests a certificate on behalf of themselves or a server. A trusted certificate authority verifies the user's identity and issues the certificate with a digital signature. Other users can verify the identity of the owner of the certificate by verifying the digital signature of the certificate with the certificate authority. Certificates expire after some time and must be renewed. If a user's private key is compromised, then the certificate authority can revoke the associated certificate.

A nonce is a single-use value that is added to data prior to encryption or hashing. Systems often insert it to ensure that transmitted encrypted data cannot be reused by an attacker in the future. In this post, nonces are used with hashing as part of Bitcoin's proof-of-work function, explained later.

Finally, there is the related concept of binary-to-text encoding. Encoding allows a system to convert data to printable text. Unlike hash digests, encoded text can be converted back into its original input data. Cryptographic systems typically use encoding to create human-readable versions of public/private keys and hash digests. Base64 is a popular encoding scheme used on the Internet. Bitcoin also uses the lesser known Base58 scheme.
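
For example, Python's standard base64 module shows that encoding, unlike hashing, is reversible:

import base64

data = b"hello, archive"
encoded = base64.b64encode(data)   # b'aGVsbG8sIGFyY2hpdmU=' -- printable text
decoded = base64.b64decode(encoded)
assert decoded == data             # unlike a hash digest, the original is recoverable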

Brief Bitcoin Background


Bitcoin is a cryptocurrency. It is not issued by an institution or backed by quantities of physical objects, like gold or silver. It is software that was released with an open source license to the world by an anonymous individual using the pseudonym Satoshi Nakamoto. Using a complex peer-to-peer network protocol, it ensures that funds (Bitcoins) are securely transferred from one account to another.
Bitcoin accounts are identified by addresses. Addresses are used to indicate where Bitcoins should be sent (paid). The end user's Bitcoin software uses public and private keys to generate an address. Users often have many addresses to ensure their privacy. Users have special-purpose software, called wallets, that generates and keeps track of addresses. There is no central authority to issue addresses, meaning that addresses must be generated individually by all participants.
Wallets generate Bitcoin addresses using the following process:
  1. Generate an ECC public-private key pair
  2. Perform SHA-256 hash of public key
  3. Perform RIPEMD-160 hash of that result
  4. Add version byte (0x00) to the front of that result
  5. Perform a SHA-256 hash of that result, twice
  6. Append the first 4 bytes of that result to the value from #4
  7. Convert that result into Base58, which eliminates the easily confused characters 0 (zero), O (capital o), I (capital i), and l (lowercase L)
The last step uses Base58 so that users can write the address on a piece of paper or speak it aloud over the phone. The ECC algorithms are used by Bitcoin to make the public-private key pair "somewhat resistant" to quantum computers. SHA-256 is used twice in step 5 to reduce the chance of success for any as yet unknown attacks against the SHA-2 hash function. Because all Bitcoin users generate addresses themselves, without a central addressing authority, this long process exists to reduce the probability of a collision between addresses to 0.01%. Even so, for improved security, the community suggests generating new addresses for each transaction. Note that only public-private keys and hashing are involved. There are no certificates to revoke or expire.
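
The hashing and encoding steps above (everything after the ECC key generation) can be sketched in Python as follows. This is an illustrative reimplementation, not actual wallet code; the sample public key is a placeholder, and RIPEMD-160 support in hashlib depends on the local OpenSSL build.

import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data):
    # Base58 omits 0, O, I, and l; leading 0x00 bytes become leading '1' characters.
    n = int.from_bytes(data, "big")
    encoded = ""
    while n > 0:
        n, rem = divmod(n, 58)
        encoded = B58_ALPHABET[rem] + encoded
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + encoded

def address_from_public_key(public_key):
    sha = hashlib.sha256(public_key).digest()                            # step 2
    ripemd = hashlib.new("ripemd160", sha).digest()                      # step 3
    versioned = b"\x00" + ripemd                                         # step 4
    check = hashlib.sha256(hashlib.sha256(versioned).digest()).digest()  # step 5
    return base58_encode(versioned + check[:4])                          # steps 6 and 7

# Placeholder 33-byte compressed public key; a real wallet derives this
# from an ECC key pair (step 1).
print(address_from_public_key(bytes.fromhex("02" + "11" * 32)))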

Transactions contain the following types of entries:
  • Transaction inputs contain a list of addresses and amount of Bitcoins to transfer from those addresses. Also included is a digital signature corresponding to each address. This digital signature is used by the Bitcoin software to verify that the transaction is legitimate and thus these Bitcoins can be spent. There is also a user-generated script used to specify how to access the bitcoins, but the workings of these scripts are outside the scope of this post.
  • Transaction outputs contain a list of addresses and amount of Bitcoins to transfer to those addresses. As with transaction inputs, a user-generated script is included to specify how to spend the bitcoins, but I will not go into further detail here.
  • Another field exists to enter the amount of transaction fees paid to the miners for processing the transaction.
The Bitcoin system broadcasts new transactions to all nodes. Miners select transactions and group them into blocks. A block contains the transactions, a timestamp, a nonce, and a hash of the previous block.

Within each block, Bitcoin stores transactions in a Merkle tree, an example diagram of which is shown below. Transactions reside in the leaves of the tree. Each non-leaf node contains a hash of its children. This data structure makes it possible to detect corrupt or illicit transactions and prevent them from being included in the blockchain.
A diagram showing an example of a Merkle tree. Each non-leaf node contains a hash of its children. For Bitcoin, transactions reside in the leaves. Source: Wikipedia
A conceptual diagram shows the Bitcoin blockchain. Each block contains: a hash of the previous block, a timestamp, a nonce, and the root of a tree of transactions. Source: Wikipedia
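
A minimal sketch of the Merkle-root computation, using placeholder transaction bytes rather than real Bitcoin transactions, and the double SHA-256 that Bitcoin applies:

import hashlib

def sha256d(data):
    # Bitcoin hashes with SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions):
    level = [sha256d(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:      # Bitcoin duplicates the last hash on odd-sized levels
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

print(merkle_root([b"tx1", b"tx2", b"tx3"]).hex())
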
Miners only see Bitcoin addresses and amounts in each transaction, providing some privacy to those submitting transactions. To add a block to the blockchain, miners must solve a proof-of-work function. Once a miner has assembled a block, it must find a nonce such that hashing the contents of the block together with that nonce using SHA-256, twice, produces a digest below a network-defined target. The miner repeatedly guesses nonces, combines them with the block content, and hashes the result until an acceptable digest is produced. This proof-of-work function is designed to be fast for the network to verify, but time-consuming for the miners to execute. The difficulty target is adjusted roughly every 14 days to ensure that miners continue to take 10 minutes to process each block. For each block completed, miners are rewarded any user-provided transaction fees included in the transactions as well as newly minted Bitcoins -- a block reward. The block reward is currently set at 12.5 bitcoins, worth $15,939 as of March 2, 2017. Miners run software and dedicated hardware around the globe to solve the proof-of-work function. Currently the local cost of electricity is the limiting factor in profiting from mining bitcoins.
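
The proof-of-work idea can be illustrated with a toy example: find a nonce so that the double SHA-256 digest of the block content plus the nonce falls below a target. This is a simplification of Bitcoin's actual block-header format and difficulty encoding:

import hashlib

def sha256d(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(block_content, difficulty_bits=16):
    # Guess nonces until the digest is below the target (i.e., it starts
    # with difficulty_bits zero bits). Verification needs only a single hash.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while int.from_bytes(sha256d(block_content + nonce.to_bytes(8, "big")), "big") >= target:
        nonce += 1
    return nonce

print(mine(b"previous block hash + merkle root + timestamp"))
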
To alter a previous transaction, an attacker would need to select the block containing that transaction, insert their illicit transaction, and create a new version of the block. They would then need to re-solve that block and all subsequent blocks faster than the rest of the network combined (more than 50% of the mining power). For this reason, altering the blockchain is considered extremely hard.

Bitcoins do not really exist, even on an individual's hard drive. The blockchain contains the record of every bitcoin spent and indicates the current balance at each Bitcoin address. Full Bitcoin nodes have a copy of the blockchain, currently at 105GB, which can create problems for users running full nodes. Satoshi Nakamoto recommended periodically pruning the blockchain of old transactions, but so far this has not been done.

Technology exists to create blockchains outside of Bitcoin, but Bitcoin provides incentives for participation, in terms of monetary rewards. Any system attempting to use a blockchain outside of Bitcoin would need to produce similar incentives for participants. The participation by the miners also secures the blockchain by preventing malicious users from spamming it with transactions.

How accurate are the timestamps in the blockchain? According to the Bitcoin wiki:
A timestamp is accepted as valid if it is greater than the median timestamp of previous 11 blocks, and less than the network-adjusted time + 2 hours. "Network-adjusted time" is the median of the timestamps returned by all nodes connected to you. As a result, block timestamps are not exactly accurate, and they do not even need to be in order. Block times are accurate only to within an hour or two.
Bitcoins come in several denominations. The largest is the Bitcoin. The smallest is the satoshi. One satoshi equals 0.00000001 (1 × 10⁻⁸) Bitcoins.

Trusted Timestamping

Trusted Timestamping allows a verifier to determine that the given content existed during the time of the timestamp. It does not indicate time of creation. In many ways, it is like Memento-Datetime because it is still an observation of the document at a given point in time.
Timestamping can be performed by anyone with access to a document. For a timestamp to be defensible, however, it must be produced by a reliable and trusted source. For example, timestamps can be generated for a document by a user's personal computer and then signed with digital signatures. At some point in the future, a verifier can check that the digital signature is correct and verify the timestamp. This timestamp is not trustworthy because the clock on the personal computer may be altered or set incorrectly, thus providing doubt in the accuracy of the timestamp. Some trustworthy party must exist that not only sets its clock correctly, but also ensures that timestamps are verifiable in the future.
Trusted Timestamping relies upon a trustworthy authority to accept data, typically a document, from a requestor and issue timestamps for future verification. The process then allows a verifier that has access to the timestamp and the original data to verify that the data existed at that point in time. Thus, two basic overall processes exist: (1) timestamp issue, and (2) timestamp verification.
In addition, privacy is a concern for documents. A document being transmitted can be intercepted, and if the document is held by some third party for the purposes of verifying a timestamp in the future, then it is possible that the document can be stolen from that third party. It is also possible for such a document to become corrupted. To address privacy concerns, trusted timestamping focuses on providing a timestamp for the hash of the content instead. Because such hashes cannot be reversed, the document cannot be reconstructed. Owners of the document, however, can generate the hashes from the document to verify it with the timestamping system.
Finally, verifying the timestamps should not depend on some ephemeral service. If such a service is nonexistent in the future, then the timestamps cannot be verified. Any timestamping solution will need to ensure that verification can be done for the foreseeable future.

Trusted Timestamping with RFC 3161 and ANSI X9.95


ANSI X9.95 extends RFC 3161 to provide standards for trusted timestamping in the form of a third party service called a Time Stamping Authority (TSA). Both standards discuss the formatting of request and response messages used to communicate with a TSA as well as indicating what requirements a TSA should meet.
The TSA issues time-stamp tokens (TST) as supporting evidence that the given content existed prior to a specific datetime. The following process allows the requestor to acquire a given timestamp:
  1. The requestor creates a hash of the content.
  2. The requestor submits this hash to the TSA.
  3. The TSA ensures that its clock is synchronized with an authoritative time source.
  4. The TSA ensures that the hash is the correct length, but, to ensure privacy, does not examine the hash in any other way.
  5. The TSA generates a TST containing the hash of the document, the timestamp, and a digital signature of these two pieces of data. The digital signature is signed with a private key whose sole purpose is timestamping. RFC 3161 requires that the requestor not be identified in the TST. The TST may also include additional metadata, such as the security policy used.
  6. The TST is sent back to the requestor, who should then store it along with the original document for future verification.

Simplified diagram showing the process of using a Time Stamp Authority (TSA) to issue and verify timestamps. Source: Wikipedia
To verify a timestamp, a verifier does not need the TSA. The verifier only needs:
  • the hash of the original document
  • the TST
  • the TSA's certificate
They use the original data and the TST in the following process:
  1. The verifier verifies the digital signature of the TST against the TSA’s certificate. If this is correct, then they know that the TST was issued by the TSA.
  2. The verifier then checks that the hash in the TST matches the hash of the document. If they match, then the verifier knows that the document hash was used to generate that TST.
  3. Because the timestamp and the hash contained in the TST were used to generate the digital signature, the verifier knows that the TSA observed the document hash at the given time.
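
A minimal sketch of step 2, the hash comparison, in Python; signature verification in step 1 would require an ASN.1/PKI library and is omitted, and the file path and digest in the example call are placeholders:

import hashlib

def document_matches_tst(document_path, tst_hash_hex):
    # Recompute the document hash and compare it with the hash stored in the TST.
    with open(document_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest == tst_hash_hex.lower()

# Example call with placeholder values:
# document_matches_tst("memento.html", "<hash extracted from the TST>")
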
Haber and Stornetta noted that the TSA can be compromised in their 1991 paper "How to Time-Stamp a Digital Document" and prescribed a few solutions, such as linked timestamping, which is implemented by ANSI X9.95. With linked timestamping, each TST includes a hash digest of the previous TST. Users can then additionally verify that a timestamp is legitimate by comparing this hash digest with the previously granted TST.
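
A toy sketch of linked timestamping, assuming a simple dictionary-based token format (a real TSA token is ASN.1-encoded and digitally signed):

import hashlib
import json
import time

def issue_linked_token(doc_hash, previous_token=None):
    # Each token embeds a hash of the previous token, so altering any
    # earlier token breaks every later link in the chain.
    prev_hash = None
    if previous_token is not None:
        serialized = json.dumps(previous_token, sort_keys=True).encode()
        prev_hash = hashlib.sha256(serialized).hexdigest()
    return {"doc_hash": doc_hash, "timestamp": time.time(), "prev_token_hash": prev_hash}

t1 = issue_linked_token("aa" * 32)
t2 = issue_linked_token("bb" * 32, previous_token=t1)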

ANSI X9.95 also supports the use of transient-key cryptography. In this case, the system generates a distinct public-private key pair for each timestamp issued. Once a timestamp is issued and digitally signed, the system deletes the private key so that it cannot be compromised. The verifier uses the public key to verify the digital signature.

Services using these standards are offered by companies such as DigiStamp, eMudhra, Tecxoft, and Safe Stamper TSA. Up to 5 free timestamps can be generated per day per IP at Safe Creative's TSA.

The solutions above have different issues.

ANSI X9.95 and RFC 3161 provide additional guidance on certificate management and security to ensure that the TSA is not easily compromised, but the TSA is still the single point of failure in this scheme. If the TSA relies on an incorrect time source or is otherwise compromised, then all timestamps generated are invalid. If the TSA's certificate expires or is revoked, then verifying past timestamps becomes difficult if not impossible, depending on the availability of the now invalid certificate. If the revoked certificate is still available, the datetime of revocation can be used as an upper bound for the validity of any timestamps. Unfortunately, a certificate is usually revoked because its private key was compromised. A compromised key creates doubt in any timestamps issued using it. If transient-key cryptography is used, doubt exists with any generated public-private keys as well as their associated timestamps.

Linked timestamping helps ensure that the TSA's tokens are not easily faked, but it requires that the verifier consult other verifiers to review the tokens. This requirement works against the need for privacy.

Haber and Stornetta developed the idea of distributed trust for providing timestamps. The system relies on many clients being ready, available, and synchronized to a time source. Requestors would submit a document hash digest to a random set of k timestamping clients. These clients would in turn each digitally sign their timestamp response. Because the choice of clients is random, there is a low probability of malicious clients issuing bad timestamps. The requestor would then store all timestamps from the k clients who responded. Unfortunately, this system requires participation without direct incentives.

Trusted Timestamping with OriginStamp


Gipp, Meuschke, and Gernandt recognized that the cryptocurrency Bitcoin provides timestamping as part of maintaining the blockchain. Each block contains a hash of the previous block, implementing something similar to the linking concept developed by Haber and Stornetta and used in ANSI X9.95. The blockchain is distributed among all full Bitcoin clients and updated by miners, who only see transactions and cannot modify them. In some ways, the distributed nature of Bitcoin resembles parts of Haber and Stornetta's distributed trust. Finally, the blockchain, because it is distributed to all clients, is an independent authority able to verify timestamps of transactions, much like a TSA, but without the certificate and compromise issues.
They created the OriginStamp system for timestamping user-submitted documents with the Bitcoin blockchain. They chose Bitcoin because it is the most widely used cryptocurrency and thus is perceived to last for a long time. This longevity is a requirement for verification of timestamps in the future.

The OriginStamp process for converting document content into a Bitcoin address for use in a Bitcoin transaction that can later be verified against the blockchain.
The figure above displays the OriginStamp process for creating a Bitcoin address from a document:
  1. A user submits a document to the system which is then hashed, or just submits the hash of a document.
  2. The submitted document hash is placed into a list of hashes -- seed text -- from other submissions during that day.
  3. Once per day, this seed text is itself hashed using SHA-256 to produce an aggregate hash.
  4. This aggregate hash is used as the Bitcoin private key which is used to generate a public key.
  5. That public key is used to generate a Bitcoin address which can be used in a timestamped Bitcoin transaction of 1 satoshi.
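
Steps 2 through 4 can be sketched as follows; the newline-joined seed-text format here is an assumption for illustration, since OriginStamp defines its own seed-text layout:

import hashlib

def originstamp_private_key(submitted_hashes):
    # Concatenate the day's submitted document hashes into a seed text,
    # then use the SHA-256 digest of that seed text as the Bitcoin private key.
    seed_text = "\n".join(submitted_hashes)
    return hashlib.sha256(seed_text.encode("utf-8")).hexdigest()

print(originstamp_private_key(["aa" * 32, "bb" * 32, "cc" * 32]))
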
OriginStamp could submit each document hash to the blockchain as an individual transaction, but the hashes are aggregated together to keep operating costs low. Because a fee is charged for every Bitcoin transaction, each transaction cost $0.03 at the time, allowing Gipp and his team to offer this low-cost service for free. They estimate that the system costs $10/year to operate.

Their paper was published in March of 2015. According to coindesk.com, for the March 2015 time period 1 Bitcoin was equal to $268.32. As of March of 2017, 1 Bitcoin is now equal to $960.36. The average transaction fee now sits at approximately 45,200 satoshis, resulting in a transaction fee of $0.43, as of March 26, 2017.
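
The fee arithmetic works out as follows (the dollar conversion assumes the March 2017 exchange rate quoted above):

satoshis_per_byte = 200          # lowest fee rate for minimal delay
median_tx_size = 226             # bytes
fee_satoshis = satoshis_per_byte * median_tx_size   # 45,200 satoshis
fee_btc = fee_satoshis * 1e-8                       # 0.000452 BTC
print(round(fee_btc * 960.36, 2))                   # roughly $0.43 at $960.36 per Bitcoin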

A screenshot of the memento that I timestamp throughout this section.

OriginStamp allows one to submit documents for timestamping using the Bitcoin blockchain. In this case, I submitted the HTML content of the memento shown in the figure above.

OriginStamp responds to the submission by indicating that it will submit a group of hashes to the Bitcoin blockchain in a few hours.

With the OriginStamp service, the requestor acquires a timestamp using the following process:
  1. Submit the document -- or just its hash -- to the OriginStamp website as seen in the screenshots above. If a document is submitted, its hash is calculated and the document is not retained.
  2. OriginStamp sends an email once the system has submitted the concatenated hashes to the Bitcoin blockchain. This email contains information about the seed text used, and this seed text must be saved for verification.
  3. In addition, the @originstamp Twitter account will tweet that the given hash was submitted to the blockchain.
A screenshot showing how OriginStamp displays verification information for a given hash. In this case, the document hash is da5328049647343c31e0e62d3886d6a21edb28406ede08a845adeb96d5e8bf50 and it was submitted to the blockchain on 4/10/2017 10:18:29 AM GMT-0600 (MDT) as part of seed text whose hash, and hence private key is c634bcafba86df8313332abc0ae854eea9083b279cdd4d9cde1d516ee6fb70d9.
Because the blockchain is expected to last for the foreseeable future and is tamper-proof, it can be used at any time to verify the timestamp. There are two methods available: with the OriginStamp service, or directly against the Bitcoin blockchain using the seed text.

To do so with the OriginStamp service, the verifier can follow this process:

  1. Using the OriginStamp web site, the verifier can submit the hash of the original document and will receive a response as shown in the screenshot above. The response contains the timestamp under the heading "Submitted to Bitcoin network".
  2. If the verifier wishes to find the timestamp in the blockchain, they can expand the "Show transaction details" section of this page, shown below. This section reveals a button allowing one to download the list of hashes (seed text) used in the transaction, the private and public keys used in the transaction, the recipient Bitcoin address, and a link to blockchain.info that also allows verification of the transaction and its timestamp.
  3. Using the link "Verify the generated secret on http://blockchain.info", they can see the details of the transaction and verify the timestamp, shown in the figure below.
A screenshot showing that more information is available once the user clicks "show transaction details". The recipient Bitcoin address is outlined in red. From this screen, the user can download the seed text containing the list of hashes submitted to the Bitcoin blockchain. A SHA-256 hash of this seed text is the Bitcoin private key. From this private key, a user can generate the public key and eventually the Bitcoin address for verification. In this case the generated Bitcoin address is 1EcnftDQwawHQWhE67zxEHSLUEoXKZbasy.

A screenshot of the blockchain.info web site showing the details of a Bitcoin transaction, complete with its timestamp. The Bitcoin address and timestamp have been enlarged and outlined in red. For Bitcoin address 1EcnftDQwawHQWhE67zxEHSLUEoXKZbasy, 1 Satoshi was transferred on 2017-02-28 00:01:15, thus that transaction date is the timestamp for the corresponding document.
To verify that a document was timestamped by directly checking the Bitcoin blockchain, one only needs:
  • The hash of the original document.
  • The seed text containing the list of hashes submitted to the bitcoin network.
  • Tools necessary to generate a Bitcoin address from a private key and also search the contents of the blockchain.
If OriginStamp is not available for verification, the verifier would then follow this process:
  1. Generate the hash of the document.
  2. Verify that this hash is in the seed text. This seed text should have been saved as a result of the email or tweet from OriginStamp.
  3. Hash the seed text with SHA256 to produce the Bitcoin private key.
  4. Generate the Bitcoin address using a tool such as bitaddress.org. The figure below shows the use of bitaddress.org to generate a Bitcoin address using a private key.
  5. Search the Bitcoin blockchain for this address, using a service such as blockchain.info. The block timestamp is the timestamp of the document.
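
Steps 1 through 3 can be performed with a short script like the following (the file paths are placeholders, and the seed text must be hashed exactly as downloaded); steps 4 and 5 still require an address-generation tool and a blockchain explorer:

import hashlib

def recover_private_key(document_path, seed_text_path):
    # Step 1: hash the document.
    with open(document_path, "rb") as f:
        doc_hash = hashlib.sha256(f.read()).hexdigest()
    # Step 2: confirm the document hash appears in the saved seed text.
    with open(seed_text_path, "rb") as f:
        seed_text = f.read()
    if doc_hash.encode("ascii") not in seed_text:
        raise ValueError("document hash not found in seed text")
    # Step 3: the SHA-256 digest of the seed text is the Bitcoin private key.
    return hashlib.sha256(seed_text).hexdigest()
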
A screenshot of bitaddress.org's "Wallet Details" tab with the Bitcoin address enlarged and outlined in red. One can insert a Bitcoin private key and it will generate the public key and associated Bitcoin address. This example uses the private key of c634bcafba86df8313332abc0ae854eea9083b279cdd4d9cde1d516ee6fb70d9 shown in previous screenshots, which corresponds to a Bitcoin address of 1EcnftDQwawHQWhE67zxEHSLUEoXKZbasy, also shown in previous figures.
OriginStamp also supplies an API that developers can use to submit documents for timestamping as well as verify timestamps and download the seed text for a given document hash.

Comparison of Timestamping Solutions


Below I provide a summary comparison of a TSA, OriginStamp, and submitting directly to the blockchain without OriginStamp.

Financial Cost per Timestamp
  • TSA: dependent on service and subscription, ranging from $3 to $0.024
  • OriginStamp: dependent on the size of the seed text, but less than a Bitcoin transaction fee
  • Directly to blockchain: one Bitcoin transaction fee, optimally $0.56

Accuracy of Timestamp
  • TSA: within seconds of the request, but dependent on the number of requests in the queue if linked timestamping is used
  • OriginStamp: within 1 day + 2 hours
  • Directly to blockchain: within 2 hours

Items Needed for Verification
  • TSA: the original document, the TST, and the TSA's certificate to verify the signature
  • OriginStamp: the original document and the seed text of hashes submitted at the same time
  • Directly to blockchain: the original document

Tools Needed for Verification
  • TSA: certificate verification tools
  • OriginStamp and directly to blockchain: software to generate a Bitcoin address and software to search the blockchain

Timestamp Storage
  • TSA: in the TST saved by the requestor
  • OriginStamp and directly to blockchain: the blockchain

Privacy
  • TSA: only the hash of the document is submitted, but the TSA knows the requestor's IP address
  • OriginStamp and directly to blockchain: miners only see the Bitcoin address, not who submitted the document or even its hash

Targets of Compromise
  • TSA: the TSA time server and the TSA certificate's private key
  • OriginStamp and directly to blockchain: the blockchain

Requirement for Compromise
  • TSA: the server is configured insecurely or has open software vulnerabilities
  • OriginStamp and directly to blockchain: more than 50% of Bitcoin miners colluding

Dependency of Longevity
  • TSA: the life of the organization offering the timestamping service
  • OriginStamp and directly to blockchain: continued interest in preserving the blockchain

For the first criterion, we compare the cost of timestamps from each service. At the TSA service run by DigiStamp, an archivist can obtain a cost of $0.024 per timestamp for 600,000 timestamps, but would need to commit to a 1-year fee of $14,400. They would also need to acquire all 600,000 timestamps within a year or lose them. If they pay $10, they are only allocated 30 timestamps and need to use them within 3 months, resulting in a cost of $3 per timestamp. Tecxoft's pricing is similar. OriginStamp attempts to keep costs down by bundling many hashes into a seed text file, but is still at the mercy of Bitcoin's transaction fees. The price of Bitcoin is currently very volatile. The transaction fee mentioned in Gipp's work from 2015 was $0.03. Miners tend to process a block faster if it has a higher transaction fee. The optimal fee has gone up due to Bitcoin's rise in value and an increase in the number of transactions waiting to be processed. The lowest price for the least delay is currently 200 satoshis per byte and the median transaction size is 226 bytes, for a total cost of 45,200 satoshis. This was equivalent to $0.43 when I started writing this blog post and is now $0.56.

For the second criterion, we compare timestamp accuracy. The TSA should be able to issue a timestamp to the requestor within seconds of the request. This can be delayed by a long queue if the TSA uses linked timestamping, because every request must be satisfied in order. OriginStamp, however, tries to keep costs down by submitting its seed list to the blockchain at once-a-day intervals, according to the paper. On top of this, the timestamp in the blockchain is accurate to within two hours of submission of the Bitcoin transaction. This means that an OriginStamp timestamp may be as much as 24 hours + 2 hours = 26 hours off from the time of submission of the document hash. In practice, I do not know the schedule used by OriginStamp, as I submitted a document on February 28, 2017 and it was not submitted to the Bitcoin network until March 4, 2017. Then again, a document submitted on March 19, 2017 was submitted to the Bitcoin network by OriginStamp almost 18 hours later.
If the cost is deemed necessary, this lack of precision can be alleviated by not using OriginStamp but submitting to the blockchain directly. One could generate a Bitcoin address from a single document hash and then submit it to the blockchain immediately. The timestamp precision would still be within 2 hours of transaction submission.
For the TSA, timestamps are stored in the TST, which must be saved by the requestor for future verification. In contrast, OriginStamp saves timestamps in the Blockchain. OriginStamp users still need to save the seed list, so both solutions require the requestor to retain something along with the original document for future verification.

All solutions offer privacy through the use of document hashes. The Bitcoin miners receiving OriginStamp transactions only see the Bitcoin address generated from the hash of the seed list and do not even know it came from OriginStamp, hiding the original document submission in additional layers. The TSA, on the other hand, is aware of the requestor's IP address and potentially other identifying information.
To verify the timestamp, TSA users must have access to the original document, the TST, and the certificate of the TSA to verify the digital signature. OriginStamp only requires the original document and the seed list of hashes submitted to the blockchain. This means that OriginStamp requires slightly fewer items to be retained.
If using the blockchain directly, without OriginStamp, a single document hash could be used as the private key. There would be no seed list in this case. For verification, one would merely need the original document, which would be retained anyway.
To compromise the timestamps, the TSA time server must be attacked. This can be done by taking advantage of software vulnerabilities or insecure configurations. TSAs are usually audited to prevent insecure configurations, but vulnerabilities are frequently discovered in software. OriginStamp, on the other hand, requires that the blockchain be attacked directly, which is only possible if more than 50% of Bitcoin miners collude to manipulate blocks.
Finally, each service has different vulnerabilities when it comes to longevity. Mementos belong to web archives, and as such are intended to exist far, far longer than 20 years. This makes longevity a key concern in any decision about a timestamping solution. The average lifespan of a company is now less than 20 years and expected to decrease. The certificates a TSA uses to sign timestamps may last for little more than 3 years. This means that the verifier will need someone to have saved the TSA's certificate prior to verification of the timestamp. If the organization with the document and the TST is also the same organization providing the TSA's certificate, then there is cause for doubt in its validity, because that organization can potentially forge any or all of these verification components.
The Bitcoin blockchain, on the other hand, is not tied to any single organization and is expected to last as long as there is interest in investing in the cryptocurrency. In addition, there are many copies of the blockchain available in the world. If Bitcoin goes away, there is still an interest in maintaining the blockchain for verification of transactions, and thus retaining copies of the blockchain by many parties. If someone wishes to forge a timestamp, they would need to construct their own illicit blockchain. Even if they went that far, a copy of their blockchain can be compared to other existing copies to evaluate its validity. Thus, even if the blockchain is no longer updated, it is still an independent source of information that can be used for future verification. If the blockchain is ever pruned, then prior copies will still need to be archived somewhere for verification of prior transactions. The combined interests of all of these parties support the concept of the Bitcoin blockchain lasting longer than a single server certificate or company.

So, with trusted timestamping available, what options do we have to make it easy to verify memento timestamps?

Trusted Timestamping of Mementos


It is worth noting that, due to delays between timestamp request and response in each of these solutions, trusted timestamping is not a replacement for Memento-Datetime. The Memento-Datetime header provides a timestamp of when the web archive captured a given resource. Trusted timestamping, on the other hand, can provide an additional dimension of certainty that a resource existed at a given datetime. Just as Memento-Datetime applies to a resource at a specific URI-M, so would a trusted timestamp.

A crawler can capture a memento as part of its normal operations, compute the hash of its content, and then submit this hash for timestamping to one of these services. The original memento content encountered during the crawl, the raw memento, must be preserved by the archive indefinitely. The memento can include a link relation in the response headers, such as the unregistered trusted-timestamp-info relation shown in the example headers below, indicating where one can find additional information to verify the timestamp.

HTTP/1.1 200 OK
Server: Tengine/2.1.0
Date: Thu, 21 Jul 2016 17:34:15 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 109672
Connection: keep-alive
Memento-Datetime: Thu, 21 Jul 2016 15:25:44 GMT
Link: <http://www.cnn.com/>; rel="original",
 <http://a.example.org/web/timemap/link/http://www.cnn.com/>; rel="timemap"; type="application/link-format",
 <http://a.example.org/web/http://www.cnn.com/>; rel="timegate",
 <http://a.example.org/web/20160721152544/http://www.cnn.com/>; rel="last memento"; datetime="Thu, 21 Jul 2016 15:25:44 GMT",
 <http://a.example.org/web/20160120080735/http://www.cnn.com/>; rel="first memento"; datetime="Wed, 20 Jan 2016 08:07:35 GMT",
 <http://a.example.org/web/20160721143259/http://www.cnn.com/>; rel="prev memento"; datetime="Thu, 21 Jul 2016 14:32:59 GMT",
 <http://a.example.org/timestamping/20160722191106/http://www.cnn.com/>; rel="trusted-timestamp-info"

The URI linked by the trusted-timestamp-info relation could be a JSON-formatted resource providing information for verification. For example, if OriginStamp is used for timestamping, then the resource might look like this:

{ "timestamping-method": "OriginStamp", "seed-text": "http://a.example.org/timestamping/20160722191106/seedtext.txt",
"hash-algorithm": "SHA-256" }

In this case, a verifier already knows the URI-M of the memento. They only need to locate the raw memento, calculate its hash, and use the seed text as described above to generate the Bitcoin address and find the timestamp in the Bitcoin blockchain.

Or, if an RFC 3161 solution is used, the trusted-timestamp-info resource could look like this:

{ "timestamping-method": "RFC 3161", "timestamp-token": "http://a.example.org/timestamping/tst/20160721174431/http://www.cnn.com", "tsa-certificate": "http://a.example.org/timestamping/tsacert/cert.pem",
"hash-algorithm": "SHA-256" }

In this case, a verifier can locate the raw memento, calculate its hash, and verify it using the time-stamp token (TST) and the TSA certificate as described above for RFC 3161.
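
A sketch of how a verifier client might consume this proposed resource; the trusted-timestamp-info relation and the JSON field names come from the examples above and are not registered or standardized:

import hashlib
import json
import urllib.request

def fetch_timestamp_info(info_uri):
    # Retrieve the JSON resource linked with rel="trusted-timestamp-info".
    with urllib.request.urlopen(info_uri) as response:
        return json.loads(response.read().decode("utf-8"))

def hash_raw_memento(raw_memento_bytes, algorithm="SHA-256"):
    # Hash the raw memento with the algorithm named in the info resource.
    return hashlib.new(algorithm.replace("-", "").lower(), raw_memento_bytes).hexdigest()

info = fetch_timestamp_info("http://a.example.org/timestamping/20160722191106/http://www.cnn.com/")
if info["timestamping-method"] == "OriginStamp":
    print("download the seed text from", info["seed-text"])
elif info["timestamping-method"] == "RFC 3161":
    print("fetch the TST from", info["timestamp-token"])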

If it is known that the crawler creates the hash of the raw memento and uses it as a private key for generating a Bitcoin address, thus submitting it directly to the blockchain for timestamping, then no additional headers would be needed. Verifiers only need the content of the raw memento to generate the hash. In addition, perhaps a separate timestamping service could exist for mementos, using the URI-M (e.g., https://timestamping-service.example.org/{URI-M}).

If one specific timestamping scheme is used, then perhaps specific link relations can be created to convey the resources from each of these fields.

Of course, this assumes that we only want to timestamp raw mementos. Conceivably, one might wish to timestamp a screenshot of a web page, or its WARC. We will need to perform additional analysis of these other potential use cases.

Summary


In this post, I have discussed different timestamping options. These options have different levels of complexity and security. All of them support privacy by permitting the submission of a document hash to a timestamping service. OriginStamp attempts to address some of the concerns with the existing ANSI X9.95/RFC 3161 standards by using the timestamping features of the Bitcoin blockchain.

Of these options, the Bitcoin blockchain offers a decentralized, secure solution that supports privacy and does not depend on a central server that can fail or be compromised. Because copies of the blockchain are distributed to all full Bitcoin clients, it remains present for verification in the future. Bitcoin has been around for 8 years and continues to increase in value. Because all participants have incentives to keep the blockchain distributed and up to date, it is expected to outlast most companies, whose average lifespan is now less than 20 years. In addition, if Bitcoin is no longer used, copies of the blockchain will still need to be maintained indefinitely for verification. It does, however, suffer from issues with timestamp accuracy inherent in the Bitcoin protocol. These can be alleviated by submitting a document hash directly to the blockchain.

Companies offering trusted timestamping using TSAs, on the other hand, may not have the same longevity and require subscription fees for a limited number of timestamps. Though Bitcoin is currently volatile, it has stabilized before, and the subscription fees from these companies are still more expensive on average than the Bitcoin transaction fee.

Even though timestamping options exist, use cases must be identified for the verification of such timestamps in the future. These use cases will inform requestors of which content should be timestamped and will also affect which timestamping solution is selected. It would also be beneficial for verifiers to have links to additional resources for verification.

Trusted timestamping of mementos is possible, but will require some additional decisions and technology to become a reality.


Tuesday, April 18, 2017

2017-04-18: Local Memory Project - going global

Screenshots of world local newspapers from the Local Memory Project's local news repository. Top: newspapers from Iraq, Nigeria, and France. Bottom: Chile, US (Alaska), and Australia.
Soon after the introduction of the Local Memory Project (LMP) and the local news repository of:
  • 5,992 US Newspapers
  • 1,061 US TV stations, and
  • 2,539 US Radio stations
I considered extending the local news collection beyond US local media to include newspapers from around the world.
Finding and generating the world local newspaper dataset
After a sustained search, I narrowed my list of potential sources of world local news media to the following in order of my perceived usefulness:
From this list, I chose Paperboy as my world local news source because it was fairly structured (which makes web scraping easier) and contained the cities in which the various newspaper organizations are located. Following scraping and data cleanup, I extracted local newspaper information for:
  • 6,638 Newspapers from 
  • 3,151 Cities in 
  • 183 Countries
The dataset is publicly available.
Integrating the world local newspaper dataset into LMP
For a seamless transition from a US-centric to a world-centric Local Memory Project, it was important to ensure that world local media were represented with exactly the same data schema as US local media. This guarantees that the architecture of LMP remains the same. For example, the following response excerpt represents a single US college newspaper (the Harvard Crimson).
{
  "city": "Cambridge", 
  "city-latitude": 42.379146, 
  "city-longitude": -71.12803, 
  "collection": [
   {
      "city-county-lat": 42.377, 
      "city-county-long": -71.1167, 
      "city-county-name": "Harvard", 
      "country": "USA", 
      "facebook": "http://www.facebook.com/TheHarvardCrimson", 
      "media-class": "newspaper", 
      "media-subclass": "college", 
      "miles": 0.6, 
      "name": "Harvard Crimson", 
      "open-search": [], 
      "rss": [], 
      "state": "MA", 
      "twitter": "http://www.twitter.com/thecrimson", 
      "video": "https://www.youtube.com/user/TheHarvardCrimson/videos", 
      "website": "http://www.thecrimson.com/" 
   }
  ], 
  "country": "USA", 
  "self": "http://www.localmemory.org/api/countries/USA/02138/10/?off=tv%20radio%20", 
  "state": "MA", 
  "timestamp": "2017-04-17T18:56:10Z"
 }
Similarly, world local media use this same schema for seamless integration into the existing LMP framework. However, different countries have different administrative subdivisions. From an implementation standpoint, it would have been ideal if all countries had the US-style administrative subdivision of: Country - State - City, but this is not the case. Also, currently, LMP's Geo and LMP's Local Stories Collection Generator are accessed using a zip code. Consequently, the addition of world local news media meant finding the various databases which mapped zip codes to their respective geographical locations. To overcome the obstacles of multiple administrative subdivisions, and the difficulty of finding comprehensive databases that mapped zip codes to geographical locations, while maintaining the pre-existing LMP data schema, I created a new access method for Non-US local media. Specifically, US local news media are accessed with a zip code (which maps to a City in a State), while Non-US local news media are accessed with the name of the City. For example, here is a list of 100 local newspapers that serve Toronto, Canada: http://www.localmemory.org/geo/#Canada/Toronto/100/
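
For example, a script can query the API endpoint shown in the "self" field of the response above and list the outlets it returns; the endpoint, parameters, and field names here are taken from that example response:

import json
import urllib.request

uri = "http://www.localmemory.org/api/countries/USA/02138/10/?off=tv%20radio%20"
with urllib.request.urlopen(uri) as response:
    result = json.loads(response.read().decode("utf-8"))

# Print the name and website of each local media outlet in the collection.
for outlet in result["collection"]:
    print(outlet["name"], outlet["website"])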

The addition of 6,638 Non-US newspapers from 183 countries makes it possible not only to see local news media from different countries, but also to build collections of stories about events from the perspectives of local media around the world.

--Nwala

Monday, April 17, 2017

2017-04-17: Personal Digital Archiving 2017

On March 29-30, 2017 I attended the Personal Digital Archiving Conference 2017 (#pda2017) held at Stanford University in sunny Palo Alto, California. Other members of the Web Science and Digital Libraries Research Group (WS-DL) had previously attended this conference (see their 2013, 2012, and 2011 trip reports) and from their rave reviews of previous years' conferences, I was looking forward to it. I also just happened to be presenting and demoing the Web Archiving Integration Layer (WAIL) there as an added bonus.

Day 1

Day one started off at 9am with Gary Wolf giving the first keynote on Quantified Self Archives. Quantified self archives consist of data generated from health-monitoring tools such as the FitBit, or lifelogging data, which is used to gain insights into your own life through data visualization.
After the keynote was the first session, Research Horizons, moderated by WS-DL alumna Yasmina Anwar.
The first talk of this session was Whose Life Is It, Anyway? Photos, Algorithms, and Memory (Nancy Van House, UC Berkeley). In the talk, Van House spoke on the effects of "faceless" algorithms on images and how they can distort the memory of the images they are applied to in many personal archives. Van House also spoke about how machine learning techniques, when applied to images in aggregate and without context, can have unintended consequences, especially when attempting to detect emotion. To demonstrate this, Van House showed a set of images tagged with the emotion of joy, one of which was a picture of an avatar from the online life simulator Second Life.

The second talk was Digital Workflow and Archiving in the Humanities and Social Sciences (Smiljana Antonijevic Ubois, Penn State University). Ubois spoke on the many ways scholars use non-traditional archives such as Dropbox or photos taken by their smartphones to preserve their work. One of the biggest points brought up in the talk by Ubois was that humanities and social sciences scholars still see the web as a resource rather than home to a digital archive.

The third talk was Mementos Mori: Saving the Legacy of Older Performers (Joan Jeffri, Research Center for Arts & Culture/The Actors Fund). In the talk, Jeffri spoke on the efforts being made by the Performing Arts Legacy Project to document and preserve the works of artists. The project found that one in five living artists in New York had no documentation of their work, especially the older artists.
The final talk in the session was Exploring Personal Financial Information Management Among Young Adults (Robert Douglas Ferguson, McGill School of Information Studies). Ferguson spoke on the passive preservation (i.e., use of web portals and tools provided by financial services) practiced by young adults when it comes to managing their money, and the need to consider long-term preservation of these materials.
Session two was Preserving & Serving PDA at Memory Institutions moderated by Glynn Edwards.
This session started off with Second-Generation Digital Archives: What We Learned from the Salman Rushdie Project (Dorothy Waugh and Elizabeth Russey Roke, Emory University). In 2010, Emory University announced the launch of the Salman Rushdie Digital Archives. This reading room kiosk offered researchers at the Manuscript, Archives, and Rare Book Library the opportunity to explore born-digital material from one of four of Rushdie's personal computers through dual access systems. One of the biggest lessons noted by Waugh was the need to document everything the software engineers do, as their work is just as ephemeral as the born-digital information they wish to preserve.
After Waugh was Composing an Archive: the personal digital archives of contemporary composers in New Zealand (Jessica Moran, National Library of New Zealand). In recent years the Library has acquired the digital archives of a number of prominent contemporary composers. Moran discussed the personal digital archiving practices of the composer, the composition of the archive, and the work of the digital archivists, in collaboration with curators, arrangement and description librarians, and audio-visual conservators, to collect, describe, and preserve this collection.
The final talk in session two was Learning from users of personal digital archives at the British Library (Rachel Foss, The British Library). Foss discussed the efforts made by the British Library to provide access to those of their digital collections that require emulation to be viewed. Foss also discussed how archiving professionals need to consider how to assist and educate researchers in making use of born-digital collections, which implies understanding more about how they want to interrogate these collections as a resource.

Lunch happened. Then came Session 3, Teaching PDA, moderated by Charles Ransom.
Journalism Archive Management (JAM): Preparing journalism students to manage their personal digital assets and diffuse JAM best practices into the media industry (Dorothy Carner & Edward McCain, University of Missouri). In collaboration with MU Libraries and the school’s Donald W. Reynolds Journalism Institute, a personal digital archive learning model was developed and deployed in order to prepare journalism-school students, faculty and staff for their ongoing information storage and access needs. The MU J-School has created a set of PDA best practices for journalists and branded it: Journalism Archive Management (JAM).
An archivist in the lab with a codebook: Using archival theory and “classic” detective skills to encourage reuse of personal data (Carly Dearborn, Purdue University Libraries). Dearborn designed a workshop, inspired by the Society of Georgia Archivists' personal digital archiving activities, to introduce attendees to archival concepts and techniques which can be applied to familiarize researchers with new data structures.
Session 4: Emergent Technologies & PDA 1 moderated by Nicholas Taylor
Cogifo Ergo Sum: GifCities & Personal Archives on the Web (Maria Praetzellis & Jefferson Bailey, Internet Archive). In the talk, Praetzellis and Bailey spoke on GifCities, the GeoCities Animated GIF Search Engine created for the Internet Archive's 20th anniversary, which comprises over 4.6 million animated GIFs from the GeoCities web archive and includes a search interface. Each GIF links back to the archived GeoCities web page upon which it was originally embedded. The search engine offers a novel, flabbergasting window into what is likely one of the largest aggregations of publicly-accessible archival personal documentary collections. It also provokes a reassessment of how we conceptualize personal archives as being both from the web (as historical encapsulations) and of the web (as networked recontextualization).
Comparison of Aggregate Tools for Archiving Social Media (Melody Condron). In the talk, Condron spoke about many tools which could make archiving social media easier: Frostbox, If This Then That, and digi.me. Of all the tools mentioned, If This Then That provided the easiest way for its users to push social media into archives such as the Internet Archive or Webrecorder.

Video games collectors and archivists: how might private archives influence archival practices (Adam Lefloic Lebel, University of Montreal)

Demonstrations:
There were two different demonstration sessions: the first was between sessions 4 and 5, and the second was at the end, after session 6.
The demo for the Web Archiving Integration Layer (WAIL) consisted of two videos and myself talking to those who stopped by about the particular use cases of WAIL or answering any questions they had about WAIL. The first video, a detailed feature walkthrough of WAIL, is viewable below; the second showed off WAIL in action.
Session 5: Emergent Technologies & PDA 2 moderated by Henry Lowood

CiteTool: Leveraging Software Collections for Historical Research (Eric Kaltman, UC Santa Cruz) Kaltman spoke about how the tool is currently being used in a historical exploration of the computer game DOOM as a way to compare conditions across versions and to save key locations for future historical work. Since the tool provides links to saved locations, it is also possible to share states amongst researchers in collaborative environments. The links also function as an executable citation in cases where an argument about a program’s functionality is under discussion and would benefit from first-hand execution.


Applying technology of Scientific Open Data to Personal Closed Data (Jean-Yves Le Meur, CERN). Le Meur explained how the methodology and technologies developed (partly at CERN) to preserve scientific data (like High Energy Physics data) could be reused for personal restricted data. He first reviewed existing initiatives to collect and preserve personal data from individuals for the very long term, as well as a few examples of well-established collective memory portals. He then compared solutions implemented for open data in HEP, looking at the guiding principles and underlying technologies. Finally, he drafted a proposal to foster a solid shared platform for closed personal data archives on the model of open scientific data archives.


Personal Data and the Personal Archive (Chelsea Gunn, University of Pittsburgh). Gunn questioned whether quantified self and lifelogging applications produce personal data that forms part of our personal archives, or whether they constitute a form of ephemera, useful for the purposes of tracking progress toward a goal but not of long-term interest.

Using Markdown for PDA Interoperability (Jay Datema, Stony Brook University). Datema noted that the only thing you can count on with born-digital projects is that you will have to migrate the content at some point. Having done digital library development for over a decade, he spoke about simple text and a problem that has a proven solution. Markdown is an intermediate step between text and HTML; if you're writing anything that requires an HTML link, its shortcuts are worth learning. Most web applications rely on the humble submit button: once text goes in, it becomes part of a database backend, and extracting it may require a set of database calls, parsing a SQL file, or hoping that someone wrote a module to let you download what you entered.

Session 6: PDA & the Arts, moderated by Kate Tasker

From Virtual to Reality: Dissecting Jennifer Steinkamp’s Software-Based Installation (Shu-Wen Lin, New York University). Lin spoke about how time-based and digital art combines media and technology in ways that challenge traditional conservation practices and require dedicated care, drawing on her work with Steinkamp’s animated installation Botanic, which was exhibited in Times Square Arts: Midnight Moment. Lin's talk focused on the work's internal structure and the relationships among the software used (Maya and After Effects), the scripts, and the final deliverables. Lin also described providing a risk assessment that will enable museum professionals, as well as the artist herself, to identify the sustainability and compatibility of the digital elements and to build documentation that can collect and preserve the whole spectrum of digital objects related to the piece.

The PDAs of Others: Completeness, Confidentiality, and Creepiness in the Archives of Living Subjects (Glen Worthey, Stanford University). The title and inspiration for Worthey's presentation came from the 2006 German film Das Leben der Anderen (The Lives of Others), which dramatized the covert monitoring of East Germans. Although the biography he discussed was "authorized", Worthey spoke about how the process of gathering and documenting materials often reveals tensions: between completeness and respect for privacy; between on-the-record and off-the-record conversations; between the personal and the professional; and between the probing of important questions and voyeuristic-seeming observation of the subject's complex inner life.

RuschaView 2.0 (Stace Maple, Stanford University). In 1964, LA painter Ed Ruscha put a Nikon camera in the back of his truck, drove up and down the Sunset Strip, and shot what would become the continuous panorama "Every Building on the Sunset Strip" (1966). Maple's talk highlighted both Ruscha's multi-decade project and Maple's own multi-month attempt to create the metadata required to reproduce something like Ruscha's "Every Building..." publication in a digital context.

(Pete Schreiner, NCSU) Between 2003 and 2013, an associated group of independent rock bands from Bloomington, Indiana shared a tour van. When the van's owner, a librarian, was preparing to move across the country in 2014, Pete Schreiner, a band member and proto-librarian, decided to preserve this esoteric collection of local music-related history. As time allowed, he created an online collection of the photographs using Omeka. The case study presented a guerrilla archiving project, the issues encountered throughout the process, and the attempt to find a balance between professional archiving principles and getting it done.

Day 2

At the request of presenters who did not want their slide material recorded or shown to anyone beyond the attendees, no photos were taken on Day 2.
Session 7: Documenting Cultures & Communities, moderated by Michael Olson
(Anna Trammell, University of Illinois) Trammell's talk discussed the experience gained from forming relationships and building trust with the student organizations at the University of Illinois, capturing and processing their digital content, and utilizing these records in instruction and outreach.

Online grieving and intimate archives: a cyberethnographic approach (Jennifer Douglas, University of British Columbia). Douglas presented a short paper discussing the archiving practices of communities of parents grieving stillborn children. Douglas demonstrated how these communities function as aspirational archives, not only preserving the past but also creating a space in the world for their deceased children. Regarding the ethics of online research and archiving, Douglas introduced the methodology of cyberethnography and explored its potential connections to the work of digital archivists.

(Barbara Jenkins, University of Oregon) Jenkins spoke about the development of an Afghanistan personal archives project, created in 2012, which expanded its scope through a short sabbatical supported by the University of Oregon in 2016. The Afghanistan collection Jenkins built combines over 4,000 slides, prints, negatives, letters, maps, oral histories, and primary documents.
Session 8: Narratives, Biases, PDA & Social Justice, moderated by Kim Christen

Andrea Pritchett, co-founder of Berkeley Copwatch, Robin Margolis, UCLA MLIS in Media Archives, and Ina Kelleher presented a proposed design for a digital archive aggregating different sources of documentation toward the goal of tracking individual officers. Copwatch chapters operate from a framework of citizen documentation of the police as a practice of community-driven accountability and de-escalation.

Stacy Wood, PhD candidate in Information Studies at UCLA, discussed the ways in which personal records and citizen documentation are embedded within techno-socio-political infrastructural arrangements and how society can reframe these technologies as mechanisms and narratives of resistance.
Session 9: PDA and Memory, moderated by Wendy Hagenmaier

Interconnectedness: personal memory-making on YouTube (Leisa Gibbons, Kent State University). Gibbons spoke about the use of YouTube as a personal memory-making space, raising research questions about what conceptual, practical, and ethical roles institutions of memory have in online participatory spaces and how personal use of online technologies can be preserved as evidence.

(Sudheendra Hangal & Abhilasha Kumar, Ashoka University) This talk was about Cognitive Experiments with Life-Logs (CELL), a scalable new approach to measuring recall of personally familiar names using computerized text-based analysis of email archives. Regression analyses revealed that accuracy in recalling familiar names declined with the age of the email but increased with greater frequency of interaction with the person. Based on those findings, Hangal and Kumar believe that CELL can be applied as an ecologically valid, web-based measure to study name retrieval among large populations using existing digital life-logs.
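Hangal and Kumar did not present code, so the sketch below is only a hypothetical illustration of the kind of regression relationship they described, using simulated data, NumPy, and scikit-learn; the variable names, coefficients, and logistic model are my assumptions, not values or methods from the study:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    n = 500

    # Simulated predictors: how old each email is (days) and how often the sender was contacted.
    email_age_days = rng.uniform(0, 3650, n)
    interaction_freq = rng.poisson(20, n)

    # Simulated outcome with the reported direction of effects:
    # recall declines with email age and rises with interaction frequency.
    logit = 1.0 - 0.0008 * email_age_days + 0.05 * interaction_freq
    recalled = rng.random(n) < 1 / (1 + np.exp(-logit))

    # Fit a logistic regression of recall on the two predictors and inspect the signs.
    X = np.column_stack([email_age_days, interaction_freq])
    model = LogisticRegression().fit(X, recalled)
    print(dict(zip(["email_age_days", "interaction_freq"], model.coef_[0])))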

(Frances Corry, University of Southern California) Corry spoke about the screenshot, a feature built into most smartphones, tablets, and computers today, which enables users to “photograph” what rests on the surface of their screens. These “photographs”, or rather screenshots, were presented as a valuable tool worthy of further attention in digital archival contexts.
Session 10: Engaging Communities in PDA 1, moderated by Martin Gengenbach
Introducing a Mobile App for Uploading Family Treasures to Public Library Collections (Natalie Milbrodt, Queens Public Library) The Queens Public Library in New York City has developed a free mobile application for uploading scanned items, digital photos, oral history interviews and “wild sound” recordings of Queens neighborhoods for permanent safekeeping in the library’s archival collections. It allows families to add their personal histories to the larger historical narrative of their city and their country. The tool is part of the programmatic and technological offerings of the library’s Queens Memory program, whose mission is to capture contemporary history in Queens.

The Memory Lab (Russell Martin, District of Columbia Public Library). The Memory Lab at the District of Columbia Public Library is a do-it-yourself personal archiving space where members of the public can digitize outdated forms of media, such as VHS, VHS-C, mini DV, audio cassettes, photos, slides, negatives, and floppy disks. Martin's presentation covered how the Memory Lab was developed by a fellow from the Library of Congress' National Digital Stewardship Residency, the budget for the lab, the equipment used and how it is put together, training for staff and the public, and success stories and lessons learned.

(Wendy Hagenmaier, Georgia Tech) Hagenmaier's presentation outlined the user research process the retroTECH team used to inform the design of its carts, offered an overview of the carts’ features and use cases, and reflected on where retroTECH’s personal digital archiving services are headed. retroTECH aims to inspire a cultural mindset that emphasizes the importance of personal archives, open access to digital heritage, and long-term thinking.

The Great Migration (Jasmyn Castro, Smithsonian NMAAHC). Castro presented the Smithsonian's ongoing film preservation efforts for the African American community and described how the museum invites visitors to bring their home movies in to be inspected and digitally scanned by NMAAHC staff.
Session 11: Engaging Communities in PDA 2, moderated by Mary Kidd
Citizen archive and extended MyData principles (Mikko Lampi, Mikkeli University of Applied Sciences). Lampi spoke about how Digitalia, the Research Center on Digital Information Management, is developing a professional-quality digital archiving solution for ordinary people. The Citizen archive relies on an open-source platform that allows users to manage their personal data and ensure long-term access to it. The MyData paradigm is connected to personal archiving through the management of coherent descriptive metadata and access rights, while also ensuring privacy and usefulness.

Born Digital 2016: Collecting for the Future (Sarah Slade, State Library Victoria). Slade presented Born Digital 2016, a week-long national media and communications campaign to raise public awareness of digital archiving and preservation and why it matters to individuals, communities, and organizations. The campaign successfully engaged traditional television and print media, as well as online news outlets, to increase public understanding of what digital archiving and preservation is and why it is important.

Whose History? (Katrina Vandeven, MLIS Candidate, University of Denver) Vandeven discussed macro appraisal and the documentation of intersectionality within the Women's March on Washington Archives Project, where it went wrong, and possible solutions for documenting intersectionality in activism, and she introduced the Documenting Denver Activism Archives Project.
Bringing Personal Digital Archiving 2017 to a close was Session 12: PDA Retrospect and Prospect Panel, moderated by Cathy Marshall.

Howard Besser, Clifford Lynch and Jeff Ubois discussed how early observers and practitioners of personal digital archiving will look back on the last decade, and forward to the next, covering changing social norms about what is saved, why, who can view it, and how; legal structures, intellectual property rights, and digital executorships; institutional practices, particularly in library and academic settings, but also in the form of new services to the public; market offerings from both established and emerging companies; and technological developments that will allow (or limit) the practice of personal archiving.
- John