Thursday, April 20, 2017

2017-04-20: Trusted Timestamping of Mementos


The Memento Protocol provides a Memento-Datetime header indicating at what datetime a memento was captured by a given web archive. In most cases, this metadata sufficiently informs the user of when the given web resource existed. Even though it has been established in US courts that web archives can be used to legally establish that a given web resource existed at a given time, there is still potential to doubt this timestamp because the same web archive that provides the memento also provides its Memento-Datetime. Though not a replacement for Memento-Datetime, trusted timestamping is the process that provides certainty of timestamps for content and can be used to provide additional data to alleviate this doubt.
In this post, I examine different trusted timestamping methods. I start with some of the more traditional methods before discussing OriginStamp, a solution by Gipp, Meuschke, and Gernandt that uses the Bitcoin blockchain for timestamping.

Brief Cryptography Background


Trusted timestamping systems use some concepts from cryptography for confidentiality and integrity. I will provide a brief overview of these concepts here.

Throughout this document I will use the verb hashing to refer to the use of a one-way collision-resistant hash function. Users supply content, such as a document, as input and the hash function provides a digest as output.  Hash functions are one-way, meaning that no one can take that digest and reconstitute the document. Hash functions are collision-resistant, meaning that there is a very low probability that another input will produce the same digest, referred to as a collision. As shown in the figure below, small changes to the input of a hash function produce completely different digests. Thus, hash digests provide a way to identify content without revealing it. The output of the hash function is also referred to as a hash.

This diagram shows the digests produced with the same cryptographic hash function over several inputs. A small change in the input produces a completely different hash digest as output. Source: Wikipedia
The timestamping solutions in this post use the SHA-256 and RIPEMD-160 hash functions. SHA-256 is the member of the SHA-2 family that produces 256-bit digests. Its predecessor, SHA-1, has been under scrutiny for some time. In 2005, cryptographers showed mathematically that SHA-1 collisions could be found faster than by brute force, prompting many to start moving to SHA-2. In February of 2017, Google researchers were able to create a collision with SHA-1, showing that SHA-1 is no longer reliably trustworthy. Because collision attacks are theoretically possible, though technically infeasible, for SHA-2, SHA-3 has been developed as a future replacement. I mention this to show how the world of hash functions is dynamic, resulting in continued research of better functions. For this post, however, it is most important to just understand the purpose of hash functions.
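To make the "small change, completely different digest" property concrete, here is a minimal sketch using Python's standard hashlib module. The two input strings and the helper name sha256_hex are only illustrative choices.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two inputs that differ only by a trailing period.
original = sha256_hex("The quick brown fox jumps over the lazy dog")
altered = sha256_hex("The quick brown fox jumps over the lazy dog.")

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(original, altered))

print(original)
print(altered)
print(f"{differing} of {len(original)} hex characters differ")
```

Neither digest reveals anything about the sentence that produced it, which is why a digest can identify content without disclosing it.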

In addition to hash functions, this post discusses solutions that utilize public-key cryptography, consisting of private keys and public keys. Users typically generate a private key using random information generated by their computer, often protecting the stored key with a symmetric cipher such as 3DES. An algorithm such as RSA or ECC then produces the matching public key. Users are expected to secure their private key, but share the public key.

A diagram showing an example of encryption using public and private keys. Source: Wikipedia
Users use public keys to encrypt content and private keys to decrypt it. In the figure above, everyone has access to Alice's public key. Bob encrypts a message using Alice's public key, but only Alice can decrypt it because she is the only one with access to her private key.

This process can be used in reverse to digitally sign content. The private key can be used to encrypt content and the public key can be used to decrypt it. This digital signature allows anyone with access to the public key to verify that the content was encrypted by the owner of the private key because only the owner of the private key should have access to the private key.
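The signing idea can be sketched with textbook RSA. This is a toy under stated assumptions: the primes, the exponent, and the function names are chosen only for illustration, the modulus is far too small to be secure, and real systems add padding and rely on vetted libraries.

```python
import hashlib

# Toy key material for illustration only. Two Mersenne primes give a modulus
# of about 150 bits; real RSA keys are 2048 bits or more.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(content: bytes) -> int:
    """Hash the content and reduce it so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N

def sign(content: bytes) -> int:
    """Only the holder of the private exponent D can compute this value."""
    return pow(digest_as_int(content), D, N)

def verify(content: bytes, signature: int) -> bool:
    """Anyone who knows the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest_as_int(content)

signature = sign(b"content to be signed")
print(verify(b"content to be signed", signature))   # matching content
print(verify(b"content was altered", signature))    # altered content
```

Note that the sketch signs the hash of the content rather than the content itself, which is also how the timestamping schemes later in this post operate.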

Certificates are documents containing a public key and a digital signature. A user typically requests a certificate on behalf of themselves or a server. A trusted certificate authority verifies the user's identity and issues the certificate with a digital signature. Other users can verify the identity of the owner of the certificate by verifying the digital signature of the certificate with the certificate authority. Certificates expire after some time and must be renewed. If a user's private key is compromised, then the certificate authority can revoke the associated certificate.

A nonce is a single-use value that is added to data prior to encryption or hashing. Systems often insert it to ensure that transmitted encrypted data cannot be reused by an attacker in the future. In this article nonces are used with hashing as part of Bitcoin's proof-of-work function, to be explained later.

Finally, there is the related concept of binary-to-text encoding. Encoding allows a system to convert data to printable text. Unlike hash digests, encoded text can be converted back into its original input data. Cryptographic systems typically use encoding to create human-readable versions of public/private keys and hash digests. Base64 is a popular encoding scheme used on the Internet. Bitcoin also uses the lesser known Base58 scheme.
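The difference between encoding and hashing is easy to see in a short sketch with Python's standard base64 and hashlib modules. The sample bytes are arbitrary.

```python
import base64
import hashlib

data = b"an example of arbitrary binary-safe content"

# Encoding is reversible: decoding returns exactly the original bytes.
encoded = base64.b64encode(data).decode("ascii")
decoded = base64.b64decode(encoded)

# Hashing is one-way: the digest has a fixed length and cannot be decoded.
digest = hashlib.sha256(data).hexdigest()

print(encoded)
print(decoded == data)
print(digest)
```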

Brief Bitcoin Background


Bitcoin is a cryptocurrency. It is not issued by an institution or backed by quantities of physical objects, like gold or silver. It is software that was released with an open source license to the world by an anonymous individual using the pseudonym Satoshi Nakamoto. Using a complex peer-to-peer network protocol it ensures that funds (Bitcoins) are securely transferred from one account to another.
Bitcoin accounts are identified by addresses. Addresses are used to indicate where Bitcoins should be sent (paid). The end user’s Bitcoin software uses public and private keys to generate an address. Users often have many addresses to ensure their privacy. Users have special purpose software, called Wallets, that generates and keeps track of addresses. There is no central authority to issue addresses, meaning that addresses must be generated individually by all participants.
Wallets generate Bitcoin addresses using the following process:
  1. Generate an ECC public-private key pair
  2. Perform SHA-256 hash of public key
  3. Perform RIPEMD-160 hash of that result
  4. Add version byte (0x00) to the front of that result
  5. Perform a SHA-256 hash of that result, twice
  6. Append the first 4 bytes of that result to the value from #4
  7. Convert that result into Base58, which omits the easily confused characters 0 (zero), O (capital o), I (capital i), and l (lowercase L)
The last step uses Base58 so that users can write the address on a piece of paper or speak it aloud over the phone. The ECC algorithms are used by Bitcoin to make the public-private key pair "somewhat resistant" to quantum computers. SHA-256 is used twice in step 5 to reduce the chance of success for any as yet unknown attacks against the SHA-2 hash function. Because all Bitcoin users generate addresses themselves, without a central addressing authority, this long process exists to reduce the probability of a collision between addresses to 0.01%. Even so, for improved security, the community suggests generating new addresses for each transaction. Note that only public-private keys and hashing are involved. There are no certificates to revoke or expire.
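Steps 2 through 7 can be sketched with the standard library alone. The function names below are my own, and step 1 (generating the ECC key pair) is left out because it needs an elliptic-curve library. One caveat: hashlib only offers RIPEMD-160 when the local OpenSSL build provides it, so hash160 may raise an error on some systems and is defined here but not called.

```python
import hashlib

# The Base58 alphabet used by Bitcoin: no 0, O, I, or l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def hash160(public_key: bytes) -> bytes:
    """Steps 2-3: SHA-256, then RIPEMD-160 (depends on the OpenSSL build)."""
    sha = hashlib.sha256(public_key).digest()
    return hashlib.new("ripemd160", sha).digest()

def base58_encode(data: bytes) -> str:
    """Step 7: write the bytes as a base-58 number using the alphabet above."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the character '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded

def address_from_hash160(h160: bytes, version: bytes = b"\x00") -> str:
    payload = version + h160                                  # step 4
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]  # step 5
    return base58_encode(payload + checksum)                  # steps 6-7

# Demonstration with an all-zero 20-byte hash instead of a real public key.
print(address_from_hash160(bytes(20)))
```

Because the version byte is 0x00, every address produced this way starts with the character 1, and the four checksum bytes let software detect most transcription mistakes.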

Transactions contain the following types of entries:
  • Transaction inputs contain a list of addresses and amount of Bitcoins to transfer from those addresses. Also included is a digital signature corresponding to each address. This digital signature is used by the Bitcoin software to verify that the transaction is legitimate and thus these Bitcoins can be spent. There is also a user-generated script used to specify how to access the bitcoins, but the workings of these scripts are outside the scope of this post.
  • Transaction outputs contain a list of addresses and amount of Bitcoins to transfer to those addresses. As with transaction inputs, a user-generated script is included to specify how to spend the bitcoins, but I will not go into further detail here.
  • Another field exists to enter the amount of transaction fees paid to the miners for processing the transaction.
The Bitcoin system broadcasts new transactions to all nodes. Miners select transactions and group them into blocks. A block contains the transactions, a timestamp, a nonce, and a hash of the previous block.

Within each block, Bitcoin stores transactions in a Merkle tree, an example diagram of which is shown below. Transactions reside in the leaves of the tree. Each non-leaf node contains a hash of its children. This data structure is used to prevent corrupt or illicit transactions from being shared, and thus included in the block chain.
A diagram showing an example of a Merkle tree. Each non-leaf node contains a hash of its children. For Bitcoin, transactions reside in the leaves. Source: Wikipedia
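A Merkle root can be computed in a few lines. In this sketch the double SHA-256 hashing and the duplication of the last hash on odd-sized levels follow Bitcoin's conventions as I understand them, while byte-order details and real transaction serialization are ignored. The sample "transactions" are placeholder byte strings.

```python
import hashlib
from typing import List

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions: List[bytes]) -> bytes:
    """Hash the leaves, then hash pairs upward until one root hash remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # duplicate the last hash on odd levels
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx-a", b"tx-b", b"tx-c"])
tampered = merkle_root([b"tx-a", b"tx-b", b"tx-X"])
print(root.hex())
print(tampered.hex())
```

Changing any single leaf changes the root, so the one root hash stored in the block commits to every transaction in the tree.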
A conceptual diagram shows the Bitcoin blockchain. Each block contains: a hash of the previous block, a timestamp, a nonce, and the root of a tree of transactions. Source: Wikipedia
Miners only see Bitcoin addresses and amounts in each transaction, providing some privacy to those submitting transactions. To add a block to the blockchain, miners must solve a proof-of-work function. Once a miner has assembled a block, the miner picks a nonce, adds it to the block header, and hashes the result using SHA-256, twice. The network accepts the block only if this hash falls below a target value, and because hash output cannot be predicted, the miner has no better strategy than to guess nonce after nonce until one produces a qualifying hash. This proof-of-work function is designed to be fast for the rest of the network to verify, but time-consuming for the miners to execute. The target is adjusted roughly every 14 days to maintain the level of difficulty in solving the proof-of-work function, so that miners continue to take about 10 minutes to process each block. For each block completed, miners are rewarded any user-provided transaction fees included in the transactions as well as newly minted Bitcoins -- a block reward. The block reward is currently set at 12.5 bitcoins, worth $15,939 as of March 2, 2017. Miners run software and dedicated hardware around the globe to solve the proof-of-work function. Currently the local cost of electricity is the limiting factor in profiting from mining bitcoins.
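A toy version of the nonce search shows why the work is slow to perform but fast to check. The sketch accepts a nonce when the double SHA-256 hash, read as an integer, falls below a target, which is a simplification of Bitcoin's rule. The 8-byte nonce, the difficulty expressed in leading zero bits, and the function names are assumptions made for the example.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def meets_target(block_content: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification: one double hash and one comparison."""
    target = 1 << (256 - difficulty_bits)
    candidate = block_content + nonce.to_bytes(8, "big")
    return int.from_bytes(double_sha256(candidate), "big") < target

def mine(block_content: bytes, difficulty_bits: int) -> int:
    """Mining: try nonces in order until one meets the target."""
    nonce = 0
    while not meets_target(block_content, nonce, difficulty_bits):
        nonce += 1
    return nonce

# With 12 difficulty bits, about 4,096 guesses are needed on average.
nonce = mine(b"example block content", 12)
print(nonce, meets_target(b"example block content", nonce, 12))
```

Each additional difficulty bit doubles the expected number of guesses, while verification still costs a single double hash.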
To alter a previous transaction, an attacker would need to select the block containing it, insert their illicit transaction, and create a new block. They would then need to solve that block and all subsequent blocks faster than the rest of the network, which requires controlling more than 50% of the mining power. Altering the blockchain is thus considered extremely hard.
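The reason every later block must be redone can be seen in a miniature hash chain. This sketch leaves out proof of work entirely and uses made-up transaction strings and a simple dictionary per block; it only shows how the stored links break when history is edited.

```python
import hashlib

GENESIS = "0" * 64   # placeholder "previous hash" for the first block

def block_hash(previous_hash: str, transactions: str) -> str:
    return hashlib.sha256((previous_hash + transactions).encode()).hexdigest()

def build_chain(batches):
    chain, previous = [], GENESIS
    for batch in batches:
        current = block_hash(previous, batch)
        chain.append({"previous": previous, "transactions": batch, "hash": current})
        previous = current
    return chain

def first_invalid_block(chain):
    """Return the index of the first block whose hashes do not line up, else None."""
    previous = GENESIS
    for index, block in enumerate(chain):
        recomputed = block_hash(block["previous"], block["transactions"])
        if block["previous"] != previous or recomputed != block["hash"]:
            return index
        previous = block["hash"]
    return None

chain = build_chain(["alice->bob 5", "bob->carol 2", "carol->dave 1"])

# The attacker rewrites block 0 and even recomputes its hash...
chain[0]["transactions"] = "alice->mallory 5"
chain[0]["hash"] = block_hash(chain[0]["previous"], chain[0]["transactions"])

# ...but block 1 still points at the old hash, so the break just moves forward.
print(first_invalid_block(chain))
```

In Bitcoin, repairing each of those later links also means redoing that block's proof of work, which is where the cost of an attack comes from.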

Bitcoins do not really exist, even on an individual's hard drive. The blockchain contains the record of every bitcoin spent and indicates the current balance at each Bitcoin address. Full Bitcoin nodes have a copy of the blockchain, currently at 105GB, which can create problems for users running full nodes. Satoshi Nakamoto recommended periodically pruning the blockchain of old transactions, but so far this has not been done.

Technology exists to create blockchains outside of Bitcoin, but Bitcoin provides incentives for participation, in terms of monetary rewards. Any system attempting to use a blockchain outside of Bitcoin would need to produce similar incentives for participants. The participation by the miners also secures the blockchain by preventing malicious users from spamming it with transactions.

How accurate are the timestamps in the blockchain? According to the Bitcoin wiki:
A timestamp is accepted as valid if it is greater than the median timestamp of previous 11 blocks, and less than the network-adjusted time + 2 hours. "Network-adjusted time" is the median of the timestamps returned by all nodes connected to you. As a result, block timestamps are not exactly accurate, and they do not even need to be in order. Block times are accurate only to within an hour or two.
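The rule quoted above translates into a short check. The function and parameter names are my own, all times are Unix timestamps in seconds, and the network-adjusted time is simply passed in rather than gathered from peers.

```python
from statistics import median

TWO_HOURS = 2 * 60 * 60

def timestamp_is_valid(candidate, previous_block_times, network_adjusted_time):
    """Accept a block timestamp that is greater than the median of the
    previous 11 block timestamps and less than network time + 2 hours."""
    median_past = median(previous_block_times[-11:])
    return median_past < candidate < network_adjusted_time + TWO_HOURS

# Eleven earlier blocks spaced ten minutes apart; their median is 4000.
previous = list(range(1000, 1000 + 11 * 600, 600))
now = 8000

print(timestamp_is_valid(5000, previous, now))             # earlier than the last block, yet valid
print(timestamp_is_valid(4000, previous, now))             # not greater than the median
print(timestamp_is_valid(now + TWO_HOURS, previous, now))  # too far in the future
```

The first case shows why block timestamps need not be in order: 5000 is earlier than the most recent block (7000) but still passes the check.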
Bitcoins come in several denominations. The largest is the Bitcoin. The smallest is the satoshi. One satoshi equals 0.00000001 (1 × 10^-8) Bitcoins.

Trusted Timestamping

Trusted Timestamping allows a verifier to determine that the given content existed during the time of the timestamp. It does not indicate time of creation. In many ways, it is like Memento-Datetime because it is still an observation of the document at a given point in time.
Timestamping can be performed by anyone with access to a document. For a timestamp to be defensible, however, it must be produced by a reliable and trusted source. For example, timestamps can be generated for a document by a user's personal computer and then signed with digital signatures. At some point in the future, a verifier can check that the digital signature is correct and verify the timestamp. This timestamp is not trustworthy because the clock on the personal computer may be altered or set incorrectly, thus providing doubt in the accuracy of the timestamp. Some trustworthy party must exist that not only sets its time correctly, but also ensures that timestamps are verifiable in the future.
Trusted Timestamping relies upon a trustworthy authority to accept data, typically a document, from a requestor and issue timestamps for future verification. The process then allows a verifier that has access to the timestamp and the original data to verify that the data existed at that point in time. Thus, two basic overall processes exist: (1) timestamp issue, and (2) timestamp verification.
In addition, privacy is a concern for documents. A document being transmitted can be intercepted, and if the document is held by some third party for the purposes of verifying a timestamp in the future, then it can be stolen from that third party. It is also possible for such a document to become corrupted. To address privacy concerns, trusted timestamping focuses on providing a timestamp for the hash of the content instead. Because such hashes cannot be reversed, the document cannot be reconstructed. Owners of the document, however, can generate the hashes from the document to verify it with the timestamping system.
Finally, verifying the timestamps should not depend on some ephemeral service. If such a service is nonexistent in the future, then the timestamps cannot be verified. Any timestamping solution will need to ensure that verification can be done for the foreseeable future.

Trusted Timestamping with RFC 3161 and ANSI X9.95


ANSI X9.95 extends RFC 3161 to provide standards for trusted timestamping in the form of a third party service called a Time Stamping Authority (TSA). Both standards discuss the formatting of request and response messages used to communicate with a TSA as well as indicating what requirements a TSA should meet.
The TSA issues time-stamp tokens (TST) as supporting evidence that the given content existed prior to a specific datetime. The following process allows the requestor to acquire a given timestamp:
  1. The requestor creates a hash of the content.
  2. The requestor submits this hash to the TSA.
  3. The TSA ensures that its clock is synchronized with an authoritative time source.
  4. The TSA ensures that the hash is the correct length, but, to ensure privacy, does not examine the hash in any other way.
  5. The TSA generates a TST containing the hash of the document, the timestamp, and a digital signature of these two pieces of data. The digital signature is signed with a private key whose sole purpose is timestamping. RFC 3161 requires that the requestor not be identified in the TST. The TST may also include additional metadata, such as the security policy used.
  6. The TST is sent back to the requestor, who should then store it along with the original document for future verification.

Simplified diagram showing the process of using a Time Stamp Authority (TSA) to issue and verify timestamps. Source: Wikipedia
To verify a timestamp, a verifier does not need the TSA. The verifier only needs:
  • the hash of the original document
  • the TST
  • the TSA's certificate
They use the original data and the TST in the following process:
  1. The verifier verifies the digital signature of the TST against the TSA’s certificate. If this is correct, then they know that the TST was issued by the TSA.
  2. The verifier then checks that the hash in the TST matches the hash of the document. If they match, then the verifier knows that the document hash was used to generate that TST.
  3. The timestamp contained in the TST and the hash were used in the generation of the digital signature, hence the TSA observed the document at the given time.
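The issue and verification steps can be sketched end to end with a toy TSA. Everything here is a simplification under stated assumptions: a real TST is a signed binary structure defined by RFC 3161 rather than a Python dictionary, the textbook RSA key is far too small to be secure, and the function names are hypothetical.

```python
import hashlib
import time

# Toy TSA key pair (illustration only, not secure).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                  # the public part, as found in a certificate
D = pow(E, -1, (P - 1) * (Q - 1))    # the TSA's private exponent (Python 3.8+)

def _to_sign(document_hash: str, timestamp: int) -> int:
    """Bind the hash and the timestamp together before signing."""
    data = f"{document_hash}|{timestamp}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def issue_token(document_hash: str, now=None) -> dict:
    """TSA side: check only the hash length, then sign hash + timestamp."""
    if len(document_hash) != 64:
        raise ValueError("expected a 64-character SHA-256 hex digest")
    timestamp = int(time.time()) if now is None else now
    signature = pow(_to_sign(document_hash, timestamp), D, N)
    return {"hash": document_hash, "timestamp": timestamp, "signature": signature}

def verify_token(document: bytes, token: dict) -> bool:
    """Verifier side: needs the document, the token, and the public key only."""
    signature_ok = pow(token["signature"], E, N) == _to_sign(token["hash"], token["timestamp"])
    hash_ok = hashlib.sha256(document).hexdigest() == token["hash"]
    return signature_ok and hash_ok

document = b"<html>archived page content</html>"
token = issue_token(hashlib.sha256(document).hexdigest())
print(verify_token(document, token))

# Backdating the token by a day breaks the signature.
backdated = dict(token, timestamp=token["timestamp"] - 86400)
print(verify_token(document, backdated))
```

The TSA never sees the document, only its hash, and the verifier never contacts the TSA, which mirrors the two lists above.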
Haber and Stornetta mentioned that the TSA can be compromised in their 1991 paper "How to Time-Stamp a Digital Document" and prescribed a few solutions, such as linked timestamping, which is implemented by ANSI X9.95. With linked timestamping, each TST includes a hash digest of the previous TST. Users can then additionally verify that a timestamp is legitimate by comparing this hash digest with the previously granted TST.

ANSI X9.95 also supports the use of transient-key cryptography. In this case, the system generates a distinct public-private key pair for each timestamp issued. Once a timestamp is issued and digitally signed, the system deletes the private key so that it cannot be compromised. The verifier uses the public key to verify the digital signature.

Services using these standards exist with companies like DigiStamp, eMudhra, Tecxoft, and Safe Stamper TSA. Up to 5 free timestamps can be generated per day per IP at Safe Creative's TSA.

The solutions above have different issues.

ANSI X9.95 and RFC 3161 provide additional guidance on certificate management and security to ensure that the TSA is not easily compromised, but the TSA is still the single point of failure in this scheme. If the TSA relies on an incorrect time source or is otherwise compromised, then all timestamps generated are invalid. If the TSA’s certificate expires or is revoked, then verifying past timestamps becomes difficult if not impossible, depending on the availability of the now invalid certificate. If the revoked certificate is still available, the datetime of revocation can be used as an upper bound for the validity of any timestamps. Unfortunately, a certificate is usually revoked because its private key was compromised. A compromised key creates doubt in any timestamps issued using it. If transient-key cryptography is used, doubt exists with any generated public-private keys as well as their associated timestamps.

Linked timestamping helps ensure that the TSA's tokens are not easily faked, but requires that the verifier meet with other verifiers to review the tokens. This requirement violates the need for privacy.

Haber and Stornetta developed the idea of distributed trust for providing timestamps. The system relies on many clients being ready, available, and synchronized to a time source. Requestors would submit a document hash digest to a random set of k timestamping clients. These clients would in turn each digitally sign their timestamp response. Because the choice in clients is random, there is a low probability of malicious clients issuing bad timestamps. The requestor would then store all timestamps from the k clients who responded. Unfortunately, this system requires participation without direct incentives.
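The "low probability" claim can be put into numbers. In the sketch below, seeding the selection with the document hash is an assumption about how the random choice could be made reproducible, Python's random module stands in for a cryptographic generator, and the client names and counts are invented for the example.

```python
import math
import random

def choose_clients(document_hash: str, client_ids, k: int):
    """Pick k clients pseudo-randomly, seeded by the document hash so that
    the requestor cannot hand-pick cooperative clients (assumed design)."""
    rng = random.Random(document_hash)
    return rng.sample(sorted(client_ids), k)

def probability_all_malicious(total_clients: int, malicious_clients: int, k: int) -> float:
    """Chance that every one of the k chosen clients is malicious."""
    return math.comb(malicious_clients, k) / math.comb(total_clients, k)

clients = [f"client-{i}" for i in range(100)]
print(choose_clients("ab" * 32, clients, 5))

# With 10 malicious clients out of 100, picking 3 at random:
print(probability_all_malicious(100, 10, 3))
```

Raising k shrinks this probability quickly, at the price of needing more clients to be online and responsive for every request.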

Trusted Timestamping with OriginStamp


Gipp, Meuschke, and Gernandt recognized that the cryptocurrency Bitcoin provides timestamping as part of maintaining the blockchain. Each block contains a hash of the previous block, implementing something similar to the linking concept developed by Haber and Stornetta and used in ANSI X9.95. The blockchain is distributed among all full Bitcoin clients and updated by miners, who only see transactions and cannot modify them. In some ways, the distributed nature of Bitcoin resembles parts of Haber and Stornetta's distributed trust. Finally, the blockchain, because it is distributed to all clients, is an independent authority able to verify timestamps of transactions, much like a TSA, but without the certificate and compromise issues.
They created the OriginStamp system for timestamping user-submitted documents with the Bitcoin blockchain. They chose Bitcoin because it is the most widely used cryptocurrency and thus is perceived to last for a long time. This longevity is a requirement for verification of timestamps in the future.

OriginStamp Process to convert a document content into a Bitcoin address for use in a Bitcoin transaction that can be later verified against the blockchain.
The figure above displays the OriginStamp process for creating a Bitcoin address from a document:
  1. A user submits a document to the system which is then hashed, or just submits the hash of a document.
  2. The submitted document hash is placed into a list of hashes -- seed text -- from other submissions during that day.
  3. Once per day, this seed text is itself hashed using SHA-256 to produce an aggregate hash.
  4. This aggregate hash is used as the Bitcoin private key which is used to generate a public key.
  5. That public key is used to generate a Bitcoin address which can be used in a timestamped Bitcoin transaction of 1 satoshi.
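Steps 2 through 4 amount to very little code. In this sketch the seed text format (hashes joined by single spaces) is an assumption, since the exact layout OriginStamp uses is not described here, and the function name is hypothetical. Turning the private key into a public key and an address (step 5) follows the wallet process described earlier and is not repeated.

```python
import hashlib

def aggregate_private_key(document_hashes):
    """Join the day's document hashes into a seed text and hash it once.
    The resulting 256-bit digest is used as a Bitcoin private key."""
    seed_text = " ".join(document_hashes)   # assumed separator
    private_key_hex = hashlib.sha256(seed_text.encode("ascii")).hexdigest()
    return seed_text, private_key_hex

# Hashes of three placeholder documents submitted on the same day.
submitted = [hashlib.sha256(doc).hexdigest()
             for doc in (b"document one", b"document two", b"document three")]

seed_text, private_key_hex = aggregate_private_key(submitted)
print(private_key_hex)
```

Every submitted hash influences the private key, so a single transaction to the derived address commits to all documents in the batch at once.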
OriginStamp could submit each document hash to the blockchain as an individual transaction, but the hashes are aggregated together to keep operating costs low. Because fees are taken out of every Bitcoin transaction, each transaction costs $0.03, allowing Gipp and his team to offer this low-cost service for free. They estimate that the system costs $10/year to operate.

Their paper was published in March of 2015. According to coindesk.com, for the March 2015 time period 1 Bitcoin was equal to $268.32. As of March of 2017, 1 Bitcoin is now equal to $960.36. The average transaction fee now sits at approximately 45,200 satoshis, resulting in a transaction fee of $0.43, as of March 26, 2017.

A screenshot of the memento that I timestamp throughout this section.

OriginStamp allows one to submit documents for timestamping using the Bitcoin blockchain. In this case, I submitted the HTML content of the memento shown in the figure above.

OriginStamp responds to the submission by indicating that it will submit a group of hashes to the Bitcoin blockchain in a few hours.

With the OriginStamp service, the requestor acquires a timestamp using the following process:
  1. Submit the document -- or just its hash -- to the OriginStamp website as seen in the screenshots above. If a document is submitted, its hash is calculated and the document is not retained.
  2. OriginStamp sends an email once the system has submitted the concatenated hashes to the Bitcoin blockchain. This email will contain information about the seed text used, and this seed text must be retained for verification.
  3. In addition, the @originstamp Twitter account will tweet that the given hash was submitted to the blockchain.
A screenshot showing how OriginStamp displays verification information for a given hash. In this case, the document hash is da5328049647343c31e0e62d3886d6a21edb28406ede08a845adeb96d5e8bf50 and it was submitted to the blockchain on 4/10/2017 10:18:29 AM GMT-0600 (MDT) as part of seed text whose hash, and hence private key is c634bcafba86df8313332abc0ae854eea9083b279cdd4d9cde1d516ee6fb70d9.
Because the blockchain is expected to last for the foreseeable future and is tamper-proof, it can be used at any time to verify the timestamp. There are two methods available: with the OriginStamp service, or directly against the Bitcoin blockchain using the seed text.

To do so with the OriginStamp service, the verifier can follow this process:

  1. Using the OriginStamp web site, the verifier can submit the hash of the original document and will receive a response as shown in the screenshot above. The response contains the timestamp under the heading "Submitted to Bitcoin network".
  2. If the verifier wishes to find the timestamp in the blockchain, they can expand the "Show transaction details" section of this page, shown below. This section reveals a button allowing one to download the list of hashes (seed text) used in the transaction, the private and public keys used in the transaction, the recipient Bitcoin address, and a link to blockchain.info also allowing verification of the transaction at a specific time.
  3. Using the link "Verify the generated secret on http://blockchain.info", they can see the details of the transaction and verify the timestamp, shown in the figure below.
A screenshot showing that more information is available once the user clicks "show transaction details". The recipient Bitcoin address is outlined in red. From this screen, the user can download the seed text containing the list of hashes submitted to the Bitcoin blockchain. A SHA-256 hash of this seed text is the Bitcoin private key. From this private key, a user can generate the public key and eventually the Bitcoin address for verification. In this case the generated Bitcoin address is 1EcnftDQwawHQWhE67zxEHSLUEoXKZbasy.

A screenshot of the blockchain.info web site showing the details of a Bitcoin transaction, complete with its timestamp. The Bitcoin address and timestamp have been enlarged and outlined in red. For Bitcoin address 1EcnftDQwawHQWhE67zxEHSLUEoXKZbasy, 1 Satoshi was transferred on 2017-02-28 00:01:15, thus that transaction date is the timestamp for the corresponding document.
To verify that a document was timestamped by directly checking the Bitcoin blockchain, one only needs:
  • The hash of the original document.
  • The seed text containing the list of hashes submitted to the bitcoin network.
  • Tools necessary to generate a Bitcoin address from a private key and also search the contents of the blockchain.
If OriginStamp is not available for verification, the verifier would then follow this process:
  1. Generate the hash of the document.
  2. Verify that this hash is in the seed text. This seed text should have been saved as a result of the email or tweet from OriginStamp.
  3. Hash the seed text with SHA256 to produce the Bitcoin private key.
  4. Generate the Bitcoin address using a tool such as bitaddress.org. The figure below shows the use of bitaddress.org to generate a Bitcoin address using a private key.
  5. Search the Bitcoin blockchain for this address, using a service such as blockchain.info. The block timestamp is the timestamp of the document.
A screenshot of bitaddress.org's "Wallet Details" tab with the Bitcoin address enlarged and outlined in red. One can insert a Bitcoin private key and it will generate the public key and associated Bitcoin address. This example uses the private key of c634bcafba86df8313332abc0ae854eea9083b279cdd4d9cde1d516ee6fb70d9 shown in previous screenshots which corresponds to a Bitcoin address of 1EcnftDQwawHQWhE67zxEHSLUEoXKZbasy, also shown in previous figures.
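The first three steps of the direct verification process need nothing more than a hash function. The sketch below assumes the seed text is a whitespace-separated list of hex digests, which may not match OriginStamp's actual file layout, and the function name is hypothetical. Steps 4 and 5 (deriving the address and searching the blockchain) still require the external tools mentioned above.

```python
import hashlib
from typing import Optional

def private_key_if_listed(document: bytes, seed_text: str) -> Optional[str]:
    """Return the Bitcoin private key derived from the seed text, but only
    if the document's SHA-256 hash appears in that seed text."""
    document_hash = hashlib.sha256(document).hexdigest()       # step 1
    if document_hash not in seed_text.split():                  # step 2
        return None
    return hashlib.sha256(seed_text.encode("ascii")).hexdigest()  # step 3

# Placeholder seed text built from two sample documents.
mine = b"the memento I saved"
seed_text = " ".join(hashlib.sha256(d).hexdigest() for d in (b"someone else's file", mine))

print(private_key_if_listed(mine, seed_text))
print(private_key_if_listed(b"a document that was never submitted", seed_text))
```

The seed text must be hashed byte for byte as it was originally submitted, so it should be stored unmodified alongside the document.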
OriginStamp also supplies an API that developers can use to submit documents for timestamping as well as verify timestamps and download the seed text for a given document hash.

Comparison of Timestamping Solutions


In the table below I provide a summary comparison between the TSA, OriginStamp, and submitting directly to the blockchain without OriginStamp.

| | TSA | OriginStamp | Directly To Blockchain |
|---|---|---|---|
| Financial Cost per Timestamp | Dependent on service and subscription, ranges from $3 to $0.024 | Dependent on size of seed text, but less than Bitcoin transaction fee | Bitcoin transaction fee, optimally $0.56 |
| Accuracy of Timestamp | Within seconds of time of request, but dependent on number of requests in queue if linked timestamping used | Within 1 day + 2 hours | Within 2 hours |
| Items Needed For Verification | Original document; TST; certificate of server to verify signature | Original document; the seed text of hashes submitted at the same time | Original document |
| Tools Needed For Verification | Certificate verification tools | Software to generate Bitcoin address; software to search blockchain | Software to generate Bitcoin address; software to search blockchain |
| Timestamp Storage | In TST saved by requestor | Blockchain | Blockchain |
| Privacy | Only hash of document is submitted, but TSA knows of the requestor's IP address | Miners only see the Bitcoin address, not who submitted the document or even its hash | Miners only see the Bitcoin address, not who submitted the document or even its hash |
| Targets of Compromise | TSA time server; TSA certificate private key | Blockchain | Blockchain |
| Requirement for Compromise | Server is configured insecurely; server has open software vulnerabilities | >50% of Bitcoin miners colluding | >50% of Bitcoin miners colluding |
| Dependency of Longevity | Life of organization offering timestamping service | Continued interest in preserving the blockchain | Continued interest in preserving the blockchain |

In the first row, we compare the cost of timestamps from each service. At the TSA service run by Digistamp, an archivist can obtain a cost of $0.024 per timestamp for 600,000 timestamps, but would need to commit to a 1-year fee of $14,400. They would also need to acquire all 600,000 timestamps within a year or lose them. If they pay $10, they are only allocated 30 timestamps and need to use them within 3 months, resulting in a cost of $3 per timestamp. Tecxoft's pricing is similar. OriginStamp attempts to keep costs down by bundling many hashes into a seed text file, but is still at the mercy of Bitcoin's transaction fees. The price of Bitcoin is currently very volatile. The transaction fee mentioned in Gipp's work from 2015 was $0.03. Miners tend to process a block faster if it has a higher transaction fee. The optimal fee has gone up due to Bitcoin's rise in value and an increase in the number of transactions waiting to be processed. The lowest price for the least delay is currently 200 satoshis per byte and the median transaction size is 226 bytes for a total cost of 45,200 satoshis. This was equivalent to $0.43 when I started writing this blog post and is now $0.56.
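The fee arithmetic is worth writing out. The sketch below uses only figures quoted in this post: 200 satoshis per byte, a 226-byte median transaction, the March 2017 exchange rate of $960.36 per Bitcoin, and the Digistamp bulk plan. The variable names are my own.

```python
SATOSHIS_PER_BITCOIN = 100_000_000   # 1 satoshi = 1 x 10^-8 Bitcoin

# Bitcoin transaction fee at the quoted rate and median transaction size.
fee_satoshis = 200 * 226
fee_bitcoin = fee_satoshis / SATOSHIS_PER_BITCOIN
fee_usd = fee_bitcoin * 960.36       # March 2017 exchange rate from this post

# TSA bulk plan: $14,400 per year for 600,000 timestamps.
tsa_bulk_cost_per_timestamp = 14_400 / 600_000

print(fee_satoshis)
print(round(fee_usd, 2))
print(tsa_bulk_cost_per_timestamp)
```

Because the fee is fixed in satoshis per byte, its dollar value moves with the exchange rate, which is why the same 45,200 satoshis cost more by the time this post was finished.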

In the second row, we compare timestamp accuracy. The TSA should be able to issue a timestamp to the requestor within seconds of the request. This can be delayed by a long queue if the TSA uses linked timestamping because every request must be satisfied in order. OriginStamp, however, tries to keep costs down by submitting its seed list to the blockchain at once-a-day intervals, according to the paper. On top of this, the timestamp in the blockchain is accurate to within two hours of submission of the Bitcoin transaction. This means that an OriginStamp timestamp may be as much as 24 hours + 2 hours = 26 hours off from the time of submission of the document hash. In practice, I do not know the schedule used by OriginStamp, as I submitted a document on February 28, 2017 and it was not submitted to the Bitcoin network until March 4, 2017. Then again, a document submitted on March 19, 2017 was submitted to the Bitcoin network by OriginStamp almost 18 hours later.
If the cost is deemed necessary, this lack of precision can be alleviated by not using OriginStamp but submitting to the blockchain directly. One could generate a Bitcoin address from a single document hash and then submit it to the blockchain immediately. The timestamp precision would still be within 2 hours of transaction submission.
For the TSA, timestamps are stored in the TST, which must be saved by the requestor for future verification. In contrast, OriginStamp saves timestamps in the Blockchain. OriginStamp users still need to save the seed list, so both solutions require the requestor to retain something along with the original document for future verification.

All solutions offer privacy through the use of document hashes. The Bitcoin miners receiving OriginStamp transactions only see the Bitcoin address generated from the hash of the seed list and do not even know it came from OriginStamp, hiding the original document submission in additional layers. The TSA, on the other hand, is aware of the requestor's IP address and potentially other identifying information.
To verify the timestamp, TSA users must have access to the original document, the TST, and the certificate of the TSA to verify the digital signature. OriginStamp only requires the original document and the seed list of hashes submitted to the blockchain. This means that OriginStamp requires slightly fewer items to be retained.
If using the blockchain directly, without OriginStamp, a single document hash could be used as the private key. There would be no seed list in this case. For verification, one would merely need the original document, which would be retained anyway.
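To make the idea of a document hash serving as a private key more concrete, here is a minimal sketch that hashes a document with SHA-256, checks that the digest is a valid secp256k1 private key, and encodes it in Wallet Import Format (WIF) so that a wallet could import it. The function names are my own. Deriving the corresponding Bitcoin address requires elliptic-curve multiplication and RIPEMD-160, which I leave to a Bitcoin library, so this covers only the first step.

```python
import hashlib

# Order of the secp256k1 curve; a valid private key lies in [1, N - 1].
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(payload: bytes) -> str:
    """Append a 4-byte double-SHA-256 checksum and encode in Base58."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the character '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def document_to_private_key(document: bytes) -> bytes:
    """Use the SHA-256 digest of the document as a 32-byte private key."""
    digest = hashlib.sha256(document).digest()
    if not 1 <= int.from_bytes(digest, "big") < SECP256K1_N:
        raise ValueError("digest is not a valid secp256k1 private key")
    return digest


def private_key_to_wif(private_key: bytes) -> str:
    """Encode a mainnet private key (uncompressed form) as WIF."""
    return base58check(b"\x80" + private_key)


key = document_to_private_key(b"raw memento content goes here")
print(private_key_to_wif(key))
```

Anyone holding the same document can repeat these steps and arrive at the same key, which is why no seed list needs to be retained in this case.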
To compromise the timestamps, the TSA time server must be attacked. This can be done by taking advantage of software vulnerabilities or insecure configurations. TSAs are usually audited to prevent insecure configurations, but vulnerabilities are frequently discovered in software. OriginStamp, on the other hand, requires that the blockchain be attacked directly, which is only possible if more than 50% of Bitcoin miners collude to manipulate blocks.
Finally, each service has different vulnerabilities when it comes to longevity. Mementos belong to web archives and, as such, are intended to exist far longer than 20 years. This makes longevity a key concern when choosing a timestamping solution. The average lifespan of a company is now less than 20 years and is expected to decrease. The certificates used to verify timestamps issued by a TSA may last for little more than 3 years. This means that someone must have saved the TSA's certificate before the timestamp can be verified. If the organization holding the document and the TST is also the organization providing the TSA's certificate, then there is cause to doubt its validity, because that organization could forge any or all of these verification components.
The Bitcoin blockchain, on the other hand, is not tied to any single organization and is expected to last as long as there is interest in investing in the cryptocurrency. In addition, there are many copies of the blockchain available in the world. If Bitcoin goes away, there is still an interest in maintaining the blockchain for verification of transactions, and thus retaining copies of the blockchain by many parties. If someone wishes to forge a timestamp, they would need to construct their own illicit blockchain. Even if they went that far, a copy of their blockchain can be compared to other existing copies to evaluate its validity. Thus, even if the blockchain is no longer updated, it is still an independent source of information that can be used for future verification. If the blockchain is ever pruned, then prior copies will still need to be archived somewhere for verification of prior transactions. The combined interests of all of these parties support the concept of the Bitcoin blockchain lasting longer than a single server certificate or company.

So, with trusted timestamping available, what options do we have to make it easy to verify memento timestamps?

Trusted Timestamping of Mementos


It is worth noting that, due to delays between timestamp request and response in each of these solutions, trusted timestamping is not a replacement for Memento-Datetime. The Memento-Datetime header provides a timestamp of when the web archive captured a given resource. Trusted timestamping, on the other hand, can provide an additional dimension of certainty that a resource existed at a given datetime. Just as Memento-Datetime applies to a resource at a specific URI-M, so would a trusted timestamp.

A crawler can capture a memento as part of its normal operations, compute the hash of its content, and then submit this hash for timestamping to one of these services. The original memento content encountered during the crawl, the raw memento, must be preserved by the archive indefinitely. The memento can include a link relation in the response headers, such as the unregistered trusted-timestamp-info relation shown in the example headers below, indicating where one can find additional information to verify the timestamp.

HTTP/1.1 200 OK
Server: Tengine/2.1.0
Date: Thu, 21 Jul 2016 17:34:15 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 109672
Connection: keep-alive
Memento-Datetime: Thu, 21 Jul 2016 15:25:44 GMT
Link: <http://www.cnn.com/>; rel="original",
  <http://a.example.org/web/timemap/link/http://www.cnn.com/>; rel="timemap"; type="application/link-format",
  <http://a.example.org/web/http://www.cnn.com/>; rel="timegate",
  <http://a.example.org/web/20160721152544/http://www.cnn.com/>; rel="last memento"; datetime="Thu, 21 Jul 2016 15:25:44 GMT",
  <http://a.example.org/web/20160120080735/http://www.cnn.com/>; rel="first memento"; datetime="Wed, 20 Jan 2016 08:07:35 GMT",
  <http://a.example.org/web/20160721143259/http://www.cnn.com/>; rel="prev memento"; datetime="Thu, 21 Jul 2016 14:32:59 GMT",
  <http://a.example.org/timestamping/20160722191106/http://www.cnn.com/>; rel="trusted-timestamp-info"
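A client could pick the proposed relation out of the Link header with a few lines of Python. This is only a sketch: the helper name find_link is my own, the header value is abbreviated from the example above, and the pattern assumes every parameter value is quoted, as it is in these headers.

```python
import re

# One link-value: <URI> followed by zero or more ;name="value" parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')


def find_link(link_header: str, relation: str):
    """Return the first URI whose rel parameter lists the given relation."""
    for uri, params in LINK_VALUE.findall(link_header):
        rel = re.search(r'\brel="([^"]*)"', params)
        # A rel value may hold several space-separated relation types.
        if rel and relation in rel.group(1).split():
            return uri
    return None


link_header = (
    '<http://www.cnn.com/>; rel="original", '
    '<http://a.example.org/web/20160721152544/http://www.cnn.com/>; '
    'rel="last memento"; datetime="Thu, 21 Jul 2016 15:25:44 GMT", '
    '<http://a.example.org/timestamping/20160722191106/http://www.cnn.com/>; '
    'rel="trusted-timestamp-info"'
)

print(find_link(link_header, "trusted-timestamp-info"))
```

Matching quoted values explicitly matters here because the datetime parameter contains a comma, so splitting the header on commas would break the links apart incorrectly.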

The URI linked by the trusted-timestamp-info relation could be a JSON-formatted resource providing information for verification. For example, if OriginStamp is used for timestamping, then the resource might look like this:

{
  "timestamping-method": "OriginStamp",
  "seed-text": "http://a.example.org/timestamping/20160722191106/seedtext.txt",
  "hash-algorithm": "SHA-256"
}

In this case, a verifier already knows the URI-M of the memento. They only need to locate the raw memento, calculate its hash, and use the seed text as described above to generate the Bitcoin address and find the timestamp in the Bitcoin blockchain.
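The first part of that check might look like the sketch below. It assumes the seed text is a whitespace-separated list of lowercase hexadecimal SHA-256 digests and that the hash of the whole seed text is the value from which the Bitcoin address is generated. Both are my reading of the process, and the function names are hypothetical. Deriving the address and looking it up in the blockchain are left to a Bitcoin library or block explorer.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def seed_key_for_memento(raw_memento: bytes, seed_text: str) -> str:
    """Confirm the memento's hash is in the seed text, then hash the seed text.

    Returns the hex digest of the seed text, which (by assumption) is the
    private key behind the timestamped Bitcoin address.
    """
    memento_hash = sha256_hex(raw_memento)
    if memento_hash not in seed_text.split():
        raise ValueError("memento hash is not listed in the seed text")
    return sha256_hex(seed_text.encode("utf-8"))


# Build a toy seed text from two documents, then verify one of them.
documents = [b"<html>first capture</html>", b"<html>second capture</html>"]
seed_text = "\n".join(sha256_hex(doc) for doc in documents)
print(seed_key_for_memento(documents[0], seed_text))
```

If the raw memento has been altered in any way, its hash will no longer appear in the seed text and the check fails before the blockchain is ever consulted.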

Or, if an RFC 3161 solution is used, the trusted-timestamp-info resource could look like this:

{
  "timestamping-method": "RFC 3161",
  "timestamp-token": "http://a.example.org/timestamping/tst/20160721174431/http://www.cnn.com",
  "tsa-certificate": "http://a.example.org/timestamping/tsacert/cert.pem",
  "hash-algorithm": "SHA-256"
}

In this case, a verifier can locate the raw memento, calculate its hash, and verify it using the timestamp token (TST) and the TSA certificate as described above for RFC 3161.
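One way to carry out that check is with the openssl ts command. The sketch below builds the command line and, optionally, runs it. It assumes the raw memento, the token, and the certificates have already been downloaded to local files. The file names are hypothetical, and the need for a separate CA bundle to anchor the TSA certificate is my assumption, since the resource above lists only the TSA certificate itself.

```python
import subprocess


def openssl_verify_command(memento_path, token_path, tsa_cert_path, ca_bundle_path):
    """Build an 'openssl ts -verify' command for a stored timestamp token."""
    return [
        "openssl", "ts", "-verify",
        "-data", memento_path,        # file whose hash was timestamped
        "-in", token_path,            # the saved timestamp token
        "-token_in",                  # input is a bare token, not a full response
        "-CAfile", ca_bundle_path,    # trusted root certificate(s)
        "-untrusted", tsa_cert_path,  # the TSA's own certificate
    ]


def verify_timestamp(memento_path, token_path, tsa_cert_path, ca_bundle_path):
    """Run the command; True means openssl accepted the token."""
    command = openssl_verify_command(
        memento_path, token_path, tsa_cert_path, ca_bundle_path
    )
    return subprocess.run(command, capture_output=True).returncode == 0


print(" ".join(openssl_verify_command(
    "raw-memento.html", "memento.tst", "cert.pem", "ca-bundle.pem"
)))
```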

If it is known that the crawler creates the hash of the raw memento and uses it as a private key for generating a Bitcoin address, thus submitting it directly to the blockchain for timestamping, then no additional headers would be needed. Verifiers only need the content of the raw memento to generate the hash. In addition, perhaps a separate timestamping service could exist for mementos, using the URI-M (e.g., https://timestamping-service.example.org/{URI-M}).

If one specific timestamping scheme is used, then perhaps specific link relations can be created to convey the resources from each of these fields.

Of course, this assumes that we only want to timestamp raw mementos. Conceivably, one might wish to timestamp a screenshot of a web page, or its WARC. These other potential use cases will need additional analysis.

Summary


In this post, I have discussed different timestamping options. These options have different levels of complexity and security. All of them support privacy by permitting the submission of a document hash to a timestamping service. OriginStamp attempts to address some of the concerns of the existing ANSI X9.95/RFC 3161 standard by using the timestamping features of the Bitcoin blockchain.

Of these options, the Bitcoin blockchain offers a decentralized, secure solution that supports privacy and does not depend on a central server that can fail or be compromised. Because copies of the blockchain are distributed to all full Bitcoin clients, it remains present for verification in the future. Bitcoin has been around for 8 years and continues to increase in value. Because all participants have incentives to keep the blockchain distributed and up to date, it is expected to outlast most companies, which have a median age of 20 years. In addition, if Bitcoin is no longer used, copies of the blockchain will still need to be maintained indefinitely for verification. It does, however, suffer from issues with timestamp accuracy. Submitting a document hash directly to the blockchain removes the delay introduced by OriginStamp's batching, though the two-hour tolerance inherent in the Bitcoin protocol remains.

Companies offering trusted timestamping using TSAs, on the other hand, may not have the longevity and require subscription fees for a limited number of timestamps. Though Bitcoin is currently volatile, it has stabilized before, and the subscription fees from these companies are still more expensive on average than the Bitcoin transaction fee.

Even though timestamping options exist, use cases must be identified for the verification of such timestamps in the future. These use cases will inform requestors of which content should be timestamped and will also affect which timestamping solution is selected. It would also be beneficial for verifiers to have links to additional resources for verification.

Trusted timestamping of mementos is possible, but will require some additional decisions and technology to become a reality.


2 comments:

  2. Thanks for sharing! What an excellent overview of trusted timestamping and the blockchain-based approach to timestamping used by OriginStamp. It's certainly worthwhile to trusted timestamp Mementos - even if only as another layer of certainty. If you have any feature requests for the OriginStamp API [0], let us know, we’d be happy to implement them!

    By the way, our group has also been researching various use cases of trusted timestamping. Some of our proposed applications include:
    indefinitely verifying the time of manuscript submission and peer-reviewer feedback in academia [1]
    proving the time at which creative content was first shared on a social media platform [2]
    securing the integrity of video footage recorded by dash cameras [3]
    and also securing digital cultural heritage in general, e.g. verifying the content of online news articles and their changes over time [4]

    Papers/Links describing each of these projects can be found here:
    [0] https://app.originstamp.org/dev
    [1] https://gipp.com/wp-content/papercite-data/pdf/gipp2017b.pdf
    [2] https://gipp.com/wp-content/papercite-data/pdf/breitinger2017.pdf
    [3] https://gipp.com/wp-content/papercite-data/pdf/gipp2016a.pdf
    [4] http://www.ieee-tcdl.org/Bulletin/current/papers/gipp.pdf

    Also, due to the problem of the increasing transaction fees in the Bitcoin network, we are researching alternative blockchains to be used for trusted timestamping.
