Monday, April 23, 2018

2018-04-23: "Grampa, what's a deleted tweet?"


In early February 2018, Breitbart News made a splash with an inflammatory tweet suggesting that Muslims would end the Super Bowl, which it deleted twelve hours later, stating that it did not meet its editorial standards. The deleted tweet presented an imaginary conversation between a Muslim child and a grandparent about the Super Bowl and linked to one of Breitbart's articles on the National Football League's (NFL) declining TV ratings for the annual championship game. News articles from The Hill, Huffington Post, Politico, the Independent, etc., covered the deleted tweet controversy in detail.

Being web archiving researchers, we decided to look into the Breitbart News deleted tweet incident to shed some light on the account's tweet deletion pattern over recent months.

Role of web archives in finding deleted tweets   


Hany M. SalahEldeen and Michael L. Nelson, in their paper "Losing my revolution: How many resources shared on social media have been lost?", examined how many resources shared on social media are still live or present in public web archives. They concluded that nearly 11% of shared resources are lost in their first year, and after that we lose shared resources at a rate of 0.02% per day.
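The reported rates imply a simple back-of-the-envelope loss model. A minimal sketch (the linear interpolation is our illustration, not the paper's fitted model):

```python
def fraction_lost(days):
    """Estimated fraction of resources shared on social media that are
    lost after `days`, using the rates reported by SalahEldeen & Nelson:
    roughly 11% in the first year, then about 0.02% per day afterwards.
    The linear shape within the first year is our assumption."""
    if days <= 365:
        return 0.11 * days / 365.0
    return min(1.0, 0.11 + 0.0002 * (days - 365))
```

For example, fraction_lost(730) estimates that roughly 18% of shared resources are gone after two years.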

Web archives such as the Internet Archive, Archive-It, the UK Web Archive, etc., have an important role in the preservation of resources shared on social media. Using web archives, we can sometimes recover deleted tweets. For example, Miranda Smith, in her blog post "Twitter Follower Count History via Internet Archive", talks about using the Internet Archive to fetch historical Twitter data to graph follower counts over time. She also explains the advantages of using web archives, rather than the Twitter API, for finding users' historical data.

The only caveat in using web archives to uncover deleted tweets is their limited coverage of Twitter. But for popular Twitter accounts with a high number of mementos, such as RealDonaldTrump, BarackObama, BreitbartNews, CNN, etc., we can often uncover deleted tweets. The issue of "How Much of the Web Is Archived?" has been discussed by Ainsworth et al., but there has been no separate analysis of how much of Twitter is archived, which would help us estimate the likelihood of finding deleted tweets using web archives.

Web services like Politwoops track deleted tweets of public officials, including people currently in office and candidates for office in the USA and some EU nations. However, tweets deleted before a person becomes a candidate, or deleted after a person leaves office, are not covered. Although Politwoops tracks elected officials, it misses appointed government officials like Michael Flynn. For these Twitter accounts, web archives are the lone solution for finding deleted tweets. Another important reason not to rely on these web services alone is that they can be banned by Twitter. It happened once, in June 2015, with Twitter citing a violation of its developer agreement, and it took Politwoops six months to resume its services in December 2015. These instances suggest that we explore web archives to uncover deleted tweets in case services like Politwoops are banned again.

Why are deleted tweets important?


With the surge in the usage of social media sites like Twitter, Facebook, etc., researchers have been using them to study patterns of online user behaviour. In the context of Twitter, deleted tweets play an important role in understanding users' behavioural patterns. In the paper "An Examination of Regret in Bullying Tweets", Xu et al. built an SVM-based classifier to predict deleted tweets from Twitter users who post bullying-related tweets only to later regret and delete them. Petrovic et al., in their paper "I Wish I Didn't Say That! Analyzing and Predicting Deleted Messages in Twitter", discuss the reasons for deleted tweets and a machine learning approach to predicting them. They concluded that tweets with swear words have a higher probability of being deleted. Zhou et al., in their papers "Tweet Properly: Analyzing Deleted Tweets to Understand and Identify Regrettable Ones" and "Identifying Regrettable Messages from Tweets", mention that the impact of a published tweet cannot be undone by deletion, as other users may have noticed and cached the tweet before it was deleted.


How were deleted tweets found?


To begin our analysis, we used the Twitter API to fetch the most recent 3200 tweets from Breitbart News' Twitter timeline. The live tweets fetched from the Twitter API spanned from 2017-10-22 to 2018-02-18. Next, we fetched the TimeMap for Breitbart's Twitter page using MemGator, the Memento aggregator service built by Sawood Alam. Using the URI-Ms from the fetched TimeMap, we collected mementos for Breitbart's Twitter page within the time range of the live tweets fetched using the Twitter API.

Code to fetch recent tweets using Python-Twitter API

import twitter

api = twitter.Api(consumer_key='xxxxxx',
                  consumer_secret='xxxxxx',
                  access_token_key='xxxxxx',
                  access_token_secret='xxxxxx',
                  sleep_on_rate_limit=True)

screen_name = 'BreitbartNews'
twitter_response = api.GetUserTimeline(screen_name=screen_name, count=200, include_rts=True)
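The call above returns at most 200 tweets; to approach the API's ~3200-tweet timeline limit, it can be paginated with max_id. A sketch, assuming the same api object as above (the helper name is ours):

```python
def fetch_timeline(api, screen_name, limit=3200):
    """Page backwards through a user's timeline, 200 tweets per call,
    using max_id so each call resumes just below the oldest tweet seen."""
    tweets, max_id = [], None
    while len(tweets) < limit:
        batch = api.GetUserTimeline(screen_name=screen_name, count=200,
                                    include_rts=True, max_id=max_id)
        if not batch:
            break  # reached the end of the available timeline
        tweets.extend(batch)
        max_id = batch[-1].id - 1
    return tweets[:limit]
```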

Shell command to run MemGator locally

$ memgator --contimeout=10s --agent=XXXXXX server 
MemGator 1.0-rc7
   _____                  _______       __
  /     \  _____  _____  / _____/______/  |___________
 /  Y Y  \/  __ \/     \/  \  ___\__  \   _/ _ \_   _ \
/   | |   \  ___/  Y Y  \   \_\  \/ __ |  | |_| |  | \/
\__/___\__/\____\__|_|__/\_______/_____|__|\___/|__|

TimeMap   : http://localhost:1208/timemap/{FORMAT}/{URI-R}
TimeGate  : http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
Memento   : http://localhost:1208/memento[/{FORMAT}|proxy]/{DATETIME}/{URI-R}

# FORMAT          => link|json|cdxj
# DATETIME        => YYYY[MM[DD[hh[mm[ss]]]]]
# Accept-Datetime => Header in RFC1123 format

Code to fetch TimeMap for any twitter handle

import requests

url = "http://localhost:1208/timemap/"
data_format = "cdxj"
command = url + data_format + "/http://twitter.com/<screen-name>"
response = requests.get(command)

We parsed tweets and their tweet ids from each memento and compared each archived tweet id with the live tweet ids fetched using the Twitter API. For tweet ids present in the web archives but missing from the live timeline, we queried the Twitter API to confirm that the tweets had indeed been deleted. On comparing the live and archived versions of the tweets, we discovered 22 deleted tweets from Breitbart News.
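The comparison step reduces to a set difference over tweet ids. A minimal sketch (the function name is ours):

```python
def candidate_deletions(archived_ids, live_ids):
    """Archived tweet ids that no longer appear in the live timeline.
    Each candidate still needs a Twitter API lookup to confirm it was
    deleted, rather than merely older than the API's 3200-tweet window."""
    return sorted(set(archived_ids) - set(live_ids))
```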

Code to parse tweets, their timestamps and tweet ids from mementos


import bs4

# "memento.html" stands in for the HTML representation of a memento
soup = bs4.BeautifulSoup(open("memento.html"), "html.parser")
match_tweet_div_tag = soup.select('div.js-stream-tweet')
for tag in match_tweet_div_tag:
    if tag.has_attr("data-tweet-id"):
        # Get tweet id
        tweet_id = tag["data-tweet-id"]
        # Parse tweet text
        match_timeline_tweets = tag.select('p.js-tweet-text.tweet-text')
        tweet_text = match_timeline_tweets[0].get_text() if match_timeline_tweets else ""
        # Parse tweet timestamp (the span's data-time attribute holds a Unix epoch)
        match_tweet_timestamp = tag.find("span", {"class": "js-short-timestamp"})
        tweet_timestamp = match_tweet_timestamp["data-time"] if match_tweet_timestamp else None

Analysis of Deleted Tweets from Breitbart News


The most prominent of the 22 deleted tweets was the Super Bowl tweet mentioned above. For people who are unaware of the role of web archives: taking screenshots for fear that something might be lost in the future is smart, but it would be even better to push those pages to the web archives, where they would be preserved far longer than in someone's private archive. For further information, refer to Plinio Vargas's blog post "Links to Web Archives, not Search Engine Caches", where he talks about the difference between archived pages and search engine caches in terms of the decay period of the web pages.

Fig 1 - Super Bowl tweet on Internet Archive
Tweet Memento at Internet Archive
There is another tweet which was initially tweeted by Allum Bokhari, a senior Breitbart correspondent, and retweeted by Breitbart News, but was later un-retweeted. The original tweet from Allum Bokhari is present on the live web, but the retweet is missing, plausibly because Breitbart News later retweeted a similar post from Allum Bokhari.
Undo retweet of Breitbart News
Fig 2 - Archived version of unretweeted tweet by Breitbart News
Tweet memento at the Internet Archive

Fig 3 - Live version of unretweeted tweet by Breitbart News
Live Tweet Status
Of the 22 deleted tweets, 20 were cases where Breitbart News retweeted someone's tweet and the original tweet was later lost. Of those 20 tweets, 18 were from two Breitbart News affiliates, NolteNC and John Carney. Therefore, we decided to take a look at both accounts to determine the reason for their deleted tweets.

Analysis of deleted tweets from John Carney and NolteNC


We fetched live tweets for John Carney using the Twitter API, then fetched the TimeMap for John Carney's Twitter page using MemGator and collected mementos within the time range of the live tweets fetched using the Twitter API. Due to the low number of mementos within the specified time range, the analysis showed no deleted tweets. We then fetched live tweets from the Twitter API for John Carney over the course of a week, finding deleted tweets by comparing each response with all the previous responses from the Twitter API. We discovered that tweets older than seven days are automatically deleted on Tuesdays and Saturdays. The precise manner of deletion suggests the use of an automated tweet deletion service. There are a number of tweet deletion services, like Twitter Deleter, Tweet Eraser, etc., which delete tweets on certain conditions based on the lifespan of the tweet or the number of tweets to be kept in the Twitter timeline at any given instance.
Fig 4 - John Carney's tweet deletion pattern shown with 50 tweet ids
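A tweet's age can be checked without extra API calls, because Twitter's snowflake tweet ids embed a millisecond timestamp. A sketch (1288834974657 is Twitter's published snowflake epoch; the function name is ours):

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # snowflake epoch: 2010-11-04T01:42:54.657Z

def tweet_created_at(tweet_id):
    """Recover a tweet's creation time from its snowflake id: the bits
    above the low 22 are milliseconds since the Twitter epoch."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)
```

Comparing these creation times against each snapshot date makes the "older than seven days" rule directly testable.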
We fetched live tweets for NolteNC using the Twitter API, then fetched the TimeMap for NolteNC's Twitter page using MemGator and collected mementos within the time range of the live tweets fetched using the Twitter API. For NolteNC, we had a considerable number of mementos within the specified time range with which to discover deleted tweets. Our analysis provided us with 169 live tweets and 3569 deleted tweets from 2017-11-03 to 2018-02-17.
Fig 5 - NolteNC's original tweet


Fig 6 - Breitbart News retweeting NolteNC's tweet.
With thousands of deleted tweets, it seemed unlikely that he was deleting tweets manually. We had every reason to believe that, similar to John Carney, NolteNC deleted tweets automatically using some tweet deletion service. We collected live tweets for his account over a week and compared each response with all the previous responses from the Twitter API, concluding that all of his tweets aged more than seven days were deleted on Wednesdays and Saturdays.
Fig 7 - NolteNC's tweet deletion pattern shown with 50 tweets 
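The week-long comparison amounts to diffing successive snapshots of tweet ids. A sketch (the function name is ours):

```python
def deletions_by_snapshot(snapshots):
    """snapshots: chronological list of (label, iterable_of_tweet_ids)
    from repeated Twitter API fetches. Returns {label: set of ids that
    vanished since the previous snapshot}. Ids that merely aged out of
    the API's fetch window must be filtered out beforehand, or they
    will be miscounted as deletions."""
    prev, gone = set(), {}
    for label, ids in snapshots:
        current = set(ids)
        gone[label] = prev - current
        prev = current
    return gone
```

Grouping the resulting counts by weekday is what exposes a scheduled pattern like "Wednesdays and Saturdays".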

Conclusions

  1. It is not enough to take screenshots of controversial tweets; web content that we wish to preserve for the future, fearing its loss, should be pushed to the web archives, which have a longer retention capability than our personal archives.
  2. For finding deleted tweets, web archives work effectively for popular accounts because they are archived often but for less popular accounts with fewer mementos this approach will not work.
  3. Although Breitbart News does not delete tweets often, some of its correspondents automatically delete their tweets, effectively deleting the corresponding retweets.
--
Mohammed Nauman Siddique (@m_nsiddique)

Friday, April 13, 2018

2018-04-13: Web Archives are Used for Link Stability, Censorship Avoidance, and Traffic Siphoning

ISIS members immolating captured Jordanian pilot
Web archives have been used for purposes other than digital preservation and browsing historical data. These purposes can be divided into three categories:

  1. Uploading content to web archives to ensure continuous availability of the data.
  2. Avoiding governments' censorship or websites' terms of service.
  3. Using URLs from web archives, instead of direct links, for news sites with opposing ideologies to avoid increasing their web traffic and deprive them of ad revenue.

1. Uploading content to web archives to ensure continuous availability of the data


Web archives, by design, are intended to solve the problem of digital data preservation so people can access data when it is no longer available on the live web. In the paper Who and What Links to the Internet Archive (Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, and Michael L. Nelson, 2013), the authors show that 65% of the requested archived pages no longer exist on the live web. The paper also determines where Internet Archive's Wayback Machine users come from. The following table, from the paper, contains the top 10 referrers that link to IA's Wayback Machine. The list of top 10 referrers represents 51.9% of all the referrers. en.wikipedia.org outnumbers all other sites, including search engines and the home page of the Internet Archive (archive.org).
The top 10 referrers that link to IA’s Wayback Machine
Who and What Links to the Internet Archive, (AlNoamany et al. 2013) Table 5

Sometimes the archived data is controversial and the user wants to make sure that he or she can refer back to it later in case it is removed from the live web. A clear example of that is the deleted tweets from U.S. president Donald Trump.
Mr. Trump's deleted tweets on politwoops.eu


2. Avoiding governments' censorship or websites' terms of service


Using the Internet Archive to find a way around terms of service for file sharing sites was addressed by Justin Littman in a blog post, Islamic State Extremists Are Using the Internet Archive to Deliver Propaganda. He stated that ISIS sympathizers are using the Internet Archive as a web delivery platform for extremist propaganda, posing a threat to the archival mission of the Internet Archive. Mr. Littman did not evaluate the content to determine if it is extremist in nature, since much of it is in Arabic. This behavior is not new; it was noted with some of the data uploaded by Al-Qaeda sympathizers long before ISIS was created. Al-Qaeda uploaded the file https://archive.org/details/osamatoobama to the Internet Archive on February 16, 2010 to circumvent file sharing sites' content removal policies. ISIS sympathizers upload clips documenting battles, executions, or even video announcements by ISIS leaders to the Internet Archive because that type of data is automatically removed from video sharing sites like Youtube to prevent extremist propaganda.

On February 4, 2015, ISIS uploaded a video to the Internet Archive featuring the execution by immolation of captured Jordanian pilot Muath Al-Kasasbeh; that was only one day after the execution! This video violates Youtube's terms of service and is no longer on Youtube.
https://archive.org/details/YouTube_201502
ISIS members immolating captured Jordanian pilot (graphic video)
In fact, Youtube's algorithm is so aggressive that it removed thousands of videos documenting the Syrian revolution. Activists argued that the removed videos were uploaded for the purpose of documenting atrocities during the Syrian government's crackdown, and that Youtube killed any possible hope for future war crimes prosecutions.

Hani Al-Sibai, a lawyer, Islamic scholar, Al-Qaeda sympathizer, and former member of the Egyptian Islamic Jihad Group who lives in London as a political refugee, uploads his content to the Internet Archive. Although he is anti-ISIS, his content more often than not does not encourage violence, and he has had only a few issues with Youtube, he pushes his content to multiple sites on the web, including web archiving sites, to ensure the continuous availability of his data.

For example, this is an audio recording from Hani Al-Sibai condemning the immolation of the Jordanian pilot, Muath Al-Kasasbeh. Mr. Al-Sibai uploaded this recording to the Internet Archive a day after the execution.
https://archive.org/details/7arqTayyar
An audio recording by Hani Al-Sibai condemning the execution by burning (uploaded to IA a day after the execution)

These are some examples where the Internet Archive is used as a file sharing service. Clips are simultaneously uploaded to Youtube, Vimeo, and the Internet Archive for the purpose of sharing.
Screen-shot from justpaste.it where links to videos uploaded to IA are used for sharing purpose 
Both videos shown in the screen shot were removed from Youtube for violating terms of service, but they are not lost because they have been uploaded to the Internet Archive.

https://www.youtube.com/watch?v=Cznm0L5X9LE
Rebuttal from Hani Al-Sibai addressing ISIS spokesman's attack on Al-Qaeda leader Ayman Al-Zawaheri (removed from Youtube)

https://archive.org/details/Fajr3_201407
Rebuttal from Hani Al-Sibai addressing ISIS spokesman's attack on Al-Qaeda leader Ayman Al-Zawaheri (uploaded to IA)

https://www.youtube.com/watch?v=VuSgxhBtoic
Rebuttal from Hani Al-Sibai addressing ISIS leader's speech on the expansion of ISIS (removed from Youtube)

https://archive.org/details/Ta3liq_Hadi
Rebuttal from Hani Al-Sibai addressing ISIS leader's speech on the expansion of ISIS (uploaded to IA)
The same video was not removed from Vimeo
https://vimeo.com/111975796
Rebuttal from Hani Al-Sibai addressing ISIS leader's speech on the expansion of ISIS (uploaded to Vimeo)
I am not sure if web archiving sites have content moderation policies, but even with sharing sites that do, they are inconsistent! Youtube is a perfect example; no one knows what YouTube's rules even are anymore.

Less popular uses of the Internet Archive include browsing the live web via Internet Archive links to bypass governments' censorship. Sometimes governments block sites with opposing ideologies, but their archived versions remain accessible. When these governments realize that their censorship is being evaded, they block the Internet Archive entirely to prevent access to the same content they blocked on the live web. In 2017, the IA's Wayback Machine was blocked in India, and in 2015, Russia blocked the Internet Archive over a single page!

3. Using URLs from web archives instead of direct links for news sites with opposing ideologies to deprive them of ad revenue

Even when the live web version is not blocked, there are situations where readers want to deny traffic and the resulting ad revenue to web sites with opposing ideologies. In a recent paper, Understanding Web Archiving Services and Their (Mis)Use on Social Media (Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, 2018), the authors presented a large-scale analysis of web archiving services and their use on social networks, the archived content, and how it is shared and used. They found that contentious news and social media posts are the most common types of content archived. Also, URLs from web archiving sites are widely posted in "fringe" groups on Reddit and 4chan to preserve controversial data that might disappear; this case also falls under the first category. Furthermore, the authors found evidence of group admins forcing members to use URLs from web archives instead of direct links to sites with opposing ideologies, to refer to them without increasing their traffic or to deprive them of ad revenue. For instance, The_Donald subreddit systematically targets the ad revenue of news sources with adverse ideologies using moderation bots that block URLs from those sites and prompt users to post archive URLs instead.

The authors also found that web archives are used to evade censorship policies in some communities: for example, /pol/ users post archive.is URLs to share content from 8chan and Facebook, which are banned on the platform, or to dodge word-filters (e.g., ‘smh’ becomes ‘baka’, so links to smh.com.au point to baka.com.au instead).

According to the authors, Reddit bots are responsible for posting a huge portion of archive URLs in Reddit due to moderators trying to ensure the availability of the data, but this practice affects the amount of traffic that the source sites would have received from Reddit.

I went on 4chan to find a few examples similar to those examined in the paper, and despite not knowing what 4chan was prior to reading the paper, I was able to find a couple of examples of archived links shared on 4chan in just under two minutes. I took screenshots of both examples; the threads have since been deleted, as 4chan removes threads after they reach page 10.

Pages are archived on archive.is then shared on 4chan
Sharing links to archive.org in a comment on 4chan

The take away message is that web archives have been used for purposes other than digital preservation and browsing historical data. These purposes include:
  1. Uploading content to web archives to mitigate the risk of data loss.
  2. Avoiding governments' censorship or websites' terms of service.
  3. Using URLs from web archives, instead of original source links for news sites with opposing ideologies to deprive them of ad revenue.
--
Hussam Hallak

Monday, April 9, 2018

2018-04-09: Trip Report for the National Forum on Ethics and Archiving the Web (EAW)


On March 22-23, 2018, I attended the National Forum on Ethics and Archiving the Web (EAW), hosted at the New Museum and organized by Rhizome and the members of the Documenting the Now project.  The nor'easter "Toby" frustrated the travel plans of many, causing my friend Martin Klein to cancel completely and me to not arrive at the New Museum until after the start of the second session at 2pm on Thursday.  Fortunately, all the sessions were recorded and I link to them below.

Day 1 -- March 22, 2018


Session 1 (recording) began with a welcome, and then a keynote by Marisa Parham, entitled "The Internet of Affects: Haunting Down Data".  I did have the privilege of seeing her keynote at the last DocNow meeting in December, and looking at the tweets ("#eaw18") she addressed some of the same themes, including the issues of the process of archiving social media (e.g., tweets) and the resulting decontextualization, including "Twitter as dataset vs. Twitter as experience", and "how do we reproduce the feeling of community and enhance our understanding of how to read sources and how people in the past and present are engaged with each other?"  She also made reference to the Twitter heat map for showing interaction with the Ferguson grand jury verdict ("How a nation exploded over grand jury verdict: Twitter heat map shows how 3.5 million #Ferguson tweets were sent as news broke that Darren Wilson would not face trial").



After Marisa's keynote was the panel on "Archiving Trauma", with Michael Connor (moderator), Chido Muchemwa, Nick Ruest (slides), Coral Salomón, Tonia Sutherland, and Lauren Work.  There are too many important topics here and I did not experience the presentations directly, so I will refer you to the recording for further information and a handful of selected tweets below. 


The next session after lunch was "Documenting Hate" (recording), with Aria Dean (moderator), Patrick Davison, Joan Donovan, Renee Saucier, and Caroline Sinders.  I arrived at the New Museum about 10 minutes into this panel.  Caroline spoke about the Pepe the Frog meme, its appropriation by Neo-Nazis, and the attempt by its creator to wrest it back -- "How do you balance the creator’s intentions with how culture has remixed their work?"

Joan spoke about a range of topics, including archiving the Daily Stormer forum, archiving the disinformation regarding the attacks in Charlottesville this summer (including false information originating on 4chan about who drove the car), and an algorithmic image collection technique for visualizing trending images in the collection.


Renee Saucier talked about experiences collecting sites for the "Canadian Political Parties and Political Interest Groups" (Archive-It collection 227), which includes Neo-Nazi and affiliated political parties.


The next panel was "Web Archiving as Civic Duty", with Amelia Acker (co-moderator), Natalie Baur, Adam Kriesberg (co-moderator) (transcript), Muira McCammon, and Hanna E. Morris.  My own notes on this session are sparse (in part because most of the presenters did not use slides), so I'll include a handful of tweets I noted that I feel succinctly capture the essence of the presentations.  I did find a link to Muira's MS thesis "Reimagining and Rewriting the Guantánamo Bay Detainee Library: Translation, Ideology, and Power", but it is currently under embargo.  I did find an interview with her that is available and relevant.  Relevant to Muira's work with deleted US Govt accounts is Justin Littman's recent description of a disinformation attack via re-registering deleted accounts ("Vulnerabilities in the U.S. Digital Registry, Twitter, and the Internet Archive"). 2018-04-17 update: Muira just published two related articles about deleted tweets: "Trouble @JTFGTMO" and "Can They Really Delete That?".


The third session, "Curation and Power" (recording) began with a panel with Jess Ogden (moderator), Morehshin Allahyari, Anisa Hawes, Margaret Hedstrom, and Lozana Rossenova.  Again, I'll borrow heavily from tweets. 


The final session for Thursday was the keynote by Safiya Noble, based on her recent book "Algorithms of Oppression" (recording).  I really enjoyed Safiya's keynote; I had heard of some of the buzz and controversy (see my thread (1, 2, 3) about archiving some of the controversy) around the book but I had not yet given it a careful review (if you're not familiar with it, read this five minute summary Safiya wrote for Time).  I include several insightful tweets from others below, but I'll also summarize some of the points that I took away from her presentation (and they should be read as such and not as a faithful or complete transcription of her talk).

First, as a computer scientist I understand and am sympathetic to the idea that the ranking algorithms Google et al. use should be neutral.  It's an ultimately naive and untenable position, but I'd be lying if I said I did not understand the appeal.  The algorithms that help us differentiate quality pages from spam pages about everyday topics like artists, restaurants, and cat pictures do what they do well.  In one of the examples I use in my lecture (slides 55-58), it's the reason why for the query "DJ Shadow", the wikipedia.org and last.fm links appear on Google's page 1, and djshadow.rpod.ru appears on page 15: in this case, ranking the sites based on their popularity in terms of links, searches, clicks, and other user-oriented metrics makes sense.  But what happens when the query is, as Safiya provides in her first example, "black girls"?  The result (ca. 2011) is almost entirely porn (cf. the in-conference result for "asian girls"), and the algorithms that served us so well in finding quality DJ Shadow pages in this case produce a socially undesirable result.  Sure, this undesirable result comes from having indexed the global corpus (and our interactions with it) and is thus a mirror of the society that created those pages, but given the centrality Google enjoys in our lives and the fact that people consider it an oracle rather than just a tool that gives undesirable results when indexing undesirable content, it is irresponsible for Google to ignore the feedback loop that they provide; they no longer just reflect the bias, they hegemonically reinforce the bias, and give attack vectors to those who would defend it.

Furthermore, there is already precedent for adjusting search results to eliminate bias in other dimensions: for example, PageRank by itself is biased against late-arriving pages/sites (e.g., "Impact of Web Search Engines on Page Popularity"), so search engines (SEs) adjust the rankings to accommodate these pages.  Similarly, Google has a history of intervening to remove "Google Bombs" (e.g., "miserable failure"), punish attempts to modify ranking, and even replacing results pages with jokes -- if these modifications are possible, then Google can no longer pretend the algorithm results are inviolable. 

She did not confine her criticism to Google, she also examined query results in digital libraries like ArtStor.  The metadata describing the contents in the DL originate from a point-of-view, and queries with a different POV will not return the expected results.  I use similar examples in my DL lecture on metadata (my favorite is reminding the students that the Vietnamese refer to the Vietnam War as the "American War"), stressing that even actions as seemingly basic as assigning DNS country codes (e.g., ".ps") are fraught with geopolitics, and that neutrality is an illusion even in a discipline like computer science. 

There's a lot more to her talk than I have presented, and I encourage you to take the time to view it.  We can no longer pretend Google is just the "backrub" crawler and google.stanford.edu interface; it is a lens that both shows and shapes who we are.  That's an awesome responsibility and has to be treated as such.


Day 2 -- March 23, 2018


The second day began with the panel "Web as Witness - Archiving & Human Rights" (recording), with Pamela Graham (moderator), Anna Banchik, Jeff Deutch, Natalia Krapiva, and Dalila Mujagic. Anna and Natalia presented the activities of the UC Berkeley Human Rights Investigations Lab, where they do open-source investigations (discovering, verifying, geo-locating, and more) of publicly available data on human rights violations.  Next was Jeff, talking about the Syrian Archive and the challenges they faced with Youtube algorithmically removing what it believed to be "extremist content".  He also had a nice demo of how they used image analysis to identify munitions in videos uploaded by Syrians.  Dalila presented the work of WITNESS, an organization promoting the use of video to document human rights violations and its use as evidence.  The final presentation was about airwars.org (a documentation project about civilian casualties in air strikes), but I missed a good part of this presentation as I focused on my upcoming panel.


My session, "Fidelity, Integrity, & Compromise", was Ada Lerner (moderator) (site), Ashley Blewer (slides, transcript), Michael L. Nelson (me) (slides), and Shawn Walker (slides).  I had the luxury of going last, but that meant that I was so focused on reviewing my own material that I could not closely follow their presentations.  My students and I have read Ada's paper and it is definitely worth reviewing.  They review a series of attacks (and fixes) that all center around "abandoned" live web resources (what we called "zombies") that can be (re-)registered and then included in historical pages.  That sounds like a far-fetched attack vector, except when you remember that modern pages include 100s of resources from many different sites via Javascript, and there is a good chance that any page is likely to include a zombie whose live web domain is available for purchase.  Shawn's presentation dealt with research issues surrounding using social media, and Ashley's talk dealt with the role of fixity information (e.g., "There's a lot of "oh I should be doing that" or "I do that" but without being integrated holistically into preservation systems in a way that brings value or a clear understanding as to the "why"").  As for my talk, I asserted that Brian Williams first performed "Gin and Juice" in 1992, a full year before Snoop Dogg, and I have a video of a page in the Internet Archive to "prove" it.  The actual URI at which it is indexed in the Internet Archive is obfuscated, but this video is 1) of an actual page in the IA, that 2) pulls live web content into the archive, despite the fixes that Ada provided, and 3) rewrites the URL in the address bar to pretend to be at a different URL and time (in this case, dj-jay-requests.surge.sh, and 19920531014618 (May 31, 1992)).






The last panel before lunch was "Archives for Change", with Hannah Mandel (moderator), Lara Baladi, Natalie Cadranel, Lae’l Hughes-Watkins, and Mehdi Yahyanejad.  My notes for this session are sparse, so I'll just highlight a handful of useful tweets.




After lunch, the next session (recording) was a conversation between Jarrett Drake and Stacie Williams on their experiences developing the People's Archive of Police Violence in Cleveland, which "collects, preserves, and shares the stories, memories, and accounts of police violence as experienced or observed by Cleveland citizens."  This was the only panel with the format of two people having a conversation (effectively interviewing each other) about their personal transformation and lessons learned.


The next session was "Stewardship & Usage", with Jefferson Bailey, Monique Lassere, Justin Littman, Allan Martell, and Anthony Sanchez.  Jefferson's excellent talk, entitled "Lets put our money where our ethics are", was an eye-opening discussion of the state of funding (or lack thereof) for web archiving. The tweets below capture the essence of the presentation, but this is definitely one you should take the time to watch.  Allan's presentation addressed the issues of building "community archives" and being aware of the tensions that exist between different marginalized groups. Justin's presentation was excellent, detailing both GWU's collection activities and the associated ethical challenges (including who and what to collect), as well as the gap between collecting via APIs and archiving web representations.  I believe Anthony and Monique jointly gave their presentation on how ethical web archiving requires proper representation from marginalized communities.



The next panel, "The Right to be Forgotten" (Session 7, recording), featured Joyce Gabiola (moderator), Dorothy Howard, and Katrina Windon.  The right to be forgotten is a significant issue facing search engines in the EU, but it has yet to arrive as a legal issue in the US.  Again, my notes on this session are sparse, so I'm relying on tweets. 


The final regular panel was "The Ethics of Digital Folklore", and featured Dragan Espenschied (moderator) (notes), Frances Corry, Ruth Gebreyesus, Ian Milligan (slides), and Ari Spool.  At this point my laptop's battery died, so I have absolutely no notes on this session. 


The final session was with Elizabeth Castle, Marcella Gilbert, and Madonna Thunder Hawk, and included an approximately 10-minute rough-cut preview of "Warrior Women", a documentary about Madonna Thunder Hawk, her daughter Marcella Gilbert, Standing Rock, and the DAPL protests.


Day 3 -- March 24, 2018


Unfortunately, I had to leave on Saturday and was unable to attend any of the nine workshop sessions ("Ethical Collecting with Webrecorder", "Distributed Web of Care", "Open Source Forensics", "Ethically Designing Social Media from Scratch", "Monitoring Government Websites with EDGI", "Community-Based Participatory Research", "Data Sharing", "Webrecorder - Sneak Preview", and "Artists’ Studio Archives") or the unconference slots.   There are three additional recorded sessions corresponding to the workshops, which I'll link here (session 8, session 9, session 10) because they'll eventually scroll off the main page.

This was a great event, and the enthusiasm with which it was greeted is an indication of the importance of the topic.  There were so many great presentations that I'm left with the unenviable task of writing a trip report that is simultaneously too long and does not do justice to any of them.  I'd like to thank the other members of my panel (Ada, Shawn, and Ashley), everyone who live tweeted the event, the organizers at Rhizome (esp. Michael Connor), Documenting the Now (esp. Bergis Jules), the New Museum, and the funders: IMLS and the Knight Foundation.   I hope they will find a way to do this again soon. 

--Michael

See also: Ashley Blewer wrote a short summary of EAW, with a focus on the keynotes and three different presentations.  Please let me know if there are other summaries / trip reports to add.

Also, please feel free to contact me with additions / corrections for the information and links above.