Friday, November 27, 2015

2015-11-28: Two WS-DL Classes Offered for Spring 2016

Two WS-DL classes are offered for Spring 2016:

Information Visualization is being offered both online (CRNs 29183 (HR), 29184 (VA), 29185 (US)) and on-campus (CRN 25511). Web Science is being offered for the first time with the 432/532 numbers (CRNs 27556 and 27557, respectively), but the class will be similar to the Fall 2014 offering as 495/595.


Tuesday, November 24, 2015

2015-11-24: Twitter Follower Analysis of Virginia University Alumni Associations

The primary goal of any alumni association is to maintain and strengthen the ties between its alumni, the community, and the mission of the university. With social media, it's easier than ever to connect with current students and graduates on Facebook, Instagram or Twitter with a simple invitation to "like us" or "follow me." Considering just one of these social platforms, we recently analyzed the Twitter networks of twenty-three (23) Virginia colleges and universities to determine what, if any, social characteristics were shared among the institutions and whether we could gain any insight by examining the public profiles of their respective followers. The colleges of interest, ranked by number of followers in Table 1, vary in size, mission, type of institution, admissions selectivity and perceived prestige. On average, the alumni associations have maintained a Twitter presence for six (6) years. The oldest Twitter account belongs to Roanoke College (@roanokecollege), which is approaching the eight (8) year mark. The newest Twitter account was registered by Randolph-Macon College (@RMCalums) nearly two years ago.

University                      Followers   Joined Twitter
University of Virginia             12,100   11/1/2008
Roanoke College*                    9,588   3/1/2008
Regent University*                  7,966   11/1/2008
James Madison University            7,865   8/1/2008
Virginia Tech                       6,418   4/1/2009
College of William & Mary           4,448   1/1/2009
University of Mary Washington       3,847   10/1/2009
Liberty University                  3,699   11/6/2009
University of Richmond              3,299   5/1/2009
Sweet Briar College*                2,523   8/1/2010
George Mason University             2,375   2/1/2011
Hampton University                  2,372   2/15/2012
Christopher Newport University      2,191   8/1/2010
Old Dominion University             1,996   7/1/2009
Randolph College*                   1,857   8/1/2008
Washington and Lee University       1,842   8/1/2011
Radford University                  1,758   3/11/2011
Hampden-Sydney College              1,086   7/1/2009
Longwood University                 1,035   2/28/2013
Hollins University                    923   4/1/2009
Virginia Military Institute           836   3/1/2009
Norfolk State University              629   8/15/2011
Randolph-Macon College                172   3/7/2014
Table 1 - Alumni Associations Ranked by Followers

* Institution does not have an official alumni Twitter account.
The university Twitter account was used instead.

Social Graph Analysis

NodeXL is a template for Microsoft Excel that makes network analysis easy and rather intuitive. We used this tool to import the Twitter networks and to analyze the various social media interactions. The Twitter API limits the amount of data any one user can collect per hour. Because of this rate limiting, NodeXL imports only the 2,000 most recent friends and followers for any Twitter account. To improve the response time of the API, we further restricted our data collection to the 200 most recent tweets for both the university and each of its follower accounts.

For our first look at the alumni associations, we clustered the data based on an algorithm in NodeXL which looks at how the vertices are connected to one another. The clusters, as shown in Figure 1, are indicated by the color of the nodes. The clusters themselves revealed some interesting patterns. The high level of inter-association connectivity, as measured in follows, tweets and mentions, was unexpected. We would have thought that each association operated within the confines of its own Twitter space or that of its parent organization. As we examine the groupings in this network, it is not unreasonable that we would observe connections between Old Dominion University (@ODUAlumni), Norfolk State University (@nsu_alumni_1935) and Hampton University (@HamptonU_Alumni), as all three are located in close proximity to one another in the Hampton Roads area. But then we must take notice of Hollins University (@HollinsAlum), a small, private women's college in Roanoke, VA, which has a connection with ten (10) other alumni associations, more connections than any other school. Hollins is one of the smallest universities in our group, with an enrollment of fewer than 800 students. Since Twitter is primarily about influence, in this instance we can probably assume the follows serve as a means to observe best practices and current engagement trends employed by larger institutions. While Hollins University is well connected, as are many of the other schools, at the opposite end of the spectrum we find Liberty University (@LibertyUAlum), a large school with more than 77,000 students. Liberty University remains totally isolated, with no follower connections to the other alumni associations. You might minimally expect some type of connection with either Regent University (@RegentU), since both share a similar mission as private, Christian institutions, or other universities within close physical proximity such as Randolph College (@randolphcollege).
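
The connection counts discussed above can be approximated outside NodeXL with a few lines of Python. This is a minimal sketch, not the method used in the analysis: it treats any follow between two association accounts as one undirected connection, ignores tweets and mentions, and does not reproduce NodeXL's clustering algorithm. The account names and follow pairs are placeholders invented for illustration.

```python
def connection_counts(accounts, follows):
    """Count distinct associations each account is connected to.

    accounts: list of account names to include in the graph.
    follows:  iterable of (follower, followed) pairs; pairs that mention
              an account outside `accounts` are ignored.
    """
    neighbors = {account: set() for account in accounts}
    for follower, followed in follows:
        if follower in neighbors and followed in neighbors and follower != followed:
            # Treat a follow in either direction as one undirected connection.
            neighbors[follower].add(followed)
            neighbors[followed].add(follower)
    return {account: len(linked) for account, linked in neighbors.items()}


def isolated_accounts(counts):
    """Return accounts with no connection to any other account."""
    return [account for account, degree in counts.items() if degree == 0]


# Placeholder data (assumption): four accounts and a handful of follows.
accounts = ["A", "B", "C", "D"]
follows = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "X")]
counts = connection_counts(accounts, follows)
```

With the placeholder data, account A has two connections, B and C have one each, and D is isolated, which is the same kind of contrast the paragraph draws between Hollins and Liberty.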

Figure 1 - Connectivity of Alumni Associations

Twitter Followers, Enrollment, and Selectivity

We normally measure the popularity of a Twitter account based on the number of followers. Instead of simply quantifying the follower counts of each alumni association, we sought to understand if certain factors, actions or inherent qualities about the institution might influence the relative number of followers. First, we considered whether more active tweeters would attract more alumni followers. As shown in Figure 2, the College of William and Mary (@wmalumni) has generated the most tweets over its lifetime, approximately 6,200 or 2.5 tweets per day. But we also observe that the University of Mary Washington (@UMaryWash), which has approximately half the student enrollment and a similar Twitter life span, has posted roughly 50% fewer tweets (2,800, or 1.3 per day) with only a slight difference in the number of followers: 4,400 versus 3,800, respectively. While the graph shows that schools such as Virginia Tech (@vt_alumni) and the University of Virginia (@UVA_Alumni) have more followers with fewer lifetime tweets, the caveat is that these public institutions have the benefit of considerably larger student populations, which inherently increases the pool of potential alumni.

Figure 2 - Lifetime Tweets Versus Followers

Next, we considered whether a higher graduation rate, or alumni production, would result in more followers. We obtained the most recent, 2014 overall graduation rates for each institution from the National Center for Education Statistics, with reported overall six-year graduation rates ranging from 34% to 94%. A 2015 Pew Research Center study of the Demographics of Social Media Users indicates that among all internet users, 32% in the 18 to 29 age range use Twitter. This is a key demographic as we would expect our alumni associations to be primarily focused on attracting recent undergraduates. We also factored in selectivity, a comparative scoring of the admissions process, using the categories defined in the 2016 U.S. News Best Colleges Directory. In this directory, colleges are designated as most selective, more selective, selective, less selective or least selective based on a formula.

As we look at Figure 3, we observe a positive correlation between admissions selectivity and the institution's overall graduation rate. Schools which were least selective during the admissions phase also produced the lowest graduation rates (less than 40%), while schools which were most selective experienced the highest graduation rates (around 90%). It isn't surprising that improved graduation rates positively affect the expected number of alumni Twitter followers. We'll leave it as an exercise for the reader to extrapolate how closely each institution's annual undergraduate enrollment, graduation rate and expected level of engagement on Twitter correspond to the actual number of followers when all three factors are considered.

Figure 3 - Followers Versus Graduation Rate

Potential Reach of Verified Followers

Users on Twitter want to be followed, so we looked carefully at who, besides alumni and students, was following each of the alumni associations. Specifically, we noted the number of Twitter verified followers, accounts which are usually associated with high-profile users in "music, acting, fashion, government, politics, religion, journalism, media, sports, business and other key interest areas." In addition to an abundance of local news reporters and sports anchors, regional politicians and career sites, other notable followers included: restaurant review site Zagat (@Zagat), automaker Toyota USA (@toyota), musician and rapper DJ King Assassin (@DjKingAssassin), the Nelson Mandela Foundation (@NelsonMandela), the President of the United States Barack Obama (@BarackObama), Virginia Governor Terry McAuliffe (@GovernorVA) and artist and singer Yoko Ono (@yokoono). It's a safe assumption that some of the follower relationships with verified users were established prior to 2013, the year in which Twitter instituted new rules to kill the "auto follow," a programmatic way of following another user back after they follow you. Either way, the open question remains as to why these particular users would follow an alumni association when there are no readily apparent educational ties.

Twitter doesn't take follower count into consideration when verifying an account, but it's not unusual for a verified account to have a considerable following. Since the mission of an alumni association is essentially about networking and information dissemination, we also measured the potential reach or level of influence across the followers' extended network obtained from the verified accounts. No single university had more than 70 verified accounts among its followers. However, when we look at their contribution, in Figure 4, as a percentage of the combined reach achieved by all followers of each alumni association, these select users accounted for anywhere from 1.6% for Virginia Military Institute (@vmialumni) to 95.8% for Longwood University (@acaptainforlife) of the institution's total potential reach (i.e., followers of my followers).
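
The percentage described above can be sketched as a simple ratio. The function below assumes that "potential reach" is the sum of each follower's own follower count and that the verified share is that sum restricted to verified followers; the post does not spell out the exact calculation, so treat this as one plausible reading. The sample numbers are invented for illustration.

```python
def verified_reach_share(followers):
    """Percentage of total potential reach contributed by verified followers.

    followers: iterable of (followers_count, is_verified) pairs, one per
               follower of the alumni association.
    """
    followers = list(followers)
    total_reach = sum(count for count, _ in followers)
    if total_reach == 0:
        return 0.0  # No extended network at all, so no share to report.
    verified_reach = sum(count for count, is_verified in followers if is_verified)
    return 100.0 * verified_reach / total_reach


# Invented sample (assumption): two ordinary followers and one verified
# account with a very large following of its own.
sample = [(150, False), (300, False), (9550, True)]
share = verified_reach_share(sample)
```

A single verified follower with a large audience dominates the sample, which is how a handful of verified accounts could account for most of an association's potential reach.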

Figure 4 - Potential Reach Percentage of Verified Accounts

Alumni Sentiment

Finally, we examined how followers described themselves in the description (i.e., bio) portion of their Twitter profiles by extracting the top 200 most frequently occurring terms for each alumni association. A word cloud for the alumni of each university is shown in Figure 5. When we further isolated the descriptions to the top ten most frequently occurring words, we observed a common pattern among all alumni followers. In addition to the official institution name or some derivative of it (e.g., JMU, NSU, Tech), we find the terms love, life, and some intimate description of the follower as a mom, husband, student, father or alumni. If the university has an athletic department, we also found mention of sports and, in the case of our two Christian universities, Liberty and Regent, the terms God, Jesus, and Christ were prevalent. At 22 of 23 institutions, the alumni primarily described themselves using these personal terms. Conversely, the alumni followers at only one institution, the University of Richmond (@urspidernetwork), described themselves in a more business-like or academic manner with more frequent mention of the words PhD, career, and job.
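
A term-frequency count like the one described above can be sketched with the Python standard library. The tokenization rule and the small stop-word list here are assumptions made for illustration; the post does not say how the bios were cleaned before counting, and the sample bios are invented.

```python
import re
from collections import Counter

# Assumed stop words; a real analysis would likely use a longer list.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "at", "for", "my", "i"}


def top_terms(bios, n=10):
    """Return the n most frequent terms across a list of profile bios."""
    counts = Counter()
    for bio in bios:
        # Lowercase the bio and keep runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", bio.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 1)
    return counts.most_common(n)


# Invented sample bios (assumption).
bios = [
    "JMU alum. Love life.",
    "Mom, JMU grad, love sports",
    "Father and husband; love God",
]
top_two = top_terms(bios, n=2)
```

Calling `top_terms(bios, n=200)` on the full set of follower bios for one association would give the kind of list that feeds a word cloud, and `n=10` gives the shorter list used for the comparison across schools.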

Figure 5 - Word Clouds of Twitter Follower Descriptions

-- Corren McCoy

Thursday, November 5, 2015

2015-11-06: iPRES2015 Trip Report

From November 2nd through November 5th, Dr. Nelson, Dr. Weigle, and I attended the iPRES2015 conference at the University of North Carolina Chapel Hill. This served as a return visit for Drs. Nelson and Weigle; Dr. Nelson worked at UNC through a NASA fellowship and Dr. Weigle received her PhD from UNC. We also met with Martin Klein, a WS-DL alumnus now at the UCLA Library. While the last ODU contingent to visit UNC was not so lucky, we returned to Norfolk relatively unscathed.

Cal Lee and Helen Tibbo opened the conference with a welcome on November 3rd, followed by Nancy McGovern's keynote address delivered with Leo Konstantelos and Maureen Pennock. This was not a traditional keynote, but instead an interactive dialogue in which several challenge areas were presented to the audience, and the audience responded -- live and on Twitter -- with significant achievements or advances in those challenge areas from #lastyear. For example, Dr. Nelson identified the #iCanHazMemento utility. The responses are available on Google Docs.

I attended the Institutional Opportunities and Challenges session to open the conference. Kresimir Duretec presented "Benchmarks for Digital Preservation Tools." His presentation touched on how we can get digital preservation tools that "Just Work", including benchmarks for evaluating tools on test beds and measuring them for quality. Related to this is Mat Kelly's work on the Archival Acid Test.

Alex Thirifays presented "Towards a Common Approach for Access to Digital Archival Records in Europe." This paper touched on user access: user needs, best practices for identifying requirements for access, and a capability gap analysis of current tools versus user needs.

"Developing a Highly Automated Web Archive System Based on IIPC Open Source Software" was presented by Zhenxin Wu. Her paper outlined a framework of open source tools to archive the web using Heritrix and a Solr index of WARCs with an enhanced interface.

Barbara Sierman closed the session with her presentation "Best Until ... A National Infrastructure for Digital Preservation in the Netherlands" focusing on user accessibility and organizational challenges as part of a national strategy for preserving digital and cultural Dutch heritage.

After lunch, I led off the Infrastructure Opportunities and Challenges session with my paper on Archiving Deferred Representations Using a Two-Tiered Crawling Approach. We defined deferred representations as those that rely on JavaScript to load embedded resources on the client. We show that archives can use PhantomJS to create a crawl frontier 1.5 times larger than that of Heritrix alone, but PhantomJS crawls 10.5 times slower. We recommend using a classifier to recognize deferred representations and using PhantomJS only to crawl those, mitigating the crawl slow-down while still reaping the benefits of the headless crawler.
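
The routing idea behind the two-tiered approach can be sketched as follows. This is only an illustration: `looks_deferred` is a hypothetical keyword heuristic standing in for the paper's classifier, and the two returned lists stand in for the queues that would feed a fast crawler such as Heritrix and a headless crawler such as PhantomJS. None of the names below come from the paper.

```python
import re

# Hypothetical signals (assumption): an external script plus an Ajax-style call.
SCRIPT_SRC = re.compile(r"<script\b[^>]*\bsrc=", re.IGNORECASE)
AJAX_HINT = re.compile(r"XMLHttpRequest|\$\.ajax|\$\.get|fetch\(", re.IGNORECASE)


def looks_deferred(html):
    """Guess whether a page loads embedded resources via JavaScript.

    A stand-in heuristic, not the classifier described in the paper.
    """
    return bool(SCRIPT_SRC.search(html)) and bool(AJAX_HINT.search(html))


def route(pages):
    """Split pages into a fast tier and a headless tier.

    pages: dict mapping URI -> HTML already fetched by the fast crawler.
    Returns (fast_tier, headless_tier) as lists of URIs.
    """
    fast_tier, headless_tier = [], []
    for uri, html in pages.items():
        # Only pages flagged as deferred pay the cost of the slow crawler.
        (headless_tier if looks_deferred(html) else fast_tier).append(uri)
    return fast_tier, headless_tier


# Invented sample pages (assumption).
pages = {
    "http://example.com/static": "<html><body><img src='a.png'></body></html>",
    "http://example.com/app": (
        "<html><script src='app.js'></script>"
        "<script>var x = new XMLHttpRequest();</script></html>"
    ),
}
fast, headless = route(pages)
```

The design choice is that every page goes through the fast tier first, and only the subset flagged as deferred is crawled again with the slower headless tool.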

iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling Approach from Justin Brunelle

Douglas Thain followed with his presentation on "Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?" Similar to our work with deferred representations, his work focuses on scientific replay of simulations and software experiments. He presents several tools as part of a framework for preserving the context of simulations and simulation software, including dependencies and build information.

Hao Xu presented "A Method for the Systematic Generation of Audit Logs in a Digital Preservation Environment and Its Experimental Implementation In a Production Ready System." His presentation focused on the construction of a finite state machine to determine whether a repository is following compliance policies for auditing purposes.

Jessica Trelogan and Lauren Jackson presented their paper "Preserving an Evolving Collection: “On-The-Fly” Solutions for the Chora of Metaponto Publication Series." They discussed the storage of complex artifacts of ongoing research projects in archeology with the intent of improving shareability of the collections.

To wrap up Day 1, we attended a panel on Preserving Born-Digital News consisting of Edward McCain, Hannah Sommers, Christie Moffatt, Abigail Potter (moderator), Stéphane Reecht, and Martin Klein. Christie Moffatt identified the challenges of archiving born-digital news material, including the difficulty of scoping a corpus. She presented their case study on the Ebola response. Stéphane Reecht presented the BnF's work performing massive, once-a-year crawls as well as selective, targeted daily crawls. Hannah Sommers provided insight into the culture of a news producer (NPR) on digital preservation. Martin Klein presented SoLoGlo (social, local, and global) news preservation, citing statistics about the preservation of links shortened by the LA Times. Finally, Edward McCain discussed the ephemeral nature of born-digital news media, and provided examples of the sparse number of mementos of news pages in the Wayback Machine.

To kick off Day 2, Lisa Nakamura gave her opening keynote The Digital Afterlives of This Bridge Called My Back: Public Feminism and Open Access. Her talk focused on the role of Tumblr in curating and sharing a book no longer in print as a way to open the dialogue on the role of piracy and curation in the "wild" to support open access and preservation.

I attended the Dimensions of Digital Preservation session, which began with Liz Lyon's presentation on "Applying Translational Principles to Data Science Curriculum Development." Her paper outlines a study to help revise the University of Pittsburgh's data science curriculum. Nora Mattern took over the presentation to discuss the expectations of the job market to identify the skills required to be a professional data scientist.

Elizabeth Yakel presented "Educational Records of Practice: Preservation and Access Concerns." Her presentation outlined the unique challenges with preserving, curating, and making available educational data. Education researchers or educators can use these resources to further their education, reuse materials, and teach the next generation of teachers.

Emily Maemura presented "A Survey of Organizational Assessment Frameworks in Digital Preservation." She presented the results of a survey focusing on frameworks for assessment models, drawing conclusions like software maturity models do for computer scientists. Further, her paper identifies trends, gaps, and models for assessment.

Matt Schultz, Katherine Skinner, and Aaron Trehub presented "Getting to the Bottom Line: 20 Digital Preservation Cost Questions." Their questions help institutions evaluate cost, including questions about storage fees, support, business plans, etc. to help institutions assess their approach to taking on digital preservation.

After lunch, I attended the panel on Long Term Preservation Strategies & Architecture: Views from Implementers consisting of Mary Molinaro (moderator), Katherine Skinner, Sibyl Schaefer, Dave Pcolar, and Sam Meister. Sibyl Schaefer led off with a presentation of details on Chronopolis and the ACE audit manager. Dave Pcolar followed by presenting the Digital Preservation Network (DPN) and their data replication policies for dark archives. Sam Meister discussed the BitCurator Consortium, which helps with the acquisition, appraisal, arrangement and description, and access of archived material. Finally, Katherine Skinner presented the MetaArchive Cooperative and their activities teaching institutions to perform their own archiving, along with other statistics (e.g., the minimum number of copies to keep stuff safe is 5).

Day 2 concluded with the poster session (including a poster by Martin Klein) and reception.

Pam Samuelson opened Day 3 with her keynote Mass Digitization of Cultural Heritage: Can Copyright Obstacles Be Overcome? Her keynote touched on the challenges with preserving cultural heritage introduced by copyright, along with some of the emerging techniques to overcome the challenges. She identified duration of copyright as a major contributor to the challenges of cultural preservation. She notes that most countries have exceptions for libraries and archives for preservation purposes, and explains recent U.S. evolutions in fair use through the Google Books rulings.

After Samuelson's keynote, I concluded my iPRES2015 visit and explored Chapel Hill, including a visit to the Old Well (at the top of this post) and an impromptu demo of the pit simulation. It was very scary.

Several themes emerged from iPRES2015, including an increased emphasis on web archiving and a need for improved context, provenance, and access for digitally preserved resources. I look forward to monitoring the progress in these areas.

--Justin F. Brunelle

Tuesday, October 27, 2015

2015-10-21: Grace Hopper Celebration of Women in Computing (GHC) 2015

On October 13-17, the atmosphere at the George R. Brown Convention Center in Houston, Texas was electric with 12,000 women in tech from all around the world attending the Grace Hopper Celebration of Women in Computing (GHC), the world's largest gathering for women in computing. GHC is presented by the Anita Borg Institute (ABI) for Women and Technology, which was founded by Dr. Anita Borg and Dr. Telle Whitney in 1994 to bring together research and career interests of women in computing and encourage the participation of women in computing. GHC has grown remarkably, from 500 women in technology in 1994 to 12,000 women this year.

I was humbled to receive a scholarship from the ABI to attend GHC 2015. I had also been thrilled to attend GHC 2013 in Minnesota and GHC 2014 in Phoenix. This year, I represented the Computer Science department at Old Dominion University; the ArabWIC organization, as a member of the leadership committee and as a mentor in the academic mentoring sessions; and the ABI organization, for which I volunteered to blog and take notes at GHC. You can visit the Grace Hopper Celebration 2015 wiki page to read more of the session notes.

The conference was filled with an exciting lineup of inspiring speakers, panels, sessions and workshops. There were multiple technical tracks: career, emerging tech, general sessions, open source, organizational transformation, and technology (e.g., data science, artificial intelligence, HCI, security, software engineering). Conference presenters represented many different fields, such as academia, industry, and government. The non-profit organization "Computing Research Association Committee on Women in Computing (CRA-W)" also offered sessions targeted towards academics and business. I had a chance to attend the Graduate Cohort Workshop in 2013, which was held in Boston, MA, and created a blog post about it.

The first day was kicked off by the amazing and inspiring Telle Whitney, the president and CEO of the ABI, welcoming the audience. Whitney gave the audience a piece of advice: "talk to almost anyone you pass by in the conference and introduce yourself. It is your time to learn, to join new communities, to reach out people, and offer advice. It is our time to lead". She introduced the featured keynote speakers of the three days of the conference: Susan Wojcicki (the CEO of YouTube), Megan Smith (the first female CTO of the United States), Sheryl Sandberg (the COO of Facebook), Manuela M. Veloso (Professor in the Computer Science Department at Carnegie Mellon University), Clara Shih (CEO and Founder of Hearsay Social), and Hilary Mason (the Founder of Fast Forward Labs). At the end, Whitney introduced Alex Wolf, the President of the Association for Computing Machinery (ACM) and a professor in the Department of Computing at Imperial College London, UK, for opening remarks.

As the day progressed, the Open Source Day sessions and presentations were taking place. Open Source Day: Code-a-thon for Humanity gives women from around the world the chance to learn how to contribute to the open source community, regardless of their skill or experience level, by developing a variety of humanitarian projects. The Open Source Day 2015 page contains more details about the projects.

The Wednesday Keynote by Hilary Mason: "This is the best room in the world!!" is how Mason started her keynote, which was about machine intelligence research. Mason introduced herself as a data scientist, CEO, and software engineer, and followed up with "I look like all of those". She talked about the importance of data and mentioned that data products are everywhere. She gave many examples of apps that use machine intelligence research: Foursquare, an app from a New York City company, collects data and uses it to recommend places to go around a user's current location, and the Dark Sky app, which was built on top of government weather data, predicts when it will rain or snow. It may not be interesting for a Californian, but it is interesting for the rest of us :-).

Mason talked about how she became passionate about data science. She defined a data scientist as a professional role that combines multiple capabilities: math, statistics, the coding ability to build infrastructure, communication, and domain knowledge, everything they need to know to go talk to someone who has a problem. A data scientist works on analytical problems. She said technology is changing rapidly, and people's adoption of technology is growing faster. One of the interesting parts of her talk was about predicting the future. She said, "predicting the future is hard", then showed a picture of people from the past imagining the future.

In the second part of the talk, she talked about her company, Fast Forward Labs, which started in 2014 to introduce a new method for applied research. They focus on innovation opportunities through data and algorithms. FF sits in the middle of three communities: established companies, startups, and academic research. What makes a machine intelligence technology interesting?
  1. A theoretical breakthrough 
  2. A change of economics 
  3. A capability becomes a commodity (ex: Hadoop) 
  4. Data: a) new data is available (ex: Wikipedia) b) data is made useful 
Mason ended her talk by thanking everyone who helped her, then she gave the audience a piece of advice: "If you are at the beginning of your career and you are thinking of where you might end up, you need to know that my first GHC was in 2002, and I was a shy quiet student who mostly sit in the back in every talk and shy to ask a question. But it is amazing to be in this room today with so many people who have affected my career".

At the end of the keynote, the 2015 Technical Leadership ABIE Award was given to Lydia E. Kavraki, the Noah Harding Professor of Computer Science and professor of Bioengineering at Rice University.

After the keynote, I attended the "CRA-W Early Career: The Tenure Process" session by Julia Hirschberg from Columbia University and Joan Francioni from Winona State University. The session covered the tenure process: research, teaching, service, expectations of the department, annual reviews, letter writers, and the typical process. The speakers gave advice and tips on understanding the requirements and expectations of your institution, such as having an overall teaching plan and goals and being neither too hard nor too easy. They also gave tips regarding collaboration: a successful collaboration is a multiplier, letting you achieve more than you can on your own, while an unsuccessful collaboration can be a negative multiplier that wastes time, is stressful, and creates hard feelings.

The panel of Global Women Technical Leaders Program 
Next, I attended the "Global Women Technical Leaders Program: After the Grace Hopper Celebration: Building and Sustaining Community" panel in the career track. The panelists were Josephine Ndambuki of Safaricom Ltd, Rosario Robinson of the Anita Borg Institute, Alaa Fatayer of Jawwal, and Sana Odeh of NYU and ArabWIC; the panel was moderated by Arezoo Miot of TechWomen. Sana introduced the panel and thanked ABI and the panelists for their support and for growing the women in tech communities. Rosario talked about her journey. She said she was the only woman in mathematics. The panel discussed the essentials of building a community to support women in technology. Alaa talked about her experience, starting with TechWomen and going on to build a community in Palestine. Some of the questions addressed were: How and where do you start in creating a community? What programs are out there to support technical women? How do we overcome obstacles in creating local communities? How can we develop allies in our communities? At the end, Rosario gave the following advice: "be clear about what you are and what you do".

A panel by directors from Apple in the scholars lunch 
The scholars lunch: Suzanne Mathew, an Assistant Professor of Computer Science at the United States Military Academy, introduced the panel of three amazing ladies from Apple. The panel featured Esther Hare, a Director in the Worldwide Developer Marketing team, and Maryam Najafi, a Director of UX, and was moderated by Karen Sipprell, the VP Marcom at Portal Software. The scholars lunch was sponsored by Apple; scholars gathered for discussions at tables, and each table had one woman who has a role at Apple. GHC15 had 500 scholars out of 2,000 applicants, as Professor Nancy Amato from Texas A&M University announced. The three senior ladies on the panel each shared many interesting experiences. Here is some advice from the panelists:
  • Be around as much as you can, the more you get around the more opportunities you will find. 
  • Find your passion, so you can solve problems. 
  • Go out and solve problems that freak you out. 
The lunch ended with the fun part: seven Apple watches for seven lucky women who found animal stickers on the bottom of their chairs!

After lunch I had to work on some stuff for the ABI blogging and social media activities. I also communicated with many amazing women during the conference.

The Wednesday Afternoon Plenary: We had three TED style talks on "Transforming the Culture of Tech" by Clara Shih, the CEO and Founder of Hearsay Social, Blake Irving, the CEO and Board Director of GoDaddy, and the amazing Megan Smith, the Chief Technology Officer (CTO) of the United States of America.

The afternoon plenary speakers
Clara Shih mentioned that she attended GHC for the first time in 2004, when there were 800 attendees. She told the audience about her journey over the past decade, from student to software engineer, then project manager, to CEO and Founder of Hearsay Social. Shih shared the lessons she learned through her journey: 1) Listen carefully. 2) Be OK with being different. 3) Cherish relationships above all and help other women. 4) There is no failure, only learning. 5) The future is on us, because if not us, then who? If not now, when? "If people just sat back 11 years ago, GHC would not be 12,000 today!" Every time we decide to lift a woman up, we lift all women up.

Blake Irving talked about how he has worked to close the gender gap at the company since he took over as CEO two years ago and mentioned the solid progress in the ratio of women at GoDaddy. Since last year's GHC, GoDaddy has more than doubled the number of women interns and graduate hires. Blake talked about pay equality and showed many graphs based on GoDaddy's data. "If you are a leader of tech company, be vulnerable again and again. Do not hide your problems. Go public with your diversity statistics, publish your salary. Seek change from the top and bottom. Do the research, find your issues. Surround yourself with people that will challenge you," Blake said, "bad things live in the dark, bad things die in the light."

Megan Smith with the President's tech team showing the Declaration of Sentiments

"It is great to be back to my people!" is how the amazing Megan Smith started her talk. Before mentioning the highlights of Megan Smith's talk, I would like to highlight the amazing job she did during the conference of encouraging and inspiring the attendees by talking to them herself. This lovely, inspiring woman passed by the community booths at the career fair and allowed people to talk to her personally and take pictures with her. She was also creative in showing some of the current federal tech projects and in bringing many ladies in tech from the President's team. At the beginning, Smith talked about her new role as CTO of the USA, in which she serves as assistant to the President by advising him and his team on how technology policy, data and innovation can advance our future as a nation. She described the people in the federal government as so passionate, mission driven, and extraordinary.

GHC archive that was found last Thanksgiving
Smith mentioned that they found GHC archives last Thanksgiving. She talked about many projects they are working on, such as Innovation Nation, Active STEM Learning, and the Police Data Initiative. She described the President as "an incredible leader, so smart, so technical, science tech president, and he opens the doors for us to innovate". Smith introduced many amazing young women from the President's tech team, who talked about their different roles in serving the nation.

At the end, Smith talked about the Declaration of Sentiments, a document signed by 100 people (68 women and 32 men) in 1848 at the first women's rights convention to be organized by women, in Seneca Falls, New York. The document is missing, and they are looking for it with the help of many archivists using the hashtag #FindTheSentiments.

There was a short discussion at the end with the three speakers about why change is hard and what strategies are working for them.

In the meantime, the career fair and the community fair were under way. At the career fair, many famous companies, such as Google, Thomson Reuters, Facebook, Microsoft, and IBM, were hiring as many talented women in tech as they could. The community fair is a dedicated space within the Expo for attendees to interact with GHC communities, such as BlackWIC and ArabWIC. The ABI booth was at the center of the community fair, where I met the amazing Telle Whitney and talked to her many times. The career fair was the place for anyone who wanted to apply for job opportunities at all levels across industry and academia. Each company had many representatives to discuss the different opportunities they have for women. A few men also attended the conference. The companies were very creative in advertising themselves.

Megan Smith at ArabWIC booth in the community fair 

The amazing Megan Smith passed by the ABI community booths and stopped by the ArabWIC booth. We had a great chance to talk to her personally and take a close look at the Declaration of Sentiments. She left us with encouragement and inspiration for leading communities and attracting more women to tech!

At the end of the first day, I attended the ArabWIC reception, which was sponsored by the Qatar Computing Research Institute (QCRI). We had many new Arab ladies in computing and non-Arab women as well. We exchanged our bios and how each one of us is contributing to serve the women in technology.

The Thursday Keynote had two speakers: Susan Wojcicki, the CEO of YouTube, and Hadi Partovi, the CEO and cofounder of Code.org.

"I'm feeling that I'm really the talking guy in the room," Hadi Partovi said at the beginning of his talk. He shared with the audience the personal story that changed his life: his dad brought home a computer that did not have any games on it, along with a book so that Hadi could learn to write his own games. He talked about Hour of Code, a non-profit bootstrapped project that started in 2013 to expand access to computer science in schools. Code.org has support from both Democrats and Republicans, and from many celebrities (e.g., President Obama, Bill Gates, Mark Zuckerberg). Code.org has trained 15,000 teachers to teach computer science this year, reaching 600K students (43% female).

The Hour of Code
Partovi insisted that his main goal for Code.org is not teaching kids how to code, but teaching kids computer science. He claimed that CS education is recovering after many years of decline, although there is still a problem in CS. He also mentioned that about 9 out of 10 parents want their children to learn CS. I have already started with my 7-year-old, and he was so excited to write his first code :-). Partovi claimed that the gender gap starts in K-12: "Almost 70% of the high school kids do not have access to the computer science field. When kids go to school, every kid learns about how electricity works or the basic math equations. In the 21st century, it is equally foundational to learn how algorithms work or how the internet works." Partovi continued that "the school system can evolve to teach kids computer science. Over 70 schools have embraced CS, including NY, Houston, Chicago, etc." Regarding diversity, Partovi asked whether we can change the stereotype without changing the facts on the ground. He commented that the way to change the stereotype is the Hour of Code, which now has 300 partners from 196 countries and 150,000 teachers. At the end, Partovi asked the audience to help recruit more volunteers. To encourage people to get involved, Microsoft and Amazon will give away gift cards to any teacher who organizes an Hour of Code.

After Partovi's talk, the 2015 Grace Hopper Celebration Change Agent ABIE Award winners, Maria Celeste Medina from Kenya and Mai Abualkas Temraz from Palestine, were announced. The award winners gave short, inspiring talks about how they started and about their journeys leading women in technology.

Susan Wojcicki described the conference as a lifeline where women come together, learn, feel supported, be computer scientists, and be ourselves. She started her speech with a story about her daughter, who told her she hated computers even though she had been going to Google since she was born. Susan talked about the serious impact of leaving girls out of the conversation when it comes to technology. "Girls think that technology is insular and anti-social. By 2020, jobs in computer science are expected to grow nearly two times faster than the national average, totaling nearly 5 million jobs. Technology is revolutionizing almost every part of our lives. Every car today has more computing technology than Apollo 11 that first landed on the moon. Yet, today women hold only 26% of all tech jobs. The fact that women represent a small portion of the tech workforce is not just a wake-up call, it is a 'Sputnik' moment. It risks future competitiveness," Susan said. "If women don't participate in tech, with its massive prominence in our lives and society, we risk losing many of the economic, political and social gains we have made over decades." Susan continued that female representation in tech is a problem and it is getting worse; the representation of women in tech was better in the 80s. Susan Wojcicki shared an exclusive teaser of the Codegirl movie, directed by Lesley Chilcott, the Oscar-winning film producer.

She talked about the balance between family and work. She had her baby 5 months after she joined Google. The constraints of family (for example, how tough it is for kids to be the last ones picked up from day care) pushed her to develop a work style that focuses on efficiency, productivity, and prioritization, and to get her work done within office hours. She mentioned a Harvard study showing that employees who take breaks from work have a higher level of focus than those who do not. Furthermore, employees who feel encouraged by their bosses to take breaks are 100% more loyal to their employers.

Susan Wojcicki was the first person to take maternity leave at Google, and she is the only person to have taken five maternity leaves there. Interestingly, each leave enriched her life, left her with peace of mind, and gave her a chance to reflect on her career. A generous maternity leave increases retention. When women are given short maternity leave and they are under the pressure of having a call, they quit. When Google increased its paid family leave from 12 to 18 weeks, the rate at which mothers quit fell by 50%. 88% of women in the USA are not given family leave. Susan said, "Men don't get asked how they balance it all." Susan's daughter now loves computer science. Susan enrolled her in a computer camp for girls, and afterward her daughter sketched a computer watch that held her friends' contacts and info, before Samsung and Apple came up with their watches.

At the end, Susan insisted that we have to make it our personal responsibility to show the next generation of girls that they belong to the world of computer science.

Advice from Susan:
  • We need to give everyone a chance to understand computer science. 
  • Make computer science available to everyone in the USA by making it mandatory. 
  • Focus on working smart. Work smart, work hard. Do a great job, but then GO home.
  • Keep asking, look out for yourself, be an advocate and do not feel guilty about it! 
  • For tech companies, you need to help employees to find balance between work and family.
  • Tech companies need to pay generous maternity leave. 
  • A step back helps sometimes.
  • If you work for a company and you feel you can not work a balanced day and the maternity leave is bad, I recommend that you leave and search for a supportive company and by the way, we are hiring! 

The Thursday Afternoon Plenary: The Thursday Afternoon Plenary was a conversation between Sheryl Sandberg, Facebook COO and author of the best-selling book Lean In, and Nora Denzel, Board Director of Ericsson, AMD, and Outerwall (makers of Redbox, Coinstar and ecoATM), about "What it means to be an effective leader and why it is so important to have women at the table to create technology". Sheryl shared her story about being a keynote speaker at GHC. The conversation covered gender diversity in technology and the pay gap. Sandberg asked the audience to negotiate for equal pay. She talked about the Lean In book and Lean In circles and how important mentoring is, and she advised the audience to join Lean In circles. Sandberg said, "Starting a Lean In circle is a great leadership opportunity". To read more about the conversation, here is a nice article:
Sandberg: Tech offers the best jobs, needs more women voices, and women need to stick with it

I attended the "Change Agent and Social Impact Awards" session by the ABI award winners: Michal Segalov of Mind the Gap, Maria Celeste Medina of Ada IT, Daniel Raijman of Mind the Gap, and Mai Abualkas Temraz of Gaza Sky Geeks.

The moderator had a conversation with the ABI award winners to draw out their stories, asking about the challenges they faced, the turning points in their lives, and what continues to motivate them to make a difference.

Daniel said they started Mind the Gap 8 years ago to expose girls to computer science. It has since expanded globally, with more than 10,000 participants to date.

Michal said that they cared the most about making Mind the Gap scalable. Mind the Gap lets people choose how to give or volunteer. For example, some people can teach tech classes, others can give talks, and so on. About 100 people volunteered, and each volunteer gives only one hour of their time per month, which makes it easy for people and encourages them to volunteer. Michal's advice was to be open to changing things, yourself, and your passion.

Maria mentioned that her mom encouraged and supported her the most. In one year, Maria has worked with the Programá Tu Futuro team and has introduced more than 6,000 people to coding: kids, teenagers, adults, and senior citizens (of which 30% are women). She said that there are also a lot of studies on how to empower women.

Mai from Gaza talked over Skype because she could not attend for political reasons. Mai was asked for some fun facts, but she said that she was not in a good state of mind because she could not make it to the conference, which made it hard to think of fun facts. In 2014, she became a TechWomen Emerging Leader. She encouraged everyone to help and support them and to keep inviting them, so that maybe one day they will be able to attend. Mai said they face a lot of challenges in Gaza, but she likes to call them opportunities to learn and to become stronger at solving the problems they face. At the end, Mai said, "I'm kept motivated by events like this where I'm exposed to the global women's tech community. My goal here is to bring back as much of your energy as I can to Gaza. You can come mentor in Gaza." She mentioned many examples of people who have gone to Gaza to mentor: Angie Chang, the founder of Women 2.0, Dave McClure, the founder of 500 Startups, and many others. "Don't worry, it's safe," Mai said, "or you can mentor women in Gaza remotely." Mai is a member of ArabWIC as well.

Speed mentoring sessions took place at the lunch tables on Thursday and Friday. I joined mentoring discussions about academic careers. It was useful to hear from many senior women in academia about their career journeys and to hear questions about applying for academic positions.

At the career fair, I was lucky to meet Sinead Borgersen, a Principal HR Business Partner at CA Technologies and Dr. Michele Weigle's friend. We had a quick discussion about careers at CA Technologies and how they would fit with my interests. Sinead is an amazing lady who is full of enthusiasm.

The Friday Keynote: Friday morning started with a cool technical keynote on "Robotics as a Part of Society" by Manuela Veloso, Herbert A. Simon University Professor, Computer Science Department, Carnegie Mellon University. Manuela has become well-known in the AI community for being the guiding force behind robot soccer. In her keynote, Manuela highlighted different perspectives on robots in a collaborative network of robots and humans. Manuela talked about CoBots, the robots she and her students created to help them with simple tasks in their offices and labs. These robots can use the internet or send emails to ask for help. She showed that autonomous robots learn from interacting with humans. "Technology is about diversity," Manuela said. "You don't have to do everything, but some do things that others can't."

At the end of the keynote, there were announcements about Grace Hopper 2016. GHC 2016 will take place in Houston, Texas. The general program co-chairs for GHC 2016 will be Kaoutar El Maghraoui, from IBM Research and ArabWIC, and Maria Gini, from the University of Minnesota. I spent most of Friday at the career fair, then I attended the mentoring session at the ArabWIC lunch table and met many women in computing from different fields.

The Friday Afternoon Plenary: The day wrapped up with an afternoon plenary session focused on the importance of diversity in technology by Janet George, Chief Data Scientist for Big Data/Data Science and Cognitive Computing at SanDisk, Isis Anchalee, Platform Engineer at OneLogin, and Miral Kotb, Director, Producer, Choreographer and Playwright for iLuminate.

I couldn't attend the afternoon plenary, but I heard from many friends about iLuminate, a wearable lighting system that enables novel dance performances, one of which the audience was treated to at the end of the conference. For more about the afternoon plenary, here are nice wrap ups for the three talks:

GHC 2015 ended with busting a move on the dance floor in a night to remember at Minute Maid Park. There were many photo booths, t-shirts, glow sticks, and desserts. It is a Grace Hopper Celebration, after all!

It was fascinating to be at GHC 2015, to hear from the most talented and inspiring women in technology and get advice from them. I also had the best time with many awesome ladies and came back with many friends who support each other. I was glad to be involved in many activities this year for the ABI community and ArabWIC.


Wednesday, October 7, 2015

2015-10-07: IMLS and NSF fund web archive research for WS-DL

In the spring and summer of 2015, the Web Science and Digital Libraries (WS-DL) group received a total of $950k of funding from the IMLS and the NSF to study various aspects of web archiving.  Although previously announced on Twitter (IMLS: 2015-03-31 & NSF: 2015-08-25), here we provide greater context for how these awards support our vision for the future of web archiving*.

Our IMLS proposal is titled "Combining Social Media Storytelling With Web Archives" and a PDF of the full proposal is available directly from the IMLS.  This proposal is joint with our partners at Archive-It and is informed by our experiences in several areas, such as:
Our most illuminating insight (somewhat obvious in retrospect) is to not try to include all of the collection's holdings in its summarization, but to only surface the exemplary components sufficient to distinguish one collection from the next.  One example we frequently use is "how do we distinguish the many `human rights' collections available in Archive-It?"  They all have different perspectives, but they can be difficult to navigate for those without detailed knowledge of the seed URIs and the collection development policy. 

The IMLS proposal will investigate two main thrusts:
  1. Selecting a small number (e.g., 20) of exemplary pages from a collection (often 100s of archived copies of 1000s of web pages) and loading them in an existing tool such as Storify as a summarization interface (instead of custom & unfamiliar interfaces).  Yasmin AlNoamany has some exciting preliminary work in this area; for example see her TPDL 2015 paper examining what makes a "good" story on Storify, and her presentation "Using Web Archives to Enrich the Live Web Experience Through Storytelling".
  2. Using existing stories to generate seed URIs for collections.  One problem for human-generated web archive collections is that they depend on the domain knowledge of curators.   For example, the image above shows two Storify stories about early riots in Kiev (aka Kyiv) which predated much of the exposure in Western media and then the subsequent escalation of the crisis.  The collection at Archive-It was not begun until the annexation of the Crimea was imminent, possibly missing the URIs that document the early stages of this developing story.  Our idea is to mine social media, especially stories, for semi-automated, early creation of web archive collections. 
The NSF proposal is titled "Increasing the Value of Existing Web Archives" and represents a shift in how we think about web archiving.  One point we've made for a while now (for example, see our 2014 presentation "Assessing the Quality of Web Archives") is that we must shift our current focus from simply piling up bits in the archive to more nuanced questions of how to make the archives more immediately useful (as opposed to just insurance for future loss) and how to assess & meaningfully convey the quality of the archived page.  This proposal will have three main research thrusts:
  1. Inspired by Martin Klein's PhD research and Hugo Huurdeman et al.'s "Finding Pages on the Unarchived Web" from JCDL 2014, we would like to see archives provide recommendations of related pages in the archive, as well as suggested "replacements" for pages that are not archived.  Web archives now just return a "yes" (200) or "no" (404) when you query for a URI -- they should be able to provide more detailed answers based on their holdings.
  2. We'd like to further investigate the various issues of how well a page is archived.  We have some preliminary work from Justin Brunelle for automatically assessing the impact of missing embedded resources (typically stylesheets and images), as well as from Scott Ainsworth on detecting temporal violations -- combinations of HTML and images that never occurred on the live web (see "Only One Out of Five Archived Web Pages Existed as Presented" from HT 2015).  
  3. Related to #2, we need to find a better way to visualize the temporal & archival makeup of replayed pages.  For example, the LANL Time Travel service does a nice job of showing the various archives that contribute resources to a reconstruction, but questions remain about scale as well as describing temporal violations and their likely semantic impact.  Similarly, we'd like to investigate how to convey the request environment that generated the representation you're viewing now (see our 2013 D-Lib paper "A Method for Identifying Personalized Representations in Web Archives" for preliminary ideas on linking various geoip, mobile vs. desktop, and other related representations). 
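To make the first thrust concrete, here is a minimal Python sketch of an archive answering with more than a bare "yes" or "no". It is only an illustration under our own assumptions: the `lookup` function, its dictionary response, and the string-similarity ranking are hypothetical and are not how any existing archive or the proposed research works. It ranks archived URIs from the same host by their similarity to the requested URI and offers the closest ones as candidate "replacements".

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def lookup(uri, holdings, max_suggestions=3):
    """Answer a URI query with suggestions instead of a bare 200/404.

    `holdings` is a hypothetical in-memory list of archived URIs.
    """
    if uri in holdings:
        return {"status": 200, "uri": uri, "suggestions": []}

    # Only pages from the same host are considered as candidate replacements.
    host = urlsplit(uri).netloc
    candidates = [h for h in holdings if urlsplit(h).netloc == host]

    # Rank candidates by plain string similarity to the requested URI
    # (a stand-in for a real relatedness measure).
    ranked = sorted(
        candidates,
        key=lambda h: SequenceMatcher(None, uri, h).ratio(),
        reverse=True,
    )
    return {"status": 404, "uri": uri, "suggestions": ranked[:max_suggestions]}


holdings = [
    "http://example.com/news/2014/riots",
    "http://example.com/news/2014/protests",
    "http://other.org/about",
]
answer = lookup("http://example.com/news/2014/riot", holdings)
print(answer["status"], answer["suggestions"])
```

A real service would base its recommendations on page content and archive metadata rather than URI strings alone; the point here is only the shape of the answer, which carries suggestions along with the status code.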
We have been very fortunate with respect to funding in 2015 and we look forward to continued progress on the research thrusts outlined above.  We'd like to thank everyone that made these awards possible.  We welcome any feedback or interest on these (and other) projects as we progress.  Watch this blog and @WebSciDL for continued research updates.


* = See also our 2014 award for $324k from the NEH for the study of personal web archiving and our 2014 award for $49k from the IIPC for profiling web archives for a more complete picture of our research vision for web archives.

Wednesday, September 30, 2015

2015-09-30: Digital Preservation - Magdeburg Germany Trip Report

Dr. Herzog: This large green area on your left is Sanssouci Park. It has 11 palaces in it.
Yasmin: I want to visit this park after we are back from the university, can we?
Dr. Herzog: We sure can... I think we will be back before sunset.
Yasmin: I love beautiful things.
Dr. Herzog: Who doesn't?
Sawood: [Smiles]

The three souls were heading to the Hochschule Magdeburg-Stendal University from Potsdam, Germany in Dr. Michael Herzog's car for a lunch lecture on the topic of Digital Preservation. Yasmin and Sawood from the Web Science and Digital Libraries Research Group of Old Dominion University, Norfolk, Virginia were invited for the talk by Dr. Herzog at his SPiRIT Research Group. The two WSDL members had presented their work at TPDL 2015 in Poznan, Poland, and on their way back home they were halted and hosted by Dr. Herzog in Germany for the lunch lecture. You may also enjoy the TPDL 2015 trip report by Yasmin.

Passing by beautiful landscapes, crossing bridges and rivers, observing renewable energy sources such as windmills and solar panels, and touching almost 200 km/h on the highway, we reached the university in Magdeburg. Due to the vacations there were not many people on campus, but the canteen was still crowded when we went there for lunch. Dr. Herzog's student, Benjamin Hatscher (who created the poster for the talk), joined us for lunch. Then we headed to the room that was reserved for the talk and started the session.

Dr. Herzog briefly introduced us, our research group, and our topics for the day to the audience. He also shared his recent memories about the time he spent at ODU and about his interactions with the WSDL members. Then he left the podium for Yasmin.

Yasmin presented her talk on the topic, "Using Web Archives to Enrich the Live Web Experience Through Storytelling". She noted that her work is supported in part by IMLS. She started her introduction with a set of interesting images. She then illustrated the importance of the time aspect in storytelling and described what storytelling looks like on the Web, especially on social media. She discussed the need to select a very small but representative subset from a big pile of resources around a certain topic to tell the story. Selecting the small representative subset is a challenging but important task. It gives a brief summary as well as an entry point to dive deep into the story and explore the remaining resources. She gave examples of how Facebook Lookback compiles a few highlights from hundreds or thousands of someone's shared items and how 1 Second Everyday is used for storytelling. Then she moved on to the popular social media storytelling service Storify and described its issues, such as flat representation, bookmarking rather than preservation, and resources going off-topic over time. This led her to the description of Web archives, Memento, and Web archiving services (mainly Archive-It). Then she described the shortcomings of Web archiving services when it comes to storytelling and how they can be improved by combining Web archives and storytelling services. After that she concluded her talk by describing her approaches and policies for selecting the representative subset of resources from a collection.

I, Sawood Alam, presented my talk on the topic, "Web Archiving: A Brief Introduction". I briefly introduced myself with the help of my academic footprint and lexical signature. The term "lexical signature" led me to touch on Martin Klein's work and how I used it to describe a person instead of a document. Then I followed the agenda for the talk and began with a description of archiving in general, the concept of Web archiving, and the differences between the two.

I then briefly talked about the purpose and importance of Web archiving on institutional and personal scales. Then I described various phases and challenges involved in Web archiving, such as crawling, storage, retrieval, replay, completeness, accuracy, and credibility. This gave me the opportunity to reference various WSDL members' research work, such as Justin's Two-Tiered Crawling and Scott's Temporal Violations. Then I talked about existing Web and digital archiving efforts and various tools used by Web archivists in various stages. The list included widely used tools such as Heritrix, OpenWayback, and TimeTravel as well as various tools developed by WSDL members or other individual developers, such as CarbonDate, Warrick, Synchronicity, WARCreate, WAIL, Mink, MemGator, and Browsertrix. After that I briefly described the Memento protocol and the Memento aggregator.

This led me to my IIPC-funded research work on Archive Profiling. In this section of the talk I described why archive profiling is important, how it can help in Memento query routing, and what an archive profile looks like.
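For readers who were not at the talk, the idea of profile-based routing can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not the profile format or routing logic from the IIPC-funded work: the function names (`uri_key`, `build_profile`, `route`) and the archive names are hypothetical, and the profile here is simply a count of archived URIs per reversed host name. An aggregator holding such profiles could skip archives that have nothing under the requested host.

```python
from urllib.parse import urlsplit


def uri_key(uri):
    """Reduce a URI to a coarse key: the host name with its labels reversed."""
    host = urlsplit(uri).hostname or ""
    return ",".join(reversed(host.lower().split(".")))


def build_profile(archived_uris):
    """Summarize an archive's holdings as a mapping of key -> number of URIs."""
    profile = {}
    for uri in archived_uris:
        key = uri_key(uri)
        profile[key] = profile.get(key, 0) + 1
    return profile


def route(uri, profiles):
    """Return the names of archives whose profile contains the key of `uri`."""
    key = uri_key(uri)
    return sorted(name for name, profile in profiles.items() if key in profile)


# Hypothetical archives and holdings, for illustration only.
profiles = {
    "archive_a": build_profile(
        ["http://www.example.com/a", "http://www.example.com/b"]
    ),
    "archive_b": build_profile(["http://news.example.org/x"]),
}
print(route("http://www.example.com/c", profiles))
```

With this coarse key, a query is sent only to archives that hold at least one page from the same host; finer keys (for example, including path segments) would trade a larger profile for fewer wasted queries.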

To motivate the audience toward research in the Web archiving field, I discussed various related areas that offer vast research opportunities.

Then I concluded my talk with an introduction to our Web Science and Digital Libraries Research Group. This was the fun part of the talk, full of pictures illustrating the lifestyle and work environment at our lab. I illustrated how we use tables in our lab for fun traditions such as bringing lots of food after a successful defense or spreading assignment submissions on the Ping Pong table for parallel evaluation. I illustrated our effective use of the white boards, from the "about:blank" state to the highly busy and annotated state, and the reserved space for the "PhD Crush" that keeps track of the progress of each WSDL member in a visual and fun way. I couldn't resist showing our Origami skills on the scale of covering an entire cubicle and every single item in it individually.

After a brief Q&A session, Dr. Herzog formally concluded the event.

From there we were all free to explore the beauty of the places around us, and we did so to the extent possible. We toured the historical places of Magdeburg, such as the Gothic architecture masterpiece, Magdeburg Cathedral, and on our way back to Potsdam we saw the newly built Magdeburg Water Bridge, the largest canal bridge.

By the time we reached Potsdam the sun had already set, but we still managed to see a couple of the palaces in Sanssouci Park, and they looked beautiful in that light. We even managed to take a few pictures in the low light.

Dr. Herzog invited us for dinner at his place, and we had no reason or intention to say no. He was the head chef in his kitchen and prepared for us a delicious rice dish and white asparagus (which was a new vegetable for me). Since I like cooking, I decided to join him in his kitchen, and he gladly welcomed me. I did not have any plans in advance, but after a brief look inside his fridge I decided to prepare egg hearts and salad. During and after dinner, Dr. Herzog described and showed pictures of many historical places in Potsdam and made us excited to visit them the next day.

The next morning we had to head back to Berlin, but we sneaked in a couple of hours to see the beauty of Sanssouci Park and Sanssouci Palace in the bright sunlight. The long series of stairs leading from the front entrance of the palace to the water fountain, with stepped walls on both sides covered with grapevines, was mesmerizing.

Dr. Herzog dropped us off at the train station (or Bahnhof in German), from where we took a train to Berlin. We had almost a day to explore Berlin, and we did so to the extent possible. It is an amazing city, full of historical masterpieces and state-of-the-art architecture. At one point, we got stuck in a public demonstration and couldn't use any transport due to the road jam, although we had no idea what the demonstration was for.

Later in the evening, Dr. Herzog came to Berlin to pick up his wife from the Komische Oper Berlin, where she was performing in an opera, and we got a chance to look inside this beautiful place. This way we got a few more hours for a guided tour of Berlin and had dinner in an Italian restaurant.

It was a fun trip to explore three beautiful cities of Germany immediately after exploring yet another beautiful and colorful city of Poznan, Poland. We couldn't have imagined anything better than this. I published seven photo spheres of various churches and palaces on Google Maps during this trip and got an album full of pictures.

On behalf of my university, department, research group, and myself, I would like to extend my sincere thanks and regards to Dr. Herzog for his invitation, warm welcome, and hospitality, and for spending time showing us the best of Magdeburg, Potsdam, and Berlin during our stay in Germany. He is a fantastic host and tour guide. Now, turning back to the send-off conversation among the three of us.

Sawood: Yasmin, now you know why Dr. Herzog said, "who doesn't" when you said, "I love beautiful things".
Yasmin: [Smiles]
Dr. Herzog: [Smiles]

Sawood Alam