Week 9: Network Visualization Insights & Final Summary Report


This week I had the chance to explore Gephi a bit more and derive some insights, look at referrer traffic from Google Analytics, start writing my final summary report, and scrape data from the web.

Additional Gephi Visualizations and Insights

I added some attributes to the network map, one of which was Publication Title Subject Area. This allowed me to look at the types of journals in which both the DataONE articles and their citing articles are published. The visual below is the full network, color coded by publication title subject area, sized by impact score 2, and grouped/connected by shared citing articles. There are two main clusters, indicating two main domains of knowledge and the dual nature of the DataONE articles being published. The majority are published in Life Sciences & Biomedicine (pink) and Technology (green) journals. Blue represents the social sciences, which are scattered throughout the network, indicating some crossover into other research areas. Brown represents multidisciplinary sciences journals, but the articles in them tend to focus on Life Sciences & Biomedicine. Articles from Physical Sciences (orange) and Arts & Humanities (dark pink) journals are scarce and don’t seem to contribute significantly to the clustering or the main network.

Within the set of cited articles there are clearly 5-10 that have a larger impact and a dominant contribution in the field. These articles could serve as gatekeepers to other DataONE articles since they share the most citations with other articles (i.e., they are the nodes that have the most connections). The articles with the highest impact score are listed below:

  • Big data and the future of ecology
  • Ecoinformatics: Supporting ecology as a data-intensive science
  • Challenges and Opportunities of Open Data in Ecology
  • Data sharing by scientists: Practices and perceptions
  • Altmetrics: Value all research products
  • Data reuse and the open data citation advantage
  • The current state of citizen science as a tool for ecological research and public engagement
  • Next Steps for Citizen Science
  • Center for Trustworthy Scientific Cyberinfrastructure – The NSF Cybersecurity Center of Excellence: Year two report
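To make the "connected by shared citing articles" idea concrete, here is a minimal Python sketch of how such edges could be derived and how the most-connected nodes could be ranked. It is not the Gephi workflow itself: the function names and the toy data are hypothetical, and it assumes the input is a simple mapping from each citing article to the DataONE articles it cites.

```python
from collections import defaultdict
from itertools import combinations


def shared_citation_edges(citing_to_cited):
    """Count, for each pair of cited articles, how many citing articles they share."""
    weights = defaultdict(int)
    for cited in citing_to_cited.values():
        # Every pair of articles cited together by one citing article shares that citation.
        for a, b in combinations(sorted(set(cited)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def rank_by_connections(edges):
    """Rank articles by the number of distinct articles they are linked to."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: citing article -> cited DataONE articles.
citing_to_cited = {
    "citing_1": ["A", "B"],
    "citing_2": ["A", "B", "C"],
    "citing_3": ["A", "D"],
}

edges = shared_citation_edges(citing_to_cited)
ranking = rank_by_connections(edges)
print(ranking)
```

In this toy example, article "A" ends up with the most connections, which is the same sense in which the articles above sit at the center of the network.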

Satellite Clusters

There are several “satellite clusters” that are not connected to the main network (see figure below). These are articles that often fall into the same research area as the main part of the network (since they have the same color) but do not share many of the same citing articles (since they have few shared edges or links in the network). This could indicate the presence of disparate subfields within the main fields covered. Satellite clusters could also indicate areas for future publication. Here are some satellite clusters of note:

  • Grand challenges for archaeology: Categorized as Technology since it’s published in Ecological Informatics, but it has been cited by several social sciences articles, suggesting expansion outside of the field
  • Altmetrics: Value all research products: Published in 2018 and already has 165 citations and an Altmetrics score of 334
  • Publishing frontiers: The library reboot: Published in 2018 and has 17 citations and an Altmetrics score of 386
  • Ten Simple Rules for Digital Data Storage: Published in Marine Fisheries Review, which categorizes it under Life Sciences & Biomedicine, but it shows connections to the technology side of the network
    • This could indicate the benefits of using general titles if trying to expand outside of a certain field (for those who search in multidisciplinary databases)
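One way to pick out satellite clusters programmatically, rather than by eye, is to treat them as connected components other than the largest one. The sketch below is a plain breadth-first search over an edge list; the function name and the toy nodes and edges are hypothetical and are not taken from the actual DataONE network.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy network: one main cluster and two satellites.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]

components = connected_components(nodes, edges)
main_network, satellites = components[0], components[1:]
print("Main network:", sorted(main_network))
print("Satellite clusters:", [sorted(c) for c in satellites])
```

Gephi can report connected components through its own statistics panel; the sketch only shows the underlying idea.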

Sorting by Year

The visual below represents the network map color coded by year. Many of the citing articles were published during 2017 and 2018 and cite older articles. This could indicate a boom in discovery of DataONE articles, since citations usually peak in the years surrounding publication of the cited article.

I also took a dive into the Google Analytics data for the dataone.org website to try to determine the types of websites that are linking to DataONE. Listed below is the breakdown of the main website types:

Source Type               Users   Sessions
College/university        17054      19533
Non-profit Organization   15906      18660
Library                   15863      17975
Social media               8908      11372
Search engine              9658      10899
Webinar/presentation       2947       4716
Government site            3358       4630
News                       1958       2252
Software                   1725       2077
Email                      2136       1433
Database/repository         977       1164
Publisher/conference        815       1038
Forum                       795        896
Blog                        655        710
Docs                         78        405
Translate                   286        294
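For comparing source types, the raw counts can be turned into each type's share of sessions and its sessions per user. The Python sketch below uses the figures from the table above; the variable names and the choice of these two derived measures are my own assumptions, not something Google Analytics reports in this form.

```python
# (source type, users, sessions), copied from the table above.
referrers = [
    ("College/university", 17054, 19533),
    ("Non-profit Organization", 15906, 18660),
    ("Library", 15863, 17975),
    ("Social media", 8908, 11372),
    ("Search engine", 9658, 10899),
    ("Webinar/presentation", 2947, 4716),
    ("Government site", 3358, 4630),
    ("News", 1958, 2252),
    ("Software", 1725, 2077),
    ("Email", 2136, 1433),
    ("Database/repository", 977, 1164),
    ("Publisher/conference", 815, 1038),
    ("Forum", 795, 896),
    ("Blog", 655, 710),
    ("Docs", 78, 405),
    ("Translate", 286, 294),
]

total_sessions = sum(sessions for _, _, sessions in referrers)

# Derived measures per source type: share of all sessions and sessions per user.
summary = {
    name: {
        "share_of_sessions": sessions / total_sessions,
        "sessions_per_user": sessions / users,
    }
    for name, users, sessions in referrers
}

# Rows with fewer sessions than users may be worth re-checking in the export.
fewer_sessions_than_users = [name for name, users, sessions in referrers if sessions < users]

for name, stats in summary.items():
    print(f"{name:<25} {stats['share_of_sessions']:6.1%} {stats['sessions_per_user']:5.2f}")
print("Fewer sessions than users:", fewer_sessions_than_users)
```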

I am finishing up my summary report and will add a section of insights to this blog post once it’s completed. I am also still working on scraping mentions of DataONE on the web. Luckily, I found a great tool called Octoparse that I have been using to scrape data from Bing. Unfortunately, I could not find a consistent way to scrape Google search results, as I kept getting blocked by CAPTCHAs and other hindrances. Octoparse is working away as I write this, having scraped 5,000 pages of search results so far. Hopefully it will be finished by tomorrow. I look forward to sharing the results of that endeavor with you in the near future!
