Over the last 10 weeks, I have moved this project forward in substantial ways while learning several new tools.
Things I have completed:
- An in-depth search of four datasets during my first week: PANGAEA 21, PANGAEA 31, PANGAEA 49, and PANGAEA 59
- During my second week, I traveled to Bellingham, WA, to meet with my mentor, Heather Piwowar. Together we developed a data collection plan to keep this project manageable within the time frame of my internship
- We decided to search for data reuse in two ways: 1) searching the full text of articles for the repository-assigned dataset identification number using Google Scholar; and 2) searching articles that cite the data collection article for evidence of dataset reuse. These citing articles were collected using ISI Web of Science. I searched every article retrieved through Google Scholar, and a randomized stratified subset of 150 articles retrieved from Web of Science, since that was what time allowed. These tags were applied to each article.
- I completed both types of data collection for all 10 repositories chosen. Links to Google Spreadsheets for each repository, listing the dataset IDs included, the number of citing articles, the Google Scholar search terms, and the number of Google Scholar search results, can be viewed in this blog post. All of the tags applied and the citations of the articles analyzed can be viewed in our Mendeley groups.
- We also had a poster accepted to the ASIS&T conference, to be held in New Orleans in October, and I created our poster for it.
Things I have learned during this internship:
- How to use Dropbox for data sharing
- Improved knowledge of Google Documents for collaborative work
- Proficiency with Mendeley, an amazing bibliographic control and social networking tool
- Exposure to a wide range of scientific journals, articles, and citation styles
- Increased knowledge of maintaining an Open Science Notebook and using WordPress
- How to create an effective poster
Summary of data collected so far:
I put together this Google Spreadsheet, which gives an overview of data reuse based on the data collected so far. This raw data does not account for articles labeled “low confidence,” which will be excluded from the final results, and it does not extrapolate the Web of Science results beyond the 150 articles analyzed for each repository.