{"id":185,"date":"2011-06-22T17:33:42","date_gmt":"2011-06-22T22:33:42","guid":{"rendered":"http:\/\/notebooks.dataone.org\/tracking1000datasets\/?p=185"},"modified":"2013-05-09T01:17:13","modified_gmt":"2013-05-09T01:17:13","slug":"thoughts-on-the-processresults-so-far","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/data-reuse\/thoughts-on-the-processresults-so-far\/","title":{"rendered":"Thoughts on the process\/results so far"},"content":{"rendered":"<p>Below are some rough thoughts that I&#8217;ve hacked out on the data collection process and results so far.\u00a0 They are somewhat scattered at this point, so bear with me.\u00a0 I&#8217;ve also included thoughts on potential graphs that can be made to display the findings as of yet.<\/p>\n<p>&nbsp;<\/p>\n<p dir=\"ltr\">Based on the preliminary searches I did on the Web of Science citations and having completed searching and analysis for most of the accession numbers in Google Scholar, it seems that articles that cite the dataset rather than the data collection article are more likely to actually reuse the data. \u00a0It appears that most papers cite the dataset directly in the text, giving the repository name or abbreviation followed by the unique identifier. \u00a0For reuse of data from the GEO and ArrayExpress repositories especially, it was also common to have a table listing the unique identifiers of all datasets reused in the study.<\/p>\n<p>&nbsp;<\/p>\n<p dir=\"ltr\">Data repositories that have a more distinctive data identifier allow a search with higher recall and precision, whereas data repositories that have a generic data identifier, such as a four-digit number, require so many additional search parameters to increase precision that some potential hits may be excluded. \u00a0For example, GEOROC has a 4 or 5 digit ID without an associated letter or repository identifier. 
\u00a0Therefore we had to restrict the search terms to GEOROC9022 OR &#8220;GEOROC 9022&#8221;, where 9022 is the GEOROC-assigned ID number, as the search for GEOROC 9022 without quotation marks returned far too many unrelated results to sort through. \u00a0However, this may have excluded genuine data reuses, as no hits were found using those search terms for the GEOROC repository. \u00a0The GEO search returned more precise results; out of 165 citations collected, only 6 did not cite the dataset. \u00a0This is directly related to GEO unique identifiers having the three letters GSE directly preceding the 4+ digit accession number without a space between. \u00a0Repositories using a DOI for each dataset were also somewhat easier to track, although you had to search for the DOI without the prefix \u201cdoi:\u201d, with the prefix, and with the prefix and a space, as authors do not cite DOIs consistently. Google does not tell you how the algorithm works for retrieving articles within Google Scholar, so creating a search string is always hit or miss until you find a combination that works.<\/p>\n<p><span style=\"text-decoration: underline\">Potential graphs so far<\/span><\/p>\n<ol>\n<li>Bar Graph\n<ol>\n<li>x-axis: Data Repositories<\/li>\n<li>y-axis: total data hits found, with bar increments from bottom up of total articles reused, reused as example, data cited but not used, and does not cite data (unless I remove this category, as I didn\u2019t include hits if they didn\u2019t seem relevant)<\/li>\n<\/ol>\n<\/li>\n<li>Multi-Line graph &#8211; one line per data repository\n<ol>\n<li>x-axis: number of hits\/dataset<\/li>\n<li>y-axis: number of datasets<\/li>\n<\/ol>\n<\/li>\n<li>Unknown type &#8211; based on per dataset level (within data repository)\n<ol>\n<li># of citations in WoS<\/li>\n<li># of hits from the DOI<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Below are some rough 
thoughts that I&#8217;ve hacked out on the data collection process and results so far.\u00a0 They are somewhat scattered at this point, so bear with me.\u00a0 I&#8217;ve also included thoughts on potential graphs that can be made to display the findings as of yet. &nbsp; Based on <a class=\"more-link\" href=\"https:\/\/notebooks.dataone.org\/data-reuse\/thoughts-on-the-processresults-so-far\/\">Continue reading <span class=\"screen-reader-text\">  Thoughts on the process\/results so far<\/span><span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":15,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[111],"tags":[],"_links":{"self":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/185"}],"collection":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/comments?post=185"}],"version-history":[{"count":1,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/185\/revisions"}],"predecessor-version":[{"id":555,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/185\/revisions\/555"}],"wp:attachment":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/categories?post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/tags?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}