The previous notebook entry concerned mentions of @DataONEorg on Twitter. I established the following: the oldest tweet returned is dated July 29, 2012, roughly two years ago. That tweet is accessible from here:
http://topsy.com/s?q=%40DataONEorg&window=a&type=tweet&sort=date&offset=990
The very first retweet of @DataONEorg was on March 15, 2011, five months after @DataONEorg joined Twitter (November 18, 2010). That tweet is accessible via Topsy from the first link in the list below, which covers the period November 1, 2010 to November 1, 2011.
That leaves a missing period between November 1, 2011 and July 29, 2012, so I generated a new Topsy search limited to that time period:
http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&mintime=1320148824&maxtime=1343563251
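For reference, the mintime and maxtime parameters are Unix timestamps. Here is a minimal Python sketch for reproducing them from calendar dates; note the values Topsy generated above include a time-of-day offset from when I ran the search, so midnight-UTC values differ slightly:

```python
import calendar
from datetime import datetime

def topsy_timestamp(year, month, day):
    """Unix timestamp for midnight UTC on the given date."""
    return calendar.timegm(datetime(year, month, day).timetuple())

# Rebuild the search link for November 1, 2011 to July 29, 2012.
mintime = topsy_timestamp(2011, 11, 1)   # 1320105600
maxtime = topsy_timestamp(2012, 7, 29)   # 1343520000
print(f"http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date"
      f"&mintime={mintime}&maxtime={maxtime}")
```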
I now have four links covering four time periods:
- November 1, 2010 to November 1, 2011: http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=170&mintime=1288612824&maxtime=1320148851
- November 1, 2011 to July 29, 2012: http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=340&mintime=1320148824&maxtime=1343563251
- July 29, 2012 to July 29, 2013: http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=570&mintime=1343563224&maxtime=1375099251
- July 29, 2013 to February 6, 2014: http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=410&mintime=1375099224&maxtime=1391688051
It is now possible to estimate the number of tweets per period, taking the maximum offset (n) from each link above and assuming 10 tweets per page:
- November 1, 2010 to November 1, 2011: n = 170, ~1,700 tweets
- November 1, 2011 to July 29, 2012: n = 340, ~3,400 tweets
- July 29, 2012 to July 29, 2013: n = 570, ~5,700 tweets
- July 29, 2013 to February 6, 2014: n = 410, ~4,100 tweets
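A quick check of that arithmetic, treating the maximum offsets as page counts:

```python
pages = {
    "2010-11-01 to 2011-11-01": 170,
    "2011-11-01 to 2012-07-29": 340,
    "2012-07-29 to 2013-07-29": 570,
    "2013-07-29 to 2014-02-06": 410,
}
TWEETS_PER_PAGE = 10
total_pages = sum(pages.values())              # 1,490 pages
total_tweets = total_pages * TWEETS_PER_PAGE   # 14,900 tweets
print(total_pages, total_tweets)
```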
Now I need to create a spreadsheet with a unique URL for each page of 10 tweets, counting the offset down from its maximum for each time period. For example, for the first period:
http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=170&mintime=1288612824&maxtime=1320148851
http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=169&mintime=1288612824&maxtime=1320148851
http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date&offset=168&mintime=1288612824&maxtime=1320148851
And so on.
That comes to roughly 14,900 tweets in total (170 + 340 + 570 + 410 = 1,490 pages), so there would be about 1,490 rows of unique URLs encompassing the four time periods.
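Here is a minimal sketch of how such a spreadsheet could be generated in Python, one row per page of 10 tweets, with the mintime/maxtime values copied from the four links above (the output file name topsy_urls.csv is my own choice):

```python
import csv

BASE = "http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date"
periods = [
    # (label, maximum offset, mintime, maxtime) from the links above
    ("2010-11-01 to 2011-11-01", 170, 1288612824, 1320148851),
    ("2011-11-01 to 2012-07-29", 340, 1320148824, 1343563251),
    ("2012-07-29 to 2013-07-29", 570, 1343563224, 1375099251),
    ("2013-07-29 to 2014-02-06", 410, 1375099224, 1391688051),
]

with open("topsy_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["period", "offset", "url"])
    for label, max_offset, mintime, maxtime in periods:
        # Count the offset down from its maximum to the first page.
        for offset in range(max_offset, -1, -1):
            url = f"{BASE}&offset={offset}&mintime={mintime}&maxtime={maxtime}"
            writer.writerow([label, offset, url])
```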
I have three possibilities in mind for extracting this data:
1. The Linky Firefox add-on, collecting 10 items per page across roughly 1,490 pages. Doing that one page at a time is probably impractical, but Linky can open 100 pages at a time, which would take only about 15 iterations, so it is worth looking at.
2. The Xenu link-checking software, harvesting links as if doing a link check.
3. Some other URL-scraping tool (a sketch follows the PHP example below).
I will investigate this further on a PC, as Xenu runs only on Windows.
This PHP example did not work:
http://www.web-max.ca/PHP/misc_23.php
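As a fallback for option 3, here is a minimal Python sketch of the same idea: fetch one results page and keep the outbound links. The filter for twitter.com hrefs is an assumption about Topsy's markup, which would need to be checked against a real page:

```python
# Hypothetical sketch, not a tested scraper; requires the
# requests and beautifulsoup4 packages.
import requests
from bs4 import BeautifulSoup

def scrape_page(url):
    """Return all links on one Topsy results page that point at twitter.com."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if "twitter.com" in a["href"]]

page = ("http://topsy.com/s?q=%40DataONEorg&type=tweet&sort=date"
        "&offset=170&mintime=1288612824&maxtime=1320148851")
for link in scrape_page(page):
    print(link)
```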
This may be worth looking at: "Scraping multiple Pages using the Scraper Extension and Refine", from the School of Data handbook: http://schoolofdata.org/handbook/recipes/scraping-multiple-pages-with-refine-and-scraper/
I’ll need to test these possibilities.