I have put together a list of literature reviews for the tools that I am using. Please see the attached document:
https://www.dropbox.com/s/mwk6xtdaga50pod/Literature%20Review%20Round%202.docx
I am running the experiment on the topic-model-based approach. It takes considerably longer than the TF-IDF experiment: training 300 topics with 1000 iterations could take a day using the whole corpus from the 4 archives (DAAC, Dryad, KNB, and Treebase). I will post further findings here over the weekend as well.
Update: The experiment on the Topic Model (TM) based approach is done. Please see the slides below for the results and discussion.
https://www.dropbox.com/s/5tws3yd5rgix3jx/Week07.pptx