Much of the research for this project draws on publicly available data. We are gathering five types of data:
- Workflow programs: Taverna and Kepler are the most used, but other workflow programs also have a tradition in the sciences. We hope this study won’t be limited to bioinformatics and ecological niche modelling, and will extend to other areas where computational workflows are used to process, massage, and analyse data. The programs involved might include tools not designed as workflow systems, such as R, Python, or MATLAB; programs that can be used as shims, such as Excel; and dedicated workflow systems, such as VisTrails, RapidMiner, or KNIME.
- Workflows: Examples of workflows themselves will be analysed for complexity. In particular, we’ll analyse them based on data input, QA/QC steps, external models, iterative loops, recursion, and subject matter. These workflows could come from myExperiment, from various repositories, from the example workflows bundled with packages on the systems’ own sites, or from scientists themselves.
- Usage: How much are these workflows being re-used? Are people spreading them around, or does every scientist reinvent the wheel each time a workflow is created? How can we help with this process?
- Users: Is one person making all of these, or do scientists understand workflows well enough to build them for their own experiments? How many workflows are reused for an individual study? What problems do the learning curves of workflow systems cause?
- Research: What is out there already, in terms of journal articles, books, or similar digests and analyses of workflows? What are the most cited journal articles? Who are the experts in the field? Are they all developers themselves?
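To make the complexity analysis a bit more concrete, here is a minimal sketch in Python of how one entry in the workflow database might be recorded, using the features listed under Workflows above. Everything in it is an assumption for illustration: the `WorkflowRecord` class, its field names, the helper functions, and the example entry are all hypothetical, not the project's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRecord:
    """One catalogued workflow. All field names here are hypothetical."""
    name: str
    system: str                 # workflow program, e.g. "Taverna" or "Kepler"
    source: str                 # where it was found, e.g. "myExperiment"
    subject: str                # subject matter, e.g. "bioinformatics"
    data_inputs: int = 0        # number of data inputs
    qa_qc_steps: int = 0        # number of QA/QC steps
    external_models: int = 0    # number of external models called
    iterative_loops: int = 0    # number of iterative loops
    uses_recursion: bool = False


# The structural features we would compare across workflows.
STRUCTURAL_FIELDS = (
    "data_inputs",
    "qa_qc_steps",
    "external_models",
    "iterative_loops",
    "uses_recursion",
)


def features_present(record):
    """List the structural features that appear at least once in a workflow."""
    return [field for field in STRUCTURAL_FIELDS if getattr(record, field)]


def count_by_system(records):
    """Count how many catalogued workflows come from each workflow program."""
    counts = {}
    for record in records:
        counts[record.system] = counts.get(record.system, 0) + 1
    return counts


# A made-up entry, purely to show the shape of the data.
example = WorkflowRecord(
    name="example-workflow",
    system="Taverna",
    source="myExperiment",
    subject="bioinformatics",
    data_inputs=2,
    qa_qc_steps=1,
    iterative_loops=1,
)

print(features_present(example))
print(count_by_system([example]))
```

Keeping the features as plain counts rather than a single score is a deliberate choice in this sketch: we don't yet know which features matter most, so it seems safer to record them separately and decide how to weigh them later.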
So, how can you help? If you know of any workflow languages, any research, or any power users, drop us a line at richard [dot] littauer [at] gmail. And, even more importantly, if you know of any repositories, or if you yourself have workflows, please get in touch! We can use all the help we can get in compiling a database of workflows to look at. The more we analyse, the more we’ll understand, and the more we can help.