As we approach the end of this summer project, it’s time to think about the end game. Our goal is to study the provenance of notebooks. We chose iPython notebook as the use case and studied the NoWorkflow system in detail, so it’s natural to combine the two. Since iPython allows users to write extensions as user-defined functions or libraries, an iPython extension for capturing provenance in notebooks would be a good idea.
The iPython notebook is much like a debugger with breakpoints between adjacent cells. Of course, a notebook is meant more for viewing, so each cell can be moved around and executed separately, while a debugger is meant for debugging, so the information captured at each breakpoint is more informative and complete. It would be nice to combine the good features of the two tools or, in our case, to add the debugger’s features to the notebook while maintaining the flexibility of cells.
The NoWorkflow system executes the target Python script and captures profiling information during execution. In other words, it’s a wrapper around the target script that runs it and processes the information from the execution. The key point is that we need a script to capture the information, and this script is invoked when the target script is executed. For iPython notebook, the idea would be the following: each notebook starts with a %load_ext magic to enable provenance capturing, and at the end, several functions like “show provenance”, “list trial” and so on can be called to view the information captured.
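To make the idea concrete, here is a minimal sketch of what such an extension module could look like. It is not the prototype itself and it does not call NoWorkflow. The class name ProvenanceRecorder, the in-memory records list, and the show_provenance magic are all made up for illustration. It also assumes a reasonably recent IPython, where the pre_run_cell and post_run_cell callbacks receive an info and a result object.

```python
import time


class ProvenanceRecorder:
    """Hypothetical recorder: keeps one in-memory record per executed cell."""

    def __init__(self):
        self.records = []
        self._started = None

    def pre_run_cell(self, info):
        # Called by IPython just before a cell runs.
        self._started = time.time()

    def post_run_cell(self, result):
        # Called by IPython after a cell finishes; `result` describes the run.
        # The cell that loads the extension has no matching pre_run_cell call,
        # so fall back to "now" when no start time was recorded.
        started = self._started if self._started is not None else time.time()
        self.records.append({
            "execution_count": result.execution_count,
            "source": result.info.raw_cell,
            "success": result.success,
            "duration": time.time() - started,
        })
        self._started = None

    def show_provenance(self, line=""):
        # Body of the (hypothetical) %show_provenance line magic:
        # print a one-line summary for every recorded cell.
        for rec in self.records:
            status = "ok" if rec["success"] else "error"
            source = rec["source"].strip()
            first_line = source.splitlines()[0] if source else ""
            print(f"[{rec['execution_count']}] {status} "
                  f"{rec['duration']:.3f}s  {first_line}")


# A single module-level recorder, so load and unload refer to the same callbacks.
recorder = ProvenanceRecorder()


def load_ipython_extension(ipython):
    # IPython calls this function when the module is loaded with %load_ext.
    ipython.events.register("pre_run_cell", recorder.pre_run_cell)
    ipython.events.register("post_run_cell", recorder.post_run_cell)
    ipython.register_magic_function(
        recorder.show_provenance, magic_kind="line", magic_name="show_provenance"
    )


def unload_ipython_extension(ipython):
    # IPython calls this function when the extension is unloaded.
    ipython.events.unregister("pre_run_cell", recorder.pre_run_cell)
    ipython.events.unregister("post_run_cell", recorder.post_run_cell)
```

If this were saved as, say, provenance_ext.py somewhere on the Python path (the file name is also just an assumption), a notebook would start with %load_ext provenance_ext and could end with %show_provenance. A real version would hand each record to NoWorkflow’s storage instead of keeping a plain list.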
We have the NoWorkflow system in hand, and the new tool we abandoned can be a good reference for getting profiling information from iPython notebook. So the idea here is to minimize the effort by combining the two existing components rather than writing something completely new. I’ve started prototyping our provenance extension, and next week we should have something to show.