For now, though, I've turned my attention to the part of the system that reads the filesystem. The Fascinator uses harvesters to grab data from various places via various means and put the object/metadata into Fedora. For example, we have ORE and PMH harvesters to schlurp (technical term) up repository data. The current filesystem harvester basically takes a snapshot of the filesystem and loads the metadata into Fedora. We don't make a copy of the file in Fedora, as we're expecting the files to get quite large and don't want to replicate that on the desktop.
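To make the snapshot idea concrete, here is a minimal sketch in Python using only the standard library. It is not the actual harvester code: the function name `snapshot_metadata` and the choice of size and modification time as the recorded fields are my own assumptions for illustration. The point it shows is that only metadata is collected and file contents are never read or copied.

```python
import os


def snapshot_metadata(root):
    """Walk `root` and return {path: {"size": ..., "mtime": ...}}.

    Hypothetical helper: only os.stat() metadata is gathered, so large
    files are never opened or duplicated.
    """
    records = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            records[path] = {"size": info.st_size, "mtime": info.st_mtime}
    return records
```

Each record could then be handed to whatever loads metadata into the repository; that step is left out here because it depends on the repository API.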
The main goal is to pick up what the user has in their directories and give them a more expansive (metadata/tags/etc) view of it. This means that the user can continue to use the filesystem and their preferred apps. It also means that we have to keep up with the filesystem state.
The first thought was to poll the filesystem, but that is rather intensive. Luckily, one of our team members, Linda, has done a thesis that covers the alternatives and, with some quick research, I located some Python options for Windows/Linux. I'm not certain how this works in OS X so I'll have to get one of the developers to test it. So for each platform:
- Linux: use inotify via pyinotify
- Windows: use FileSystemWatcher via IronPython
- Mac: use File System Events via Python (?)
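The list above could be wired up roughly as follows. This is a sketch under assumptions, not a tested design: `pick_backend` and `watch_linux` are hypothetical names, the `"cli"` entry assumes IronPython reports that value for `sys.platform`, and the Linux part assumes the third-party pyinotify package is installed (it is imported lazily, so the rest of the module loads without it). The Windows and Mac watchers are not sketched.

```python
import sys

# Platform prefix -> notification mechanism, taken from the list above.
BACKENDS = {
    "linux": "inotify via pyinotify",
    "win32": "FileSystemWatcher via IronPython",
    "cli": "FileSystemWatcher via IronPython",  # assumed IronPython value
    "darwin": "File System Events via Python (?)",
}


def pick_backend(platform=None):
    """Return the backend description for a sys.platform string, or None."""
    platform = sys.platform if platform is None else platform
    for prefix, backend in BACKENDS.items():
        if platform.startswith(prefix):
            return backend
    return None


def watch_linux(path):
    """Print filesystem events under `path` until interrupted (Linux only)."""
    import pyinotify  # third-party; imported lazily on purpose

    class Handler(pyinotify.ProcessEvent):
        def process_default(self, event):
            # A real harvester would update the repository here.
            print(event.maskname, event.pathname)

    manager = pyinotify.WatchManager()
    mask = pyinotify.IN_CREATE | pyinotify.IN_DELETE | pyinotify.IN_MODIFY
    # rec=True watches subdirectories; auto_add=True picks up new ones.
    manager.add_watch(path, mask, rec=True, auto_add=True)
    pyinotify.Notifier(manager, Handler()).loop()
```

Calling `pick_backend()` with no argument uses the running interpreter's platform, which keeps the choice in one place while the Mac option is still an open question.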
The team laughs at my diagrams so I like to make sure I include one:
There may still be an issue with staying current with the filesystem state. Change events may be lost (if the watching service dies for some reason), so we might still need the scanning system in case the filesystem and the repository get out of step. This would potentially be something that the user can run when they're not finding their file in The Fascinator.
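A fallback scan of that kind might look something like the sketch below. Everything here is an assumption for illustration: `reconcile` is a hypothetical name, and `known` stands in for whatever the repository has recorded, simplified to a dict of path to modification time. It walks the directory once and reports what the repository is missing, what it still lists but no longer exists, and what has changed since it was recorded.

```python
import os


def reconcile(root, known):
    """Compare files under `root` with `known` ({path: mtime}).

    Returns (added, removed, changed) as sorted lists of paths.
    """
    on_disk = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                on_disk[path] = os.stat(path).st_mtime
            except OSError:
                # The file disappeared mid-scan; treat it as absent.
                continue
    added = sorted(set(on_disk) - set(known))
    removed = sorted(set(known) - set(on_disk))
    changed = sorted(
        p for p in on_disk if p in known and on_disk[p] != known[p]
    )
    return added, removed, changed
```

Because this is a full walk, it has the same cost as polling, which is why it makes more sense as something run on demand than as the main way of keeping up.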