Distribution assurance
This stage has three components: load balancing, moving SIPs to their primary storage destination, and unpacking them.
Load balancing
Having ingested the volume directory metadata, the system is now primed to expect the SIPs of data that make up that file system. Selecting the primary storage for the data is the first task of the load balancer. It allocates a storage server to hold the data within the SIP and records this allocation in the FCluster inodes table. Allocation is based on the available capacity of the host, its processing power, and the estimated time it needs to finish its current task list.
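The allocation decision can be illustrated with a minimal sketch. The text names the three inputs (capacity, processing power, estimated time to finish the current task list) but not how they are combined, so the scoring rule below is an assumption: it picks the host with the shortest speed-adjusted wait among those with enough free space. The names `StorageHost`, `relative_speed`, `queue_seconds` and `choose_primary_host` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageHost:
    name: str
    free_bytes: int        # available capacity of the host
    relative_speed: float  # processing power, 1.0 = baseline (hypothetical unit)
    queue_seconds: float   # estimated time to finish the current task list


def choose_primary_host(hosts: List[StorageHost],
                        sip_size_bytes: int) -> Optional[StorageHost]:
    """Pick a primary storage host for one SIP, or None if none can hold it."""
    # Only hosts with enough free capacity are candidates.
    candidates = [h for h in hosts if h.free_bytes >= sip_size_bytes]
    if not candidates:
        return None
    # Assumed scoring: shortest speed-adjusted wait first,
    # larger free capacity as the tie-breaker.
    return min(
        candidates,
        key=lambda h: (h.queue_seconds / h.relative_speed, -h.free_bytes),
    )
```

In the system described, the chosen host would then be recorded against the SIP in the FCluster inodes table.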
The movefile daemon
The movefile daemon also uses “checklist” type assurance: it constantly scans the inodes table of FClusterfs for any SIP that has been allocated a data node, has not been marked as being ‘in place’, and whose evidence SIP is staged in a local directory. If these conditions are met, the SIP is transferred to the storage data node allocated by the load balancer. If, and only if, the transfer is successful does the movefile daemon update the inodes table, setting ‘primarystorageinplace’ to true. The movefile daemon is the only mechanism by which actual data can be moved around the system, and it can operate only when all the preconditions from Ingestion Assurance are met. It does not simply scan an evidence folder and move whatever SIPs are present; it moves only the expected SIPs, as recorded in the FCluster inodes table, from that folder.
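One scan of this checklist logic might look like the following sketch. It is driven by the table, not by the folder contents, and it sets the flag only after a successful transfer. The sketch uses SQLite for illustration; the column names `sip_name` and `datanode`, the function `move_expected_sips`, and the caller-supplied `transfer` callable are assumptions. Only `primarystorageinplace` and the inodes table come from the text.

```python
import os
import sqlite3
from typing import Callable, List


def move_expected_sips(db: sqlite3.Connection,
                       staging_dir: str,
                       transfer: Callable[[str, str], bool]) -> List[str]:
    """Run one scan of the inodes table and return the SIPs that were moved."""
    moved = []
    # Start from the table: SIPs with an allocated data node, not yet in place.
    rows = db.execute(
        "SELECT sip_name, datanode FROM inodes "
        "WHERE datanode IS NOT NULL AND primarystorageinplace = 0"
    ).fetchall()
    for sip_name, datanode in rows:
        staged_path = os.path.join(staging_dir, sip_name)
        if not os.path.isfile(staged_path):
            continue  # expected SIP is not staged locally yet
        # Mark the SIP as in place if, and only if, the transfer succeeded.
        if transfer(staged_path, datanode):
            db.execute(
                "UPDATE inodes SET primarystorageinplace = 1 "
                "WHERE sip_name = ?",
                (sip_name,),
            )
            db.commit()
            moved.append(sip_name)
    return moved
```

A file in the staging directory with no matching row is never looked at, because the loop iterates over table rows and not over directory entries.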
The unpack daemon
The unpack daemon constantly scans the inodes table to see whether any SIPs are on its local server but not yet unpacked. It takes each entry from the database and checks that the corresponding files are on its FTP host, as the entries in inodes say they should be; it does not work the other way round, from files to entries. A file that simply arrives on the server without an entry in inodes is ignored. When a suitable SIP is identified, it is split into header and data sections. The header, containing the metadata, is inserted into the ‘meta_data’ table and the header file is erased. The data section is decoded and the data decrypted with a key stored in the VolumeListing table. This is the key first created and issued by FCluster and used to encrypt the data in the SIP at acquisition time. If the key does not work, the file cannot be decrypted and unpacking fails. Only if the file decrypts and the resulting file has a SHA1 checksum that matches both the name of the file itself and the SHA1 recorded in the inodes table is the data file finally accepted.
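The final acceptance test can be sketched as a double comparison of the SHA1 digest. The sketch covers only this last check, not the header split or the decryption, whose formats the text does not specify. It assumes the file name is exactly the hexadecimal digest with no extension; `accept_datafile` and its parameters are hypothetical names.

```python
import hashlib


def accept_datafile(file_name: str, content: bytes, recorded_sha1: str) -> bool:
    """Accept decrypted content only if its SHA1 matches name and record."""
    digest = hashlib.sha1(content).hexdigest()
    # Both comparisons must hold: the digest must equal the file's own name
    # and the value recorded in the inodes table.
    return digest == file_name.lower() and digest == recorded_sha1.lower()
```

For example, a file whose content was altered after acquisition produces a different digest and is rejected, even if its name and the inodes record still agree with each other.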
Processing assurance
The task daemon scans the tasks table to see whether any job is required for a file that it holds locally. Because all file access must take place through the enhanced FClusterfs filesystem, the file must be the correct file and must have the original content that was collected at imaging time. FClusterfs also gives us fine-grained access control over the files within a file system. We could, if we wished, control which users can process specific data with specific programs.
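As an illustration only, the selection of tasks for locally held files, with the optional access control the text mentions, might be sketched as follows. The task tuple layout `(user, program, file_id)`, the `allowed` permission set, and the function name `runnable_tasks` are all assumptions; the text does not describe the actual tasks table schema.

```python
from typing import Iterable, List, Optional, Set, Tuple

Task = Tuple[str, str, str]  # (user, program, file_id), a hypothetical layout


def runnable_tasks(tasks: Iterable[Task],
                   local_files: Set[str],
                   allowed: Optional[Set[Task]] = None) -> List[Task]:
    """Return the tasks this node may run on files it holds locally."""
    selected = []
    for user, program, file_id in tasks:
        if file_id not in local_files:
            continue  # the file is held on another node
        # Optional fine-grained control: this user may run this program
        # on this file only if the combination is explicitly permitted.
        if allowed is not None and (user, program, file_id) not in allowed:
            continue
        selected.append((user, program, file_id))
    return selected
```

Passing `allowed=None` models the case where no per-user restriction is configured.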