Mounting the FCluster file-system
The behaviour of an FClusterfs file system is defined when it is mounted by a command line which contains the
following entries:
fclusterfs
-mysql_user=me
-mysql_password=mypassword
-mysql_host=25.63.133.244
-mysql_database=fclusterfs
-volume=74a8f0f627cc0dc6
-audituser='Investigator Name'
/home/user/Desktop/fsmount
Multiple file systems can be mounted on the user’s host system and multiple SQL servers can provide storage for
FClusterfs file-system databases.
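For example, a second evidence volume held on a different SQL server could be mounted alongside the first; the host address, volume identifier and mount point shown here are illustrative only:

fclusterfs
-mysql_user=me
-mysql_password=mypassword
-mysql_host=10.0.0.17
-mysql_database=fclusterfs
-volume=9be2c4d1a07f33ab
-audituser='Investigator Name'
/home/user/Desktop/fsmount2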
Functional overview – dataflow
Having established the component parts of FCluster, we can now demonstrate its operation by following data as it is gathered and passed into the system. The initial imaging process has three deliverables:
1. a SIP containing directory metadata.
2. a collection of SIPs, one for each file that meets the 'high value' criteria set by the image acquirer.
3. a conventional 'forensic image', for reference and later extraction of further data.
The selection of files to be packaged as SIPs takes a prioritised triage approach, collecting only those file types expected to have a higher likelihood of containing evidence for the case type in hand.

The first stage of ingestion into FCluster is the import of the SIP containing the data defining the file-system directory into the MySQL database at the heart of FClusterfs. At this stage a directory skeleton exists, but no file data is yet available within FCluster. The file data, in the form of a number of SIPs, is imported as it becomes available, progressively 'filling out' the evidence file system with the data associated with each directory entry. The data is distributed across the Datanodes by a load-balancing algorithm whose allocations are based on benchmarks previously created by running a known set of approved programs against typical data files.

When a SIP arrives at its storage host it is unpacked and its contents are verified in a number of ways. Only if it is proven to be valid is it accepted and made available via the distributed file system, FClusterfs. Upon approval at its storage location, a defined list of tasks is invoked and automatic processing is carried out, for example generating text indexes or thumb-nailing images.

To provide redundancy and secondary load balancing, a replication agent first ensures constant and routine validation of data by applying an SHA-1 checksum to each file; it then ensures that there are multiple copies of the data, normally three, held on separate hosts within the cluster.

The SIPs created at image time will, most likely, have captured only part of the evidence. Subsequently, a 'Bag it on demand' system can trigger an on-the-fly acquisition of data within the image that was initially deemed of secondary interest, once the image has been completed and is available to the cluster. This data is validated and placed in the same assured manner as the rest of the system.

How FCluster is configured as a network is up to the administrator; it can form a local or wide area network. The prototype successfully uses a VPN to connect the nodes, and we have extended it to use nodes on Amazon Web Services. Whenever data is transferred between nodes it is always in encrypted form and so can be considered safe in a technical sense, although this may not be acceptable on principle within a legal environment.

The primary objective, and the core of any speed improvement, is that processing takes place locally on the Datanode holding the data. In a similar way to the use of SHA-1 hashes to identify 'Bad' files, the system can be used without the actual files being accessed: results are transferred across the network but not, normally, the data.
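The allocation algorithm is not prescribed in detail here; the following Python sketch shows one plausible reading of the description above, in which each Datanode carries a benchmark score (derived from running approved programs against typical data files) and a record of data already assigned, and each incoming SIP goes to the node with the most spare benchmarked capacity. The class, field and function names are assumptions made for illustration only.

# Illustrative sketch only: benchmark-weighted allocation of a SIP to a Datanode.
# The scoring details and data structures are assumed, not taken from FCluster itself.
from dataclasses import dataclass

@dataclass
class Datanode:
    host: str
    benchmark_score: float   # measured throughput against typical data files
    assigned_bytes: int = 0  # data already allocated to this node

def choose_datanode(nodes: list[Datanode], sip_size: int) -> Datanode:
    """Pick the node with the most spare benchmarked capacity."""
    # A lower ratio of assigned bytes to benchmark score means more headroom.
    target = min(nodes, key=lambda n: n.assigned_bytes / n.benchmark_score)
    target.assigned_bytes += sip_size
    return target

nodes = [Datanode("node-a", 120.0), Datanode("node-b", 80.0), Datanode("node-c", 100.0)]
print(choose_datanode(nodes, sip_size=50_000_000).host)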
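The verification performed when a SIP arrives at its storage host is likewise not enumerated above. As one illustration, a manifest of SHA-1 digests (in the style of a BagIt-like package) could be checked after unpacking, with the SIP rejected on any mismatch; the manifest file name and function names below are assumptions.

# Illustrative sketch only: checking an unpacked SIP against a SHA-1 manifest.
import hashlib
from pathlib import Path

def sha1_of(path: Path) -> str:
    """Stream a file through SHA-1 and return the hex digest."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_sip(unpacked_dir: Path, manifest_name: str = "manifest-sha1.txt") -> bool:
    """Accept the SIP only if every listed file matches its recorded digest."""
    manifest = unpacked_dir / manifest_name
    for line in manifest.read_text().splitlines():
        digest, rel_path = line.split(maxsplit=1)
        if sha1_of(unpacked_dir / rel_path) != digest:
            return False  # invalid: the SIP is not made available via FClusterfs
    return True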
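Finally, the replication agent's role (routine SHA-1 validation plus maintaining, normally, three copies on separate hosts) can be pictured with a short sketch; the catalogue structure and replica target are assumptions used only to make the behaviour concrete.

# Illustrative sketch only: auditing replicas against a catalogued SHA-1 value.
REPLICA_TARGET = 3  # 'normally three' copies, as described above

def valid_replicas(expected_sha1: str, replicas: dict[str, str]) -> list[str]:
    """Return the hosts whose stored copy still matches the catalogued SHA-1."""
    return [host for host, observed in replicas.items() if observed == expected_sha1]

def needs_replication(valid_hosts: list[str]) -> bool:
    """Trigger re-replication when fewer than the target number of valid copies remain."""
    return len(valid_hosts) < REPLICA_TARGET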