
Distributed File System in Rust for CCOM4017

This suite of programs handles file copying over TCP with a client/server model. It contains the following programs:

  • copy
  • ls
  • data_node
  • meta_data

copy and ls are clients that connect to the servers. copy sends file read and write requests to the meta_data server, which uses a sqlite3 database to keep track of which nodes are connected and which files have been added. When a file is added, meta_data sends back the list of available data_node servers; copy then divides the file into as many chunks as there are nodes and transfers each chunk over the wire 256 bytes at a time. ls simply prints a list of the existing files on the meta_data server.
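
As a rough illustration, here is a minimal sketch of that chunking scheme; the helper names are hypothetical, not the actual functions in copy:

```rust
use std::io::Write;
use std::net::TcpStream;

/// Split the file into one chunk per data node
/// (ceiling division, so the last chunk may be shorter).
fn split_for_nodes(data: &[u8], node_count: usize) -> Vec<&[u8]> {
    assert!(node_count > 0);
    let chunk_size = (data.len() + node_count - 1) / node_count;
    data.chunks(chunk_size.max(1)).collect()
}

/// Stream one chunk to a node, 256 bytes at a time.
fn send_chunk(stream: &mut TcpStream, chunk: &[u8]) -> std::io::Result<()> {
    for block in chunk.chunks(256) {
        stream.write_all(block)?;
    }
    Ok(())
}
```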

The code uses serde_json to serialize and deserialize Rust structs to and from JSON. The clients and servers listen for incoming streams of data and parse them as JSON. As well as exchanging metadata, this protocol also establishes the handshake used to transfer the raw file chunks.
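
For illustration, here is a minimal sketch of such a message envelope. The field names match the sm example later in this README; the struct name Packet is invented for the sketch:

```rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

// Message envelope; field names match the `sm` example shown below.
// The struct name `Packet` is an assumption for this sketch.
#[derive(Serialize, Deserialize, Debug)]
struct Packet {
    p_type: String,
    json: Option<Value>,
}

fn main() -> serde_json::Result<()> {
    let request = Packet { p_type: "ListFiles".to_string(), json: None };
    let encoded = serde_json::to_string(&request)?;
    assert_eq!(encoded, r#"{"p_type":"ListFiles","json":null}"#);
    let decoded: Packet = serde_json::from_str(&encoded)?;
    println!("{decoded:?}");
    Ok(())
}
```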

rusqlite is used for managing the sqlite database. This allows SQL queries to be performed from the Rust code and lets the database be managed in a relatively type-safe way. Unit tests in meta_data provide coverage of these SQL operations against an in-memory version of the database.
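
A minimal sketch of that testing pattern, with a made-up table and helper (the real schema comes from the bundled createdb.py):

```rust
use rusqlite::{Connection, Result};

// Hypothetical helper; the real schema is created by createdb.py.
fn register_node(conn: &Connection, address: &str, port: u16) -> Result<()> {
    conn.execute(
        "INSERT INTO dnode (address, port) VALUES (?1, ?2)",
        (address, port),
    )?;
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn registers_a_node() -> Result<()> {
        // In-memory database, as used by the meta_data unit tests.
        let conn = Connection::open_in_memory()?;
        conn.execute_batch("CREATE TABLE dnode (address TEXT, port INTEGER);")?;
        register_node(&conn, "127.0.0.1", 6771)?;
        let count: i64 =
            conn.query_row("SELECT COUNT(*) FROM dnode", [], |row| row.get(0))?;
        assert_eq!(count, 1);
        Ok(())
    }
}
```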

WARNING:

If you're my professor, please do not generate a database with the default createdb.py provided in the skeleton dfs. I have included a custom version of the file in the root of the project. The reason is that I changed chunks to be integers rather than strings, in order to give the chunks an ordering when transferring.

Running

To run ls, provide an endpoint in ip:port format. The ip can be "localhost". Consider invoking it as ./ls to avoid a naming conflict with the GNU version of ls.

$ ./ls 127.0.0.1:6770

The meta_data server takes an optional port, but will default to 8000 if none is specified.

$ meta_data 6710
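
A minimal sketch of that optional-port handling, assuming a plain std::env parse (the real binary may differ):

```rust
use std::env;

// Hypothetical sketch of the optional-port handling in meta_data.
fn main() {
    let port: u16 = env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(8000); // default when no port is given
    println!("meta_data listening on port {port}");
}
```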

The data_node takes two endpoints in ip:port format and then an optional path. The first endpoint is the data node's own ip and port, used both for binding to a TCP port and for advertising itself to the meta_data server. The second endpoint is the meta_data server's ip and port. The optional base path will default to the working directory if none is provided.

$ data_node localhost:6771 127.0.0.1:8000 my_cool_data_node
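
To illustrate the self-registration step, here is a hypothetical sketch; the "Register" p_type and the payload shape are assumptions, not the actual protocol:

```rust
use std::io::Write;
use std::net::TcpStream;

// Hypothetical sketch of data_node announcing itself to meta_data on startup.
// The "Register" p_type and the payload shape are assumptions.
fn register(own_endpoint: &str, meta_endpoint: &str) -> std::io::Result<()> {
    let msg = serde_json::json!({
        "p_type": "Register",
        "json": { "endpoint": own_endpoint },
    });
    let mut stream = TcpStream::connect(meta_endpoint)?;
    stream.write_all(msg.to_string().as_bytes())
}

fn main() -> std::io::Result<()> {
    register("localhost:6771", "127.0.0.1:8000")
}
```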

copy takes its parameters in two forms, depending on whether it's sending to or receiving from the servers. To send a file, provide the path to the local file, then the remote endpoint in ip:port:filepath format. The data_node will save the file relative to the base path provided to it.

$ copy some_path/pug.jpg localhost:6700:another_path/pug.jpg

To receive a file, simply invert the parameters:

$ copy localhost:6700:another_path/pug.jpg some_path/pug.jpg
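
A minimal sketch of splitting the ip:port:filepath argument; the function name is made up:

```rust
// Hypothetical helper showing how an ip:port:filepath argument can be split.
fn parse_remote(arg: &str) -> Option<(String, String)> {
    let mut parts = arg.splitn(3, ':');
    let ip = parts.next()?;
    let port = parts.next()?;
    let path = parts.next()?;
    Some((format!("{ip}:{port}"), path.to_string()))
}

fn main() {
    let (endpoint, path) = parse_remote("localhost:6700:another_path/pug.jpg").unwrap();
    assert_eq!(endpoint, "localhost:6700");
    assert_eq!(path, "another_path/pug.jpg");
}
```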

Misc Scripts

shutdown_node sends a json request with a provided port to shut down a data_node. This ensures that the node can terminate gracefully and unregister itself from the meta_data server. I was advised against using Unix signals, so opted for this instead.

$ shutdown_node 6770
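
A sketch of what such a request could look like; the "Shutdown" p_type is a guess, as the actual message the script sends is not shown in this README:

```rust
use std::io::Write;
use std::net::TcpStream;

// Hypothetical sketch of a shutdown request; the "Shutdown" p_type is a guess.
fn main() -> std::io::Result<()> {
    let port: u16 = 6770;
    let mut stream = TcpStream::connect(("localhost", port))?;
    stream.write_all(br#"{"p_type":"Shutdown","json":null}"#)
}
```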

sm just sends a message to a provided port. It can be used to test and inspect the JSON messages. It can, for instance, be used to mimic ls:

$ sm '{"p_type":"ListFiles","json":null}' 8000
Connection to localhost 8000 port [tcp/*] succeeded!
{"paths":["pug.jpg 21633 bytes"]}%

clean_db just recreates dfs.db with the custom Python script.

Building

If you wish to compile the code, install Rust and Cargo.

Then just run the build:

cargo build

If you wish to run a specific binary:

cargo run --bin copy

Testing

cargo test --bin meta_data