Install the following libraries and tools first.
# CMake (v3.1 or higher), g++ (v4.8.4 or higher)
sudo apt-get update
sudo apt-get install cmake g++-4.8
# Redis (v3.2.8 or higher)
wget http://download.redis.io/releases/redis-3.2.8.tar.gz
tar -zxvf redis-3.2.8.tar.gz
cd redis-3.2.8
make && sudo make install
# Configure Redis to allow remote access
# 1. Edit /etc/redis/6379.conf: change "bind 127.0.0.1" to "bind 0.0.0.0"
# 2. Restart Redis
sudo service redis_6379 restart
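If you prefer to make the change non-interactively, a minimal sketch (assuming Redis was installed via utils/install_server.sh, which places the config at /etc/redis/6379.conf):
sudo sed -i 's/^bind 127.0.0.1/bind 0.0.0.0/' /etc/redis/6379.conf
# Depending on the Redis version, "protected-mode no" may also be required
sudo service redis_6379 restart
redis-cli -h [node_ip] ping   # should reply PONG from a remote machine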
# Hiredis (C client for Redis)
wget https://github.com/redis/hiredis/archive/v1.0.2.tar.gz -O hiredis.tar.gz
tar -zxvf hiredis.tar.gz
cd hiredis-1.0.2
make && sudo make install
# Intel ISA-L (v2.14.0 or higher)
wget https://github.com/intel/isa-l/archive/v2.30.0.tar.gz -O isa-l-2.30.0.tar.gz
tar -zxvf isa-l-2.30.0.tar.gz
cd isa-l-2.30.0
./autogen.sh && ./configure
make && sudo make install
# gf-complete (v1.03)
wget https://github.com/ceph/gf-complete/archive/master.tar.gz -O gf-complete.tar.gz
tar -zxvf gf-complete.tar.gz
cd gf-complete-master
./autogen.sh && ./configure
make && sudo make install
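The libraries above install to /usr/local/lib by default; refreshing the linker cache is a quick sanity check that they are visible (not part of the original steps):
sudo ldconfig
ldconfig -p | grep -E 'hiredis|isal|gf_complete'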
# Install gRPC according to https://grpc.io/docs/languages/cpp/quickstart/
git clone --recurse-submodules -b v1.74.0 --depth 1 --shallow-submodules https://github.com/grpc/grpc
cd grpc
export MY_INSTALL_DIR=$HOME/.local
mkdir -p $MY_INSTALL_DIR
export PATH="$MY_INSTALL_DIR/bin:$PATH"
mkdir -p cmake/build
cd cmake/build
cmake -DgRPC_INSTALL=ON \
-DgRPC_BUILD_TESTS=OFF \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_INSTALL_PREFIX=$MY_INSTALL_DIR \
../..
make -j 4
make install
Install Java 8, Maven, Hadoop, and Spark.
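For Java 8, a minimal sketch for Ubuntu (assuming OpenJDK 8 is available from apt; adjust JAVA_HOME to your distribution's layout):
sudo apt-get install openjdk-8-jdk
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
java -version   # should report version 1.8.x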
# Maven (v3.5.0 or higher)
# 1. Download and install apache-maven-3.5.0-bin.tar.gz.
wget https://archive.apache.org/dist/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
tar -zxvf apache-maven-3.5.0-bin.tar.gz
# 2. Set the environment variables.
export M2_HOME=$(pwd)/apache-maven-3.5.0
export PATH=$PATH:$M2_HOME/bin
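To confirm that Maven resolves from the expected location:
mvn -version   # should report Apache Maven 3.5.0 and a JDK 8 Java home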
# HDFS (v3.0.0)
# 1. Download and install HDFS
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
tar -zxvf hadoop-3.0.0.tar.gz
# 2. Set the following environment variables in your ~/.bashrc or a setup script:
# If you use the binary tarball extracted above:
export HADOOP_HOME=[path_to_hadoop]/hadoop-3.0.0
# If you instead build Hadoop from source, point HADOOP_HOME at the build output:
# export HADOOP_SRC_DIR=[path_to_hadoop_source]
# export HADOOP_HOME=$HADOOP_SRC_DIR/hadoop-dist/target/hadoop-3.0.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar:$(hadoop classpath --glob)
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$LD_LIBRARY_PATH
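These exports are lost when the shell exits; appending them to ~/.bashrc keeps them across sessions and lets you verify the installation (a sketch, with the placeholder path as above):
cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=[path_to_hadoop]/hadoop-3.0.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
source ~/.bashrc
hadoop version   # should report Hadoop 3.0.0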
# Spark (v2.4.0)
# 1. Download and install spark-2.4.0
wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-without-hadoop.tgz -O spark-2.4.0.tgz
tar -zxvf spark-2.4.0.tgz
# 2. Set the environment variables.
export SPARK_HOME=[path_to_spark]/spark-2.4.0-bin-without-hadoop
export SPARK_MASTER_HOST=yarn
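Since this is the "without-hadoop" Spark build, Spark must be told where the Hadoop classes live. A minimal sketch via conf/spark-env.sh (assuming HADOOP_HOME is set as above):
cat >> $SPARK_HOME/conf/spark-env.sh <<'EOF'
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
EOF
Example architecture for the HDFS system is as follows.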
| IP/hostname | HDFS | ROLE |
|---|---|---|
| master | NameNode | Coordinator |
| node01 | DataNode | Agent |
| node02 | DataNode | Agent |
| nodexx | DataNode | Agent |
We provide sample configuration files under conf for HDFS. Here, we show some of the fields related to the integration. You may configure HDFS by modifying the files under $HADOOP_HOME/etc/hadoop and then distribute them to all the nodes in the testbed; a sketch of both files follows the tables below.
- core-site.xml
| Field | Default | Description |
|---|---|---|
| fs.defaultFS | hdfs://192.168.0.1:9000 | NameNode configuration. |
| hadoop.tmp.dir | /root/hadoop-3.0.0-src/hadoop-dist/target/hadoop-3.0.0/data/ | Base directory for HDFS temporary directories. |
- workers
| Field | Default | Description |
|---|---|---|
| \ | 192.168.0.x | IPs of DataNodes. |
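As a sketch of both files (the IPs and hostnames are illustrative; substitute your own testbed addresses):
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop-3.0.0-src/hadoop-dist/target/hadoop-3.0.0/data/</value>
  </property>
</configuration>
EOF
cat > $HADOOP_HOME/etc/hadoop/workers <<'EOF'
192.168.0.2
192.168.0.3
EOF
# Distribute the configuration to every node
for node in node01 node02; do
  scp $HADOOP_HOME/etc/hadoop/{core-site.xml,workers} $node:$HADOOP_HOME/etc/hadoop/
done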
To start HDFS, we run the following commands on the NameNode.
hdfs namenode -format
start-dfs.sh
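To check that the daemons came up (a quick sanity check, not part of the original steps):
jps                    # NameNode on the master, DataNode on each worker
hdfs dfsadmin -report  # should list all live DataNodes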
Example architecture for Spark is as follows.
| IP/hostname | Spark | ROLE |
|---|---|---|
| master | Master | Coordinator |
| node01 | Worker | Agent |
| node02 | Worker | Agent |
| nodexx | Worker | Agent |
We provide sample configuration files under conf for Spark. You may configure Spark by modifying files under $SPARK_HOME/conf; a sketch follows the table below.
- slaves
| Field | Default | Description |
|---|---|---|
| \ | 192.168.x.x | IPs of Workers. |
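As a sketch (illustrative IPs):
cat > $SPARK_HOME/conf/slaves <<'EOF'
192.168.0.2
192.168.0.3
EOF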
To start Spark, we run the following commands on the Master.
start-yarn.sh
bash [your_path_to_spark]/sbin/start-all.sh
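To verify, jps on each node should show the expected daemons:
jps   # Master on the coordinator, Worker on each agent; ResourceManager/NodeManager for YARN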
Make sure that HDFS and Spark are configured correctly, then build the project.
cd [your_path]
mkdir build
cd build
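If CMake cannot locate the gRPC installed under $MY_INSTALL_DIR earlier, point it at that prefix before configuring (a sketch, assuming the same install prefix):
export CMAKE_PREFIX_PATH=$MY_INSTALL_DIR:$CMAKE_PREFIX_PATH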
cmake .. && make
Start foreground distributed data analytics workloads with HiBench (following the instructions at https://github.com/Intel-bigdata/HiBench). We use K-means clustering as an example.
bash [your_path_to_hibench]/bin/workloads/ml/kmeans/spark/run.sh
Start monitors on the agents to monitor each node's network and computational resources.
bash [your_path]/ecdag/start_monitor.sh
Then you can run the coordinator on the master node and an agent on each worker node.
# in master node
[your_path]/build/ECCoordinator
# in each worker node
[your_path]/build/ECAgent
Alternatively, run the script to start the coordinator and agents on the master and worker nodes.
bash [your_path]/script/start.sh
Write a file to HDFS by executing the following command on one worker node.
[your_path]/build/ECClient write [your_path_to_file] [your_path_in_hdfs] [your_poolname] [your_file_size_in_MB]
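For example, to write a local 640 MB file to /input_640MB in HDFS under a pool named pool1 (the local file name and pool name are illustrative):
[your_path]/build/ECClient write ./input_640MB /input_640MB pool1 640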
After the file is written, you can read it back on one worker node with the following command.
[your_path]/build/ECClient read [your_path_in_hdfs] [your_path_to_fetch]
After the file is written, encode it with the following command. Here, [your_path_of_ecdag_encode_config] is the path to the ECDAG config file for encode operations; see [your_path]/conf/640_n_14_k_10/ecdag_encode_640_0 for an example.
[your_path]/build/ECClient encode [your_path_in_hdfs] [your_path_of_ecdag_encode_config]
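Continuing the example, using the sample (n=14, k=10) encode config shipped under conf:
[your_path]/build/ECClient encode /input_640MB [your_path]/conf/640_n_14_k_10/ecdag_encode_640_0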
After the file is encoded, run the following command on the master node to trigger a repair operation with coar.
python3 run_ecdag.py \
    --type [repair_scheme] \
    --filename [your_file_path_in_hdfs] \
    --failed_node_id [fail_node_id] \
    --src_node_ids [node_ids_of_this_stripe] \
    --new_ids [node_ids_as_a_new_node] \
    --all_node_ids [all_node_ids] \
    --obj_ids [object_ids_of_this_stripe] \
    --row_ids [row_ids_of_the_objects] \
    --object_size [object_size_in_bytes] \
    --ec_info [your_path]/build/ec_info \
    --output [your_path]/build/input_640MB_ecdag_temp
The parameter meanings are as follows.
| field | description |
|---|---|
| type | Repair scheme, e.g., coar. |
| filename | Input file path stored in HDFS. |
| failed_node_id | The failed node id. |
| new_ids | New node id to store the repaired chunk. |
| src_node_ids | Surviving node ids. |
| all_node_ids | All node ids of this stripe. |
| obj_ids | Object ids of this stripe. They are determined when the file is written. |
| row_ids | Row ids of the objects of this stripe. They start from 1 and increase by 1 for each object. |
| object_size | Object size in bytes. |
| ec_info | File path that stores the EC parameters. |
| output | File path that the decoding ECDAG config is dumped to. Used only by ECCoordinator. |
An example is as follows.
python3 run_ecdag.py \
    --type coar \
    --filename /input_640MB \
    --failed_node_id 10 \
    --src_node_ids 1 2 3 4 5 6 7 8 9 11 12 13 14 \
    --new_ids 15 \
    --all_node_ids 1 2 3 4 5 6 7 8 9 10 11 12 13 14 \
    --obj_ids 0 1 2 3 4 5 6 7 8 9 4095 4094 4093 4092 \
    --row_ids 1 2 3 4 5 6 7 8 9 10 11 12 13 14 \
    --object_size 67108864 \
    --ec_info ../build/ec_info \
    --output ../build/input_640MB_ecdag_temp
A repair operation can also be triggered in one worker node with a specified ecdag config command as follows.
[your_path]/build/ECClient decode [your_path_in_hdfs] [your_path]/build/input_640MB_ecdag_temp 0 1 2 3 4 4086 5
You can stop the coordinator and all agents with the script stop.sh.
bash [your_path]/script/stop.sh