Based on my BinaryPig Single-Node cluster tutorial, I’ll now explain how to set up a Multi-Node cluster. Your Single-Node cluster needs to be running properly before you start. I changed the username to “hadoop” to avoid confusion, but you can keep the “master” username.
For the Multi-Node setup, you will just need to clone your Single-Node VM and change a couple of configuration files.
Edit /etc/hosts on both the master and the slave so that each machine can resolve the other's hostname.
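The host entries themselves are not reproduced here, so the following is only a minimal sketch. The IP addresses 192.168.0.1 and 192.168.0.2 are placeholders (assumptions, not values from this setup), and the function name hosts_entries is purely illustrative. Substitute the real addresses of your VMs.

```shell
#!/bin/sh
# Print the host entries for the two-node cluster.
# NOTE: both IP addresses are placeholders (assumptions); use your VMs' real addresses.
hosts_entries() {
    printf '%s\t%s\n' "192.168.0.1" "master"
    printf '%s\t%s\n' "192.168.0.2" "slave"
}

# Preview the entries before touching any system file.
hosts_entries

# To apply them, run on BOTH machines (left commented out on purpose):
# hosts_entries | sudo tee -a /etc/hosts
```

Afterwards, `ping -c 1 slave` on the master and `ping -c 1 master` on the slave should both resolve the hostname.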
Make sure that you can ssh without a password from the master to the slave and vice versa.
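If key-based login is not set up yet, something along these lines should work. This is a sketch that assumes the “hadoop” user exists on both machines and that OpenSSH's ssh-keygen and ssh-copy-id are installed. The helper name setup_passwordless_ssh is hypothetical.

```shell
#!/bin/sh
# Set up key-based ssh login from the current machine to a remote host.
# ASSUMPTION: the "hadoop" user exists on both machines.
setup_passwordless_ssh() {
    remote="$1"
    # Create an RSA key pair with an empty passphrase if none exists yet.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    # Append the public key to the remote user's authorized_keys
    # (this asks for the remote password one last time).
    ssh-copy-id "hadoop@$remote"
    # Verify: this should print the remote hostname without a password prompt.
    ssh "hadoop@$remote" hostname
}

# On the master:  setup_passwordless_ssh slave
# On the slave:   setup_passwordless_ssh master
```

Because the slave is a clone of the Single-Node VM, both machines may already share the same key pair, in which case the login might work without any further steps.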
Edit /usr/local/hadoop-1.2.1/conf/core-site.xml on both the master and the slave.
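The exact contents of the file are not reproduced here. In a Hadoop 1.x cluster, the usual change is to point fs.default.name at the master's namenode instead of localhost, so a minimal sketch could look like this. The port 54310 is an assumption, so reuse whatever port your Single-Node configuration already has. The function name core_site_xml is illustrative.

```shell
#!/bin/sh
# Print a minimal core-site.xml that points a node at the master's namenode.
# NOTE: port 54310 is an assumption; keep the port from your single-node config.
core_site_xml() {
    cat <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
EOF
}

# Preview the file contents.
core_site_xml

# To apply on BOTH machines (left commented out on purpose):
# core_site_xml > /usr/local/hadoop-1.2.1/conf/core-site.xml
```

Redirecting into the file overwrites it. If your existing core-site.xml sets other properties, it is safer to edit only the fs.default.name value by hand.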
Edit /usr/local/hadoop-1.2.1/conf/masters and /usr/local/hadoop-1.2.1/conf/slaves on both the master and the slave.
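The contents of the two files are not reproduced here. In Hadoop 1.x, conf/masters lists the host that runs the secondary namenode and conf/slaves lists the hosts that run a datanode and a tasktracker. The sketch below assumes that “slave” is the only worker, which is an assumption about this setup. Add “master” to the slaves file if the master should also process jobs. The helper name write_cluster_files is hypothetical.

```shell
#!/bin/sh
# Write the masters and slaves files into a Hadoop conf directory.
# ASSUMPTION: "slave" is the only worker node in this cluster.
write_cluster_files() {
    conf_dir="$1"
    # conf/masters: host on which start-all.sh launches the secondary namenode.
    printf '%s\n' "master" > "$conf_dir/masters"
    # conf/slaves: hosts that run a datanode and a tasktracker, one per line.
    printf '%s\n' "slave" > "$conf_dir/slaves"
}

# Run on BOTH machines (left commented out on purpose):
# write_cluster_files /usr/local/hadoop-1.2.1/conf
```

Adding another worker later means appending its hostname as a new line in conf/slaves.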
That’s pretty much it. Just format the namenode one more time and everything should be fine:
hadoop namenode -format
Start Hadoop with start-all.sh. Once it is running, you should be able to run Pig on either the master or the slave.
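To check that the daemons actually came up, a quick look at the running Java processes and the HDFS report can help. This is a sketch, and the function name check_cluster is hypothetical. It relies on jps from the JDK and on the Hadoop 1.x dfsadmin command.

```shell
#!/bin/sh
# Quick health check to run after start-all.sh.
check_cluster() {
    # List the running Java processes. The master should show NameNode,
    # SecondaryNameNode and JobTracker; a worker should show DataNode
    # and TaskTracker.
    jps
    # Print the HDFS status, including the datanodes that have registered
    # with the namenode.
    hadoop dfsadmin -report
}

# Run on the master once the cluster is up (left commented out on purpose):
# check_cluster
```

If the report lists no datanodes, the logs directory of the Hadoop installation on the slave is a good first place to look.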
The run_examples.sh script from binarypig should look as follows:
# prepare to run a script
hadoop fs -rmr /tmp/scripts || true
hadoop fs -put /usr/local/binarypig/scripts /tmp/
/usr/local/binarypig/binarypig/bin/dir_to_sequencefile /tmp/data test-files
# run some jobs
pig -f examples/strings.pig -p INPUT=test-files -p OUTPUT=test-files-strings
hadoop fs -ls test-files-strings
hadoop fs -text /user/hadoop/test-files-strings/part-m-00000
The job “strings.pig” will now be executed on the slave. With only one slave this is effectively still a single-node setup, but you can easily add more slaves if you want to.