1.24.2016

Capstone Cloud Computing References

1. Hadoop cluster setup on AWS - http://insightdataengineering.com/blog/hadoopdevops/
2. Cassandra Cluster setup on AWS - http://ealfonso.com/setting-up-a-cassandra-cluster-on-awsubuntu14-04/
3. Python with Hadoop Streaming
- http://www.glennklockwood.com/data-intensive/hadoop/streaming.html
- http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
4. Python with Cassandra - https://academy.datastax.com/demos/getting-started-apache-cassandra-and-python-part-i
5. Cassandra Tutorial - http://wiki.apache.org/cassandra/GettingStarted

Run a Hadoop Streaming job (mapper.py and reducer.py are local scripts, shipped to the cluster with -file):
namenode:~$ cd $HADOOP_HOME
namenode:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -file /home/ubuntu/mapper.py -mapper /home/ubuntu/mapper.py -file /home/ubuntu/reducer.py -reducer /home/ubuntu/reducer.py -input /user/mobydick.txt -output /user/gutenberg-output

Same job with explicit task counts (mapreduce.job.maps is only a hint; the actual number of map tasks follows the input splits):
namenode:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -D mapreduce.job.maps=4 -D mapreduce.job.reduces=1 -file /home/ubuntu/mapper.py -mapper /home/ubuntu/mapper.py -file /home/ubuntu/reducer.py -reducer /home/ubuntu/reducer.py -input /user/mobydick.txt -output /user/gutenberg-output
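
mapper.py and reducer.py are not reproduced in these notes. The sketch below assumes they implement a plain word count in the style of the tutorials under item 3; the function names (mapper, reducer, run_local) and the sample input are hypothetical. In Hadoop Streaming the mapper reads lines on stdin and writes key<TAB>value records, the framework sorts those records by key, and the reducer reads the sorted stream. The sketch runs all three steps in one process so the logic can be checked before a job is submitted.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)


def reducer(sorted_records):
    """Sum the counts for each word.

    Records must arrive sorted by key, which is what the streaming
    framework guarantees between the map and reduce phases.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "{}\t{}".format(word, total)


def run_local(lines):
    """Simulate map -> sort -> reduce locally (stand-in for the shuffle)."""
    return list(reducer(sorted(mapper(lines))))


# Made-up sample input, not taken from mobydick.txt.
sample = ["call me ishmael", "call me"]
for record in run_local(sample):
    print(record)
```

In a real job the two functions would live in separate scripts that read sys.stdin and print each record; the local sort here only imitates what the cluster does between the two phases.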

List the job output (one part-NNNNN file per reduce task, plus the _SUCCESS marker):
$ hdfs dfs -ls /user/gutenberg-output/
Found 5 items
-rw-r--r--   3 ubuntu supergroup          0 2016-01-26 07:53 /user/gutenberg-output/_SUCCESS
-rw-r--r--   3 ubuntu supergroup      91541 2016-01-26 07:53 /user/gutenberg-output/part-00000
-rw-r--r--   3 ubuntu supergroup      91157 2016-01-26 07:53 /user/gutenberg-output/part-00001
-rw-r--r--   3 ubuntu supergroup      91940 2016-01-26 07:53 /user/gutenberg-output/part-00002
-rw-r--r--   3 ubuntu supergroup      92025 2016-01-26 07:53 /user/gutenberg-output/part-00003
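
Each reduce task writes its own part file, and every key lands in exactly one of them, so the full result is the union of all part files. As a minimal sketch, assuming each output line has the form word<TAB>count (the reducer's real format is not shown in these notes), the following combines several parts and returns the most frequent words. The function name top_words and the sample records are hypothetical.

```python
def top_words(part_files, n=3):
    """Return the n highest-count (word, count) pairs across all parts.

    part_files is an iterable of line iterables, e.g. open file objects
    for local copies of part-00000, part-00001, and so on.
    """
    counts = []
    for lines in part_files:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            word, count = line.split("\t", 1)
            counts.append((word, int(count)))
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda wc: (-wc[1], wc[0]))[:n]


# Made-up records standing in for two part files.
part0 = ["call\t2\n", "me\t2\n"]
part1 = ["ishmael\t1\n"]
print(top_words([part0, part1], n=2))
```

Open file objects iterate line by line, so local copies of the part files can be passed in directly in place of the lists.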

Delete output before starting a new job:
$ hdfs dfs -rm -r /user/gutenberg-output