2. Cassandra Cluster setup on AWS - http://ealfonso.com/setting-up-a-cassandra-cluster-on-awsubuntu14-04/
3. Python with Hadoop Streaming
- http://www.glennklockwood.com/data-intensive/hadoop/streaming.html
- http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
4. Python with Cassandra - https://academy.datastax.com/demos/getting-started-apache-cassandra-and-python-part-i
5. Cassandra Tutorial - http://wiki.apache.org/cassandra/GettingStarted
namenode:~$ cd $HADOOP_HOME
namenode:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
    -file /home/ubuntu/mapper.py -mapper /home/ubuntu/mapper.py \
    -file /home/ubuntu/reducer.py -reducer /home/ubuntu/reducer.py \
    -input /user/mobydick.txt -output /user/gutenberg-output

namenode:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
    -D mapreduce.job.maps=4 -D mapreduce.job.reduces=1 \
    -file /home/ubuntu/mapper.py -mapper /home/ubuntu/mapper.py \
    -file /home/ubuntu/reducer.py -reducer /home/ubuntu/reducer.py \
    -input /user/mobydick.txt -output /user/gutenberg-output

$ hdfs dfs -ls /user/gutenberg-output/
Found 5 items
-rw-r--r--   3 ubuntu supergroup          0 2016-01-26 07:53 /user/gutenberg-output/_SUCCESS
-rw-r--r--   3 ubuntu supergroup      91541 2016-01-26 07:53 /user/gutenberg-output/part-00000
-rw-r--r--   3 ubuntu supergroup      91157 2016-01-26 07:53 /user/gutenberg-output/part-00001
-rw-r--r--   3 ubuntu supergroup      91940 2016-01-26 07:53 /user/gutenberg-output/part-00002
-rw-r--r--   3 ubuntu supergroup      92025 2016-01-26 07:53 /user/gutenberg-output/part-00003

Delete output before starting a new job:

$ hdfs dfs -rm /user/gutenberg-output/*
$ hdfs dfs -rmdir /user/gutenberg-output/
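
The streaming commands above refer to mapper.py and reducer.py, which are not shown in these notes. The following is a minimal sketch of what such scripts might look like, assuming the job is a plain word count over mobydick.txt (as in the Michael Noll tutorial listed above). The function names map_lines and reduce_lines, and the "map"/"reduce" command-line argument, are hypothetical; both roles are kept in one file only for brevity, and would be split into two files to match the commands above.

```python
#!/usr/bin/env python
"""Word-count mapper and reducer for Hadoop Streaming (illustrative sketch)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Yield one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)


def reduce_lines(lines):
    """Sum the counts for each word.

    Assumes the input is already sorted by key, which Hadoop Streaming
    guarantees between the map and reduce phases.
    """
    # Skip any line without a tab so malformed input cannot break parsing.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "{}\t{}".format(word, total)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: the first argument selects the role.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Before submitting to the cluster, a script like this can be checked locally by piping a text file through the map step, then sort, then the reduce step, since sort stands in for the shuffle phase.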