Using Hadoop: WordCount

Hadoop keywords: open-source, software framework, distributed storage, big data processing, MapReduce programming model, HDFS

The Hadoop framework is composed of four modules:

  • Hadoop Common: libraries and utilities needed by the other modules
  • Hadoop Distributed File System (HDFS): stores data on commodity machines and provides high aggregate bandwidth across the cluster
  • Hadoop YARN: a platform for managing computing resources in clusters and scheduling users’ applications
  • Hadoop MapReduce: an implementation of the MapReduce programming model
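
The MapReduce bullet refers to a programming model rather than to a specific API. As a rough illustration only, the single-process Python sketch below mimics its three phases: a map function turns each input record into (key, value) pairs, a shuffle step sorts and groups the pairs by key, and a reduce function combines all values that share a key. The names `run_mapreduce`, `word_mapper` and `sum_reducer` are hypothetical and not part of Hadoop; real Hadoop runs these phases in parallel across the cluster and spills intermediate data to disk.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[K, V]]],
    reducer: Callable[[K, List[V]], R],
) -> List[Tuple[K, R]]:
    """Hypothetical in-memory MapReduce: map, shuffle (sort + group), reduce."""
    # Map phase: each input record may emit any number of (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: sort by key so equal keys become adjacent, then group them.
    pairs.sort(key=itemgetter(0))
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        # Reduce phase: one call per distinct key with all of its values.
        results.append((key, reducer(key, values)))
    return results


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word, 1


def sum_reducer(word: str, counts: List[int]) -> int:
    # Add up the 1s emitted for this word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["to be or not to be", "to see"]
    for word, total in run_mapreduce(lines, word_mapper, sum_reducer):
        print(f"{word}\t{total}")
```

Keeping the engine generic over `mapper` and `reducer` mirrors how a MapReduce job is defined: the framework owns the shuffle, and the user only supplies the two functions.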

environment: Ubuntu 14.04.1, with a Java JDK installed and JAVA_HOME set (the bin/hadoop script needs Java to run)

download from http://mirror-hk.koddos.net/apache/hadoop/common/    // here we use hadoop-2.9.1, i.e. the binary release hadoop-2.9.1.tar.gz

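extract: tar -zxf $path/hadoop-2.9.1.tar.gz -C $path    // $path is the folder holding the downloaded archive; -C $path unpacks it there, creating $path/hadoop-2.9.1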

run wordcount:

1.  create an input file $path/wc.txt    // put in some words to count, separated by spaces or line breaks

e.g. wc.txt:

The story of Shakespeare's Hamlet was derived from the legend of Amleth preserved by 13th-century 
chronicler Saxo Grammaticus in his Gesta Danorum as subsequently retold by the 16th-century scholar 
François de Belleforest Shakespeare may also have drawn on an earlier Elizabethan play known today as 
the Ur-Hamlet though some scholars believe he himself wrote the Ur-Hamlet later revising it to create 
the version of Hamlet we now have He almost certainly wrote his version of the title role for his fellow 
actor Richard Burbage the leading tragedian of Shakespeare's time In the 400 years since its inception the 
role has been performed by numerous highly acclaimed actors in each successive century
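
The example's mapper splits each line on whitespace only, using `java.util.StringTokenizer` with its default delimiters (space, tab, newline, carriage return and form feed). That is why the sample text has its punctuation stripped: a token such as `Hamlet,` would be counted separately from `Hamlet`. Matching is also case-sensitive, so `The` and `the` are separate keys. The Python sketch below approximates that map step; `tokenize` and `map_line` are hypothetical helper names used only for illustration. It uses an explicit character class because Python's `str.split()` would also split on other whitespace such as vertical tab, which `StringTokenizer` does not.

```python
import re
from typing import Iterator, List, Tuple

# Same characters java.util.StringTokenizer uses by default: " \t\n\r\f".
_DELIMITERS = re.compile(r"[ \t\n\r\f]+")


def tokenize(line: str) -> List[str]:
    # Split on runs of delimiters and drop the empty strings produced by
    # leading or trailing whitespace.
    return [token for token in _DELIMITERS.split(line) if token]


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    # Mimic the WordCount mapper: emit (word, 1) for each token, keeping
    # case and punctuation exactly as they appear in the input.
    for token in tokenize(line):
        yield token, 1


if __name__ == "__main__":
    sample = "The story of Shakespeare's Hamlet, as told by the chronicler"
    for word, one in map_line(sample):
        print(word, one)
```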

2.  cd $path/hadoop-2.9.1

3.  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount $path/wc.txt result

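// this uses the examples jar shipped with Hadoop to count the words in wc.txt and writes the output to the result folder. With the default (standalone) configuration, paths refer to the local file system, so the result folder is created in the Hadoop root directory, i.e. the current directory. Note that the result folder must not exist yet; otherwise the job fails because the output directory already exists.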
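
Because the job refuses to overwrite an existing output folder, re-running step 3 usually means deleting result first. Below is a minimal sketch of a wrapper that does this, assuming standalone mode (so result is an ordinary local directory) and that it is started from the Hadoop root directory. The function name `run_wordcount` and its `dry_run` and `version` parameters are hypothetical; the command it builds is the one from step 3.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def run_wordcount(input_path: str, output_dir: str = "result",
                  version: str = "2.9.1", dry_run: bool = False) -> List[str]:
    """Hypothetical helper: clear a stale local output folder, then run WordCount.

    Assumes the current directory is the Hadoop root (e.g. $path/hadoop-2.9.1)
    and that Hadoop runs in standalone mode, so output_dir is on the local disk.
    """
    jar = f"share/hadoop/mapreduce/hadoop-mapreduce-examples-{version}.jar"
    cmd = ["bin/hadoop", "jar", jar, "wordcount", input_path, output_dir]
    if dry_run:
        # Only report what would be executed; touch nothing on disk.
        return cmd
    out = Path(output_dir)
    if out.exists():
        # Without this, the job fails because the output folder already exists.
        # Careful: this deletes the folder and everything in it.
        shutil.rmtree(out)
    # check=True raises CalledProcessError if the Hadoop job exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `run_wordcount("wc.txt", dry_run=True)` only returns the argument list, which is a safe way to check the command before letting the helper delete anything.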

4.  cat result/part-r-00000    // view the result file: one word per line, followed by a tab and its count

13th-century	1
16th-century	1
400	1
Amleth	1
Belleforest	1
Burbage	1
Danorum	1
Elizabethan	1
François	1
Gesta	1
Grammaticus	1
Hamlet	2
He	1
In	1
Richard	1
Saxo	1
Shakespeare	1
Shakespeare's	2
The	1
Ur-Hamlet	2
acclaimed	1
actor	1
actors	1
almost	1
also	1
an	1
as	2
been	1
believe	1
by	3
century	1
certainly	1
chronicler	1
create	1
de	1
derived	1
drawn	1
each	1
earlier	1
fellow	1
for	1
from	1
has	1
have	2
he	1
highly	1
himself	1
his	3
in	2
inception	1
it	1
its	1
known	1
later	1
leading	1
legend	1
may	1
now	1
numerous	1
of	5
on	1
performed	1
play	1
preserved	1
retold	1
revising	1
role	2
scholar	1
scholars	1
since	1
some	1
story	1
subsequently	1
successive	1
the	9
though	1
time	1
title	1
to	1
today	1
tragedian	1
version	2
was	1
we	1
wrote	2
years	1
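
The listing shows the reducer's output format: one line per distinct word, the word and its count separated by a tab, sorted by key. Hadoop's `Text` keys are compared by their raw UTF-8 bytes, so digits come before uppercase letters and uppercase before lowercase, which is why `13th-century` is first and `The` sorts ahead of `acclaimed`. The Python sketch below reproduces that layout from a list of words and parses a part file back into a dictionary. It assumes a single reducer, as in standalone mode, so all keys end up in one sorted file; `format_counts` and `parse_part_file` are hypothetical names, not Hadoop APIs.

```python
from collections import Counter
from typing import Dict, Iterable, List


def format_counts(words: Iterable[str]) -> List[str]:
    """Count words and emit 'word<TAB>count' lines, sorted like Hadoop Text keys."""
    counts = Counter(words)
    # Sort by UTF-8 bytes, as Hadoop's Text comparator does. For Python str
    # this matches plain code-point order, since UTF-8 preserves that order.
    ordered = sorted(counts, key=lambda word: word.encode("utf-8"))
    return [f"{word}\t{counts[word]}" for word in ordered]


def parse_part_file(lines: Iterable[str]) -> Dict[str, int]:
    """Read lines such as 'Hamlet<TAB>2' back into a {word: count} dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Keys never contain tabs (the mapper splits on them), so the last
        # tab separates the word from its count.
        word, _, count = line.rpartition("\t")
        counts[word] = int(count)
    return counts
```

To load the real output, open result/part-r-00000 with `encoding="utf-8"` and pass the file object to `parse_part_file`, since iterating a text file yields one line at a time.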
