File -> New -> Java Project
Uncheck Use default location.
Use location /root/MoocHomeworks/HadoopWordCount
Click Finish
Project -> Properties -> Java Build Path
Go to the Libraries tab.
Click the Add External JARs... button.
Adding only the jars referenced in build.sh will resolve the syntax errors in Eclipse:
hadoop-core-1.1.2.jar
lib/commons-cli-1.2.jar
However, debugging will still fail with only these two jars. Instead of hunting for exactly the required libraries, I recommend referencing all of the Hadoop libraries for debugging.
Use the same technique to reference all jars under hadoop-1.1.2/ and hadoop-1.1.2/lib.
You should now see the message Usage: WordCount <in> <out> when trying to debug.
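To double-check which jars the Add External JARs... step should have picked up, a small shell sketch like the following can list them. The function name list_hadoop_jars and the HADOOP_HOME variable are assumptions made for illustration; point it at wherever your hadoop-1.1.2 directory actually lives.

```shell
#!/bin/sh
# Sketch: print every jar directly under hadoop-1.1.2/ and hadoop-1.1.2/lib.
# list_hadoop_jars and HADOOP_HOME are hypothetical names, not part of Hadoop.
list_hadoop_jars() {
    hadoop_home="$1"
    for jar in "$hadoop_home"/*.jar "$hadoop_home"/lib/*.jar; do
        # An unmatched glob stays as a literal pattern, so skip non-files.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
        fi
    done
}

# Assumption: fall back to a hadoop-1.1.2 directory in the current folder.
list_hadoop_jars "${HADOOP_HOME:-hadoop-1.1.2}"
```

The list should include hadoop-core-1.1.2.jar and lib/commons-cli-1.2.jar; comparing it against the Libraries tab is a quick way to spot a jar that was missed.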
Run -> Debug Configurations...
Go to the Arguments tab and find the Program Arguments text area.
Enter the value input output.
Click Apply.
You should now see an output directory and be able to set breakpoints.
However, we need to delete this output directory between debug sessions, because Hadoop refuses to run a job whose output directory already exists.
Running rm -rf output between debug sessions will accomplish this.
You can place this command in a shell script and bind it to various build actions in Eclipse:
- Put the single rm -rf output command into a clean-output.sh file.
- Project -> Properties -> Builders -> New... -> Program
- Location: /root/MoocHomeworks/HadoopWordCount/clean-output.sh
- Working Directory: /root/MoocHomeworks/HadoopWordCount
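As a rough sketch, clean-output.sh could look like the following. The original only calls for the single rm -rf output command; wrapping it in a clean_output function and resolving the project folder from the script's own location are my assumptions, added so the script behaves the same whatever working directory it is launched from.

```shell
#!/bin/sh
# clean-output.sh (sketch): remove the job's output directory so the next
# debug session does not fail because the directory already exists.
# clean_output is a hypothetical helper name used for illustration.
clean_output() {
    project_dir="${1:-.}"
    rm -rf "$project_dir/output"
}

# Assumption: this script sits in the project root, next to output/.
clean_output "$(dirname "$0")"
```

On Linux the script will most likely need to be executable before Eclipse can launch it as a Program builder, for example with chmod +x clean-output.sh.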
At this point, my knowledge of Eclipse's options starts to break down. The Build Options tab contains additional settings for when to run clean-output.sh,
but I have not found a configuration that reliably removes the output folder when you simply hit the Debug button over and over. If you have access to the IntelliJ IDE,
its Before Launch options have given me this behavior reliably.
For now, you may have to run the script manually in a console between debug sessions, or use Project -> Clean... in Eclipse to make sure the clean script runs before debugging.