@davidmdem
Last active January 28, 2017 17:46
Debugging HadoopWordCount in Eclipse

Create a new project

File -> New -> Java Project

Uncheck Use default location.

Use location /root/MoocHomeworks/HadoopWordCount

Click Finish

Add Hadoop library references

Project -> Properties -> Java Build Path

Go to the Libraries tab.

Click the Add External JARs... button.

Adding only the jars referenced in build.sh is enough to clear the compile errors Eclipse reports:

  • hadoop-core-1.1.2.jar

  • lib/commons-cli-1.2.jar

However, debugging will still fail, because Hadoop depends on further jars at run time. Rather than hunting for exactly the required libraries, I recommend referencing all of the Hadoop libraries for debugging.

Use the same technique to reference all jars under hadoop-1.1.2/ and hadoop-1.1.2/lib.
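
To see which files that covers before clicking through the dialog, the two directories can be listed from a shell. This is a minimal sketch, assuming Hadoop was unpacked to a directory named hadoop-1.1.2 relative to where you run it; the HADOOP_DIR variable and the list_hadoop_jars function are hypothetical names used for illustration, not part of the project:

```shell
#!/bin/sh
# Print every jar directly under the Hadoop directory and its lib/ subdirectory.
# Assumption: the argument is the unpacked hadoop-1.1.2 directory.
list_hadoop_jars() {
    hadoop_dir="$1"
    for jar in "$hadoop_dir"/*.jar "$hadoop_dir"/lib/*.jar; do
        # An unmatched glob stays as literal text, so keep only real files.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
        fi
    done
}

# HADOOP_DIR is a hypothetical override; the default is a relative path.
list_hadoop_jars "${HADOOP_DIR:-hadoop-1.1.2}"
```

Every path it prints is a jar to select in the Add External JARs... dialog.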

Configure usage

If you start a debug session now, you should see the message Usage: WordCount <in> <out>, because no program arguments are configured yet.

Run -> Debug Configurations...

Go to the Arguments tab and find the Program arguments text area.

Enter the value input output.

Click Apply.

Clean between builds

After a debug run you should see an output directory, and you can now set breakpoints.

However, Hadoop refuses to write to an output directory that already exists, so we need to delete this directory between debug sessions to avoid errors.

Running rm -rf output between debug sessions will accomplish this.

You can place this command in a shell script and bind it to various build actions in Eclipse.

  • Put the single rm -rf output command into a file named clean-output.sh

  • Project -> Properties -> Builders -> New... -> Program

  • Location: /root/MoocHomeworks/HadoopWordCount/clean-output.sh

  • Working Directory: /root/MoocHomeworks/HadoopWordCount
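
A minimal sketch of what clean-output.sh could contain. The original is just the single rm -rf output command; the clean_output function and the optional directory argument are additions for illustration, and the sketch assumes the builder's Working Directory is the project root as configured above:

```shell
#!/bin/sh
# clean-output.sh: delete the previous job's output directory.
# Assumption: run from the project root (the builder's Working Directory),
# or pass the project directory as the first argument.
clean_output() {
    project_dir="${1:-.}"
    # -f makes rm succeed quietly when output does not exist yet.
    rm -rf "$project_dir/output"
}

clean_output "${1:-.}"
```

The script probably needs to be executable (chmod +x clean-output.sh) before Eclipse can launch it as a Program builder.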

At this point, my knowledge of Eclipse's options starts to break down. The Build Options tab contains additional options for when to run clean-output.sh, but I have been unable to find a configuration that reliably removes the output folder when simply hitting the Debug button over and over. If you have access to the IntelliJ IDE, I've been able to use its Before Launch options to get this behavior working reliably.

For now, you may have to run the script manually in a console between debug sessions, or use Project -> Clean... in Eclipse to make sure the clean script runs before debugging.
