Tuesday, November 18, 2014

Hadoop Troubleshooting: Hadoop build errors related to FindBugs, Eclipse configuration, protobuf, and AvroRecord

Last week I tried to build Hadoop 2.5.0 from source. I tried two approaches: the first using Maven in a terminal, and the second using Eclipse as my IDE.

1. Related to findbugs (using maven in terminal)

I read BUILDING.txt, which you can find in the root of the Hadoop source directory, and ran this command to build a Hadoop package:

$ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Somewhere in the middle of the long build, an error related to the FINDBUGS_HOME environment variable appeared, as you can see in this specific error message:

hadoop-common-project/hadoop-common/${env.FINDBUGS_HOME}/src/xsl/default.xsl

I already had FindBugs installed, so I tried setting FINDBUGS_HOME to /usr/bin/findbugs with this command:

$ export FINDBUGS_HOME=/usr/bin/findbugs

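But it was no use; the error was still there. The path in the error message suggests that the build looks for src/xsl/default.xsl under FINDBUGS_HOME, and the installed launcher at /usr/bin/findbugs has no such file. So I downloaded the FindBugs source code from SourceForge and set FINDBUGS_HOME to the root directory of that source code: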

$ export FINDBUGS_HOME=/path/to/your/<sourcecode>/findbugs

I ran the build command again and it went well this time. :)
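
Before starting the long build again, it can help to check that FINDBUGS_HOME really contains the stylesheet named in the error message. This is only a small sketch: the function name check_findbugs_home is made up for illustration, and it tests nothing beyond the src/xsl/default.xsl path that appears in the error above.

```shell
# Hypothetical helper: succeeds only if the given directory (default:
# $FINDBUGS_HOME) contains the stylesheet named in the build error.
check_findbugs_home() {
    dir="${1:-${FINDBUGS_HOME:-}}"
    if [ -z "$dir" ]; then
        echo "FINDBUGS_HOME is not set" >&2
        return 1
    fi
    if [ -f "$dir/src/xsl/default.xsl" ]; then
        echo "OK: $dir/src/xsl/default.xsl found"
    else
        echo "Missing: $dir/src/xsl/default.xsl" >&2
        return 1
    fi
}
```

Calling check_findbugs_home right after the export tells you in a second whether the path is usable, instead of after a long Maven run.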


2. Build path error (Eclipse configuration)

When you import the Hadoop projects into your Eclipse workspace and try to build them all, you will probably see many kinds of errors, including this specific one:

Unbound classpath variable: 'M2_REPO/asm/asm/3.2/asm-3.2.jar' in project 'hadoop-tools-dist' hadoop-tools-dist

This error is related to the M2_REPO classpath variable in Eclipse. To solve it, open the classpath variable configuration from the Eclipse menu:

"Window -> Preferences". This opens the Preferences dialog. In that dialog, go to:

"Java -> Build Path -> Classpath Variables". Add a new classpath variable, name it M2_REPO, and set its path to:

/home/<username>/.m2/repository  

Try to rebuild all your projects. You won't see this kind of error again, but there are still many other errors in your projects.
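
Eclipse expands M2_REPO to the path you entered, so an entry such as M2_REPO/asm/asm/3.2/asm-3.2.jar points into your local Maven repository. Here is a quick sketch for checking from a terminal whether that jar has actually been downloaded. The function name resolve_m2_path is made up for this example, and the default repository location ~/.m2/repository is an assumption.

```shell
# Hypothetical helper: turn an Eclipse "M2_REPO/..." entry into a real path.
# The optional second argument overrides the assumed default repository.
resolve_m2_path() {
    entry="$1"
    repo="${2:-$HOME/.m2/repository}"
    echo "$repo/${entry#M2_REPO/}"
}

# Example: check the jar from the error message.
# jar=$(resolve_m2_path 'M2_REPO/asm/asm/3.2/asm-3.2.jar')
# if [ -f "$jar" ]; then echo "found $jar"; else echo "not there: $jar"; fi
```

If the jar is not there, the variable is probably fine and the dependency simply has not been fetched yet; a Maven build from the terminal normally downloads it.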

3. hadoop-streaming build path error

If you are lucky (or not), you will find this error message related to the hadoop-streaming build path:

Project 'hadoop-streaming' is missing required source folder: '/hadoop-2.5.0-working/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf'

This error is quite strange, because the build path can't be changed even after you edit it (I tried this many times and it was quite annoying). The solution is even stranger: just remove the build path entry and the error disappears. You can do it this way:

To open the configuration, right-click your hadoop-streaming project to open the context menu, then choose "Build Path -> Configure Build Path". This opens the "Properties for hadoop-streaming" dialog. In that dialog, choose the "Source" tab, select the problematic source folder, and press the "Remove" button. After that, just rebuild the projects. The remaining errors are related to missing protobuf and Avro files.

4. Protobuf and Avro missing files

The remaining errors you might find are related to missing files in the hadoop-common project. The problematic packages are org.apache.hadoop.io.serializer and org.apache.hadoop.ipc.protobuf (you can see that the protobuf directory is empty). I searched for information about the empty protobuf directory, but the only advice I found was to rebuild the project with Maven, which is supposed to generate the files automatically, using these commands:

$ mvn clean
$ mvn compile -Pnative
$ mvn package
$ mvn compile findbugs:findbugs
$ mvn install
$ mvn package -Pdist,docs,src,native -Dtar

I don't know if this works for you, but it didn't work in my project. These error messages still lingered:

AvroRecord cannot be resolved to a type
The import org.apache.hadoop.ipc.protobuf.TestRpcServiceProtos cannot be resolved
The import org.apache.hadoop.ipc.protobuf.TestProtos cannot be resolved
TestProtobufRpcProto cannot be resolved to a type
TestProtobufRpc2Proto cannot be resolved to a type
EmptyRequestProto cannot be resolved to a type
EchoResponseProto cannot be resolved to a type
EchoRequestProto cannot be resolved to a type

After that, I found a very good site called grepcode, where I was able to find all the classes and files that I needed (you can even get them from other versions of Hadoop!).
For example, you can download the AvroRecord.java file here and put it in the corresponding directory.
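
Before downloading files by hand, it may be worth checking whether the Maven build already generated them somewhere in the source tree, since Eclipse only sees folders that are on the project's build path. A minimal sketch (the function name find_generated is made up; it simply searches a directory tree for the class names from the error list above):

```shell
# Hypothetical helper: search a source tree for Java files by class name,
# e.g. the classes Eclipse reports as unresolved.
find_generated() {
    root="$1"
    shift
    for name in "$@"; do
        find "$root" -type f -name "$name.java"
    done
}

# Example, run from the Hadoop source root:
# find_generated hadoop-common-project/hadoop-common AvroRecord TestProtos TestRpcServiceProtos
```

If a file turns up under a build output folder, adding that folder as a source folder in Eclipse might be an alternative to copying files from grepcode. That is untested on the setup described in this post.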


That's all. If you find this post useful, please leave a comment below. See you in my next post.

Thursday, November 13, 2014

MLj Tips: MLj set up and installation

After some trial and error, I managed to run MLj on my machine. I made a lot of mistakes because I couldn't find any good tutorial out there, so I wrote one for myself. You can read about MLj here.

Here are the steps I followed when setting up MLj on my machine:

1. Get MLj for Linux

http://www.dcs.ed.ac.uk/home/mlj/dist/downloads/mlj0.2c-linux.tar.gz

2. Install SML/NJ

You can install SML/NJ using this command:
$ sudo apt-get install smlnj

3. Extract mlj0.2c-linux.tar.gz

You can extract the MLj package anywhere on your drive. The archive contains mlj as its root directory.

4. Go to your mlj/bin directory

Go to your mlj/bin directory. You'll find four files there: mlj, mlj.bat, mlj-jdk1.1.1.x86-linux, and run.x86-linux.

5. Try to run the mlj

Use this command to run mlj:
$ ./mlj    

It will probably raise this error:
./mlj: 1: ./mlj: .arch-n-opsys: not found
 mlj: unable to determine architecture/operating system

To fix this, you can add your mlj/bin directory to the PATH environment variable with this command:
$ export PATH=$PATH:<your mlj/bin path>

or

you can open your mlj file in an editor and change this line:
ARCH_N_OPSYS=`.arch-n-opsys`
to    
ARCH_N_OPSYS=`./.arch-n-opsys`

Try running ./mlj or mlj once again. This error is still there:

mlj: unable to determine architecture/operating system

This happens because the bundled .arch-n-opsys file is not compatible with the current desktop architecture and environment. That's okay, we will get rid of it in the next step.

6. Copy .arch-n-opsys file from your smlnj

You may back up your current .arch-n-opsys file if you want, then copy the new one from the smlnj directory. Here are the commands, run from your mlj/bin directory:
$ mv .arch-n-opsys .arch-n-opsys.bak
$ cp /usr/lib/smlnj/bin/.arch-n-opsys .
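
To see whether the copied script recognizes your machine before starting mlj again, you can run it directly and look at what it prints. The sketch below wraps that in a small function. The name check_arch_script is made up, and it rests on the assumption that an empty result from .arch-n-opsys is what leads to the "unable to determine architecture/operating system" message.

```shell
# Hypothetical helper: run the .arch-n-opsys script in the given directory
# and report whether it printed anything at all.
check_arch_script() {
    script="$1/.arch-n-opsys"
    if [ ! -x "$script" ]; then
        echo "missing or not executable: $script" >&2
        return 1
    fi
    out=$("$script" 2>/dev/null) || true
    if [ -n "$out" ]; then
        echo "detected: $out"
    else
        echo "nothing detected by $script" >&2
        return 1
    fi
}

# Example, run from mlj/bin:
# check_arch_script .
```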

7. Try to run mlj once again

Try to run ./mlj or mlj again and you'll see this output:

MLj 0.2c on x86 under Linux with basis library for JDK 1.1
Copyright (C) 1999 Persimmon IT Inc.

MLj comes with ABSOLUTELY NO WARRANTY. It is free software, and you are
welcome to redistribute it under certain conditions.
See COPYING for details.

Your installation is done!

If you have any questions or find my post useful, please leave a comment.

Tuesday, November 11, 2014

Dictionary: First Class Methods/Functions

In programming languages, first-class methods/functions means that the language treats methods or functions as first-class citizens.

According to the book Structure and Interpretation of Computer Programs (2nd edition), elements with the fewest restrictions are said to have first-class status. The rights and privileges of first-class elements are:
  • They may be named by variables.
  • They may be passed as arguments to procedures.
  • They may be returned as the results of procedures.
  • They may be included in data structures.
So if a language supports first-class functions, it lets functions be assigned to variables, passed as parameters, returned as the results of procedures, and stored in data structures. First-class functions are necessary for the functional programming style.

Examples of programming languages that support first-class functions are Scheme, ML, Haskell, F#, Perl, Scala, Python, PHP, Lua, JavaScript, C#, and C++, among others.

Sources:
  1. http://en.wikipedia.org/wiki/First-class_function
  2. Abelson, Harold; Sussman, Gerald Jay (1996). Structure and Interpretation of Computer Programs - 2nd Edition. MIT Press.


Thursday, November 6, 2014

WALA Tips: Problems/errors when you are trying to run WALA examples

Today I tried to run the WALA examples, and I found and tackled some problems that you might run into as well. I wrote this post to help me remember how I solved them. :)

For a complete configuration manual you can refer to this WALA wiki. Here is the list of problems:

1. The import org.eclipse.pde.internal.core.PDEStateHelper cannot be resolved

I was using the latest Eclipse version (Luna) when this post was written. This problem happened when I tried to build the com.ibm.wala.ide project. EclipseProjectPath.java, which you can find in the com.ibm.wala.ide.util package, needs to import org.eclipse.pde.internal.core.PDEStateHelper.

So to resolve this problem I downloaded org.eclipse.pde.core_3.3.jar, which contains the org.eclipse.pde.internal.core.PDEStateHelper class, and added that file to my Java build path libraries. You can do that by right-clicking your project and choosing from the context menu:
"Build Path -> Configure Build Path...". This opens the project's Java Build Path configuration dialog. Open the "Libraries" tab and click "Add External JARs"; a file browser opens, where you select your org.eclipse.pde.core_3.3.jar. Close the dialog by clicking the "OK" button, then try to build your project once again. Voilà! I hope your error vanishes just like mine did.


2. Problem when trying to run example 1 (the SWTTypeHierarchy) 

When you try to run example 1, SWTTypeHierarchy, which you can find in com.ibm.wala.core.tests, you may get this error:

"{resource_loc:/com.ibm.wala.core.testdata/JLex.jar} "

or 

"com.ibm.wala.util.debug.UnimplementedError: java.io.FileNotFoundException"

This problem happens because SWTTypeHierarchy expects JLex.jar to be found in the com.ibm.wala.core.testdata directory. So what you need to do is put your JLex.jar file in the root directory of the com.ibm.wala.core.testdata project. After this step you will be able to run example 1 correctly. If successful, you should see a new window pop up with a tree view of the class hierarchy of JLex.

3. Problem when trying to run example 2 (the PDFTypeHierarchy)

When you try to run example 2, you are probably going to hit several problems. The first one is the wala.properties configuration file. This file contains the path configuration for your Java runtime directory ("java_runtime_dir") and the "output" directory. If you haven't created the wala.properties file yet, you'll get this message:

"com.ibm.wala.util.WalaException: Unable to set up wala properties "

To solve that, create the wala.properties file in the dat directory of the com.ibm.wala.core project. You can copy or rename wala.properties.sample (which exists in the dat directory) to wala.properties. After that, try to run the example once more and you'll get another error (sorry guys).

"java.io.IOException: property_file_unreadable wala.examples.properties"

To solve that you need to configure another file, wala.examples.properties. This file contains the executable paths of your PDF viewer and Graphviz (on Linux you can install Graphviz with apt-get install graphviz). You need to create wala.examples.properties in the dat directory of the com.ibm.wala.core.tests project. You can copy or rename wala.examples.properties.sample (which exists in the dat directory) to wala.examples.properties. Then try to run your project once more. If there is another error, please stay with me. :)
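
Both properties files are created the same way, by copying a sample, so the step can be sketched as one small function. The name copy_sample is made up for this example, and it assumes the sample sits next to the target file with a .sample suffix, as described above. An existing file is left alone so your edits are not overwritten.

```shell
# Hypothetical helper: create <file> from <file>.sample unless <file>
# already exists.
copy_sample() {
    target="$1"
    if [ -e "$target" ]; then
        echo "keeping existing $target"
    elif [ -f "$target.sample" ]; then
        cp "$target.sample" "$target" && echo "created $target"
    else
        echo "no sample found: $target.sample" >&2
        return 1
    fi
}

# Example, run from the directory that holds the WALA projects:
# copy_sample com.ibm.wala.core/dat/wala.properties
# copy_sample com.ibm.wala.core.tests/dat/wala.examples.properties
```

The copied files still need to be edited by hand so that the paths match your machine.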

If you get these exceptions:

1. Exception in thread "main" com.ibm.wala.util.debug.UnimplementedError

You probably didn't set the Java runtime directory correctly in your wala.properties file. It must point to the JRE library path. In my environment it looks like this:

"/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/"

2. spawning process [null, -Tpdf, -o, com.ibm.wala.core/results/cg.pdf, -v, com.ibm.wala.core.tests/temp.dt]Exception in thread "main" 
java.lang.NullPointerException

You probably haven't set an output directory in your wala.properties file. Make sure to create the directory, because WALA won't create it for you.

Please check your wala.properties and wala.examples.properties configuration files first. Make sure all properties are set correctly, such as the java, output, dot_exe, and pdfview_exe paths. If everything is configured correctly, you'll see a PDF file representing the type hierarchy.
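
Checking the two files by eye gets tedious after a few rounds, so here is a rough sketch that lists keys with no value. The function name missing_props is made up, and it only understands plain "key = value" lines. The key names in the example are the ones mentioned in this post; take the exact name of the output key from wala.properties.sample.

```shell
# Hypothetical helper: print each requested key that has no non-empty
# "key = value" line in the given properties file.
missing_props() {
    file="$1"
    shift
    for key in "$@"; do
        if ! grep -Eq "^[[:space:]]*$key[[:space:]]*=[[:space:]]*[^[:space:]]" "$file" 2>/dev/null; then
            echo "$key"
        fi
    done
}

# Example:
# missing_props com.ibm.wala.core/dat/wala.properties java_runtime_dir
# missing_props com.ibm.wala.core.tests/dat/wala.examples.properties dot_exe pdfview_exe
```

No output means every listed key has a value. It does not check that the paths exist.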

I hope this post helps you. If you find another problem, or this post helped with yours, please leave a comment below. See you in my next post.


Wednesday, November 5, 2014

Eclipse Tips: Clean and Rebuild Eclipse Projects

Hi guys, I'm pretty new to project development using Eclipse as my IDE. I've been using MS Visual Studio (VS) for 5 years and now I'm switching to Eclipse for my next project.

When I imported an existing project into Eclipse for the first time, I wondered how to build it, because I couldn't find a button in the toolbar or an entry in the project's context menu to build the project. But when I checked the build directory, the build output was already there. It turns out Eclipse builds the project for you automatically. If you want to build your project manually, go to the Eclipse top menu:

Project -> uncheck "Build Automatically".

After that, the other build entries in that menu are enabled. Use "Project -> Clean..." to clean your project, "Project -> Build Project" to build your project, and "Project -> Build All" to build all your projects at once.

Thanks for reading, everyone. If you find my post useful, please leave a comment below. :)

Tuesday, November 4, 2014

Hadoop Tips: File system manipulation / modification commands

If you read my previous post about useful Hadoop URLs, I promised to write about file system manipulation commands such as adding, renaming, or deleting a file/directory. They are easy to use if you are familiar with Linux commands.

In Hadoop 2.5.0, all file system manipulation commands can be run through the 'hdfs' script, which you can find in Hadoop's bin/ directory. The usage pattern is as below:

$ hdfs dfs -<command>

Here are the commands that you need to know:

1. ls

The "ls" command shows the contents of a directory. If you don't give a path, it lists your HDFS home directory. You can add the -R option to list the contents of all subdirectories recursively.

$ hdfs dfs -ls [-R] [-h] [-d] [<path>]
$ hdfs dfs -ls -R

2. put

The "put" command uploads a local file or directory into HDFS. If the local path is a directory, its whole content is put into the HDFS destination directory. Here's an example:

$ hdfs dfs -put <localpath> <hdfs path>
$ hdfs dfs -put local-file.txt destination-file.txt

3. mkdir

You can create a directory in HDFS by using the mkdir command.

$ hdfs dfs -mkdir <destinationpath>/<directory name>
$ hdfs dfs -mkdir /user/username/new-directory
$ hdfs dfs -mkdir new-directory
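
The examples above use both an absolute path and a relative one. HDFS resolves a relative path against your home directory, /user/<username>. The sketch below only imitates that rule so you can see where a path would end up; it does not talk to Hadoop at all, and the function name hdfs_abs_path is made up.

```shell
# Hypothetical helper: show where HDFS would resolve a path argument.
# Absolute paths stay as they are; relative ones go under the home
# directory /user/<username> (the second argument overrides the user).
hdfs_abs_path() {
    case "$1" in
        /*) echo "$1" ;;
        *)  echo "/user/${2:-$(id -un)}/$1" ;;
    esac
}

# hdfs_abs_path new-directory username   prints /user/username/new-directory
```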

4. mv

Just like the Linux command, "mv" moves a file or directory from one location to another. You can also use it to rename a file or directory. Here are the examples:

$ hdfs dfs -mv <hdfs old location> <hdfs new location>
$ hdfs dfs -mv /user/username/something.txt /user/username/otherdirectory/
$ hdfs dfs -mv /user/username/onedirectory /user/username/otherdirectory/
$ hdfs dfs -mv <hdfs old name> <hdfs new name>
$ hdfs dfs -mv /user/username/something.txt /user/username/newthing.txt
$ hdfs dfs -mv /user/username/olddirectory /user/username/newdirectory

5. rm

To delete files or directories, use the "rm" command. Add the -R option to delete a directory and everything in it recursively.

$ hdfs dfs -rm [-R] <file/directory to be deleted>
$ hdfs dfs -rm somefile.txt
$ hdfs dfs -rm -R directory

I think these 5 commands will give you the "power" to manipulate HDFS files/directories :)

If you want a more complete list, you can refer to this documentation. It covers "cat", "touchz", "cp", and many other commands.
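
To see the five commands working together, here is a small walk-through sketch. The run wrapper, the hdfs_demo function, and the directory name demo-dir are all made up for this example. With DRY_RUN=1 the commands are only printed, which is handy for trying the script without a running cluster.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical walk-through: create a directory, upload a local file,
# rename it, list the result, and clean up again.
hdfs_demo() {
    local_file="$1"
    name=$(basename "$local_file")
    run hdfs dfs -mkdir demo-dir &&
    run hdfs dfs -put "$local_file" "demo-dir/$name" &&
    run hdfs dfs -mv "demo-dir/$name" "demo-dir/renamed-$name" &&
    run hdfs dfs -ls -R demo-dir &&
    run hdfs dfs -rm -R demo-dir
}

# Example:
# DRY_RUN=1; hdfs_demo local-file.txt
```

The steps are chained with &&, so the walk-through stops at the first command that fails instead of deleting something after a failed upload.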

If you find my post useful, please leave a comment below. Thanks for reading.


Hadoop Tips: Useful URLs in a Hadoop system

For the past several weeks I have been installing and playing with Hadoop, and there is a lot I still need to learn about it. So I'm writing this post so that I don't forget what I have learned so far. For an installation tutorial, you can follow this good tutorial (Hadoop 2.5.0).

There are some URLs that are useful for administering Hadoop 2.5.0 after you start the system using start-all.sh, located in the sbin directory. Here is the list:

1. NameNode (NN) Web UI: localhost:50070
There are several tabs in this web UI. In the Overview tab you can see the NN status, how much storage you have in total, used space, free space, and other statistics about your system. In the Datanode tab you can find information about all of your functioning and decommissioned datanodes. The Snapshot tab contains information about the snapshots you have created. You can see your startup progress in the Startup tab. The last tab, Utilities, is also very useful: it has links to the file system browser and the log browser.

You can access the Hadoop file system (HDFS) browser at http://localhost:50070/explorer.html. You can see your directory structure there, but you can't delete, rename, or modify anything; it only lets you view your directories and files. If you want to edit your directories or files, you can read my other post later (I will write it for you :D). The last link is the log explorer, which you can find at http://localhost:50070/logs/. It has all the logs created by the datanode, namenode, secondary namenode, resource manager, etc.

2. ResourceManager Web UI: localhost:8088
In the ResourceManager UI you can see a lot of information about your cluster, nodes, applications, scheduler, and more.
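
If you want to check quickly whether these pages are up after starting Hadoop, the URLs above can be put into a small script. This is only a sketch: the function names are made up, the ports are the ones listed in this post, and the check needs curl installed.

```shell
# Print the web UI URLs from this post for a given host (default: localhost).
hadoop_ui_urls() {
    host="${1:-localhost}"
    echo "http://$host:50070/"
    echo "http://$host:50070/explorer.html"
    echo "http://$host:50070/logs/"
    echo "http://$host:8088/"
}

# Report which of those URLs answer; needs curl and a running Hadoop.
check_hadoop_uis() {
    hadoop_ui_urls "${1:-localhost}" | while read -r url; do
        if curl -s -f -o /dev/null --max-time 5 "$url"; then
            echo "up   $url"
        else
            echo "down $url"
        fi
    done
}

# Example:
# check_hadoop_uis localhost
```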

--------------------------------------
I haven't explored all of Hadoop's features, but I hope you find this post useful. Please leave a comment if you have any questions or find my post useful. Cheers!