Monday, July 20, 2015

Installing the Pydoop Egg on CentOS with the Cloudera Hadoop Client

Pydoop is a quick library for developing against or experimenting with Hadoop, and some of our prototypes need to access HDFS from Python, so Pydoop was our first choice for letting scripts interact with HDFS files. However, installing Pydoop requires some Hadoop libraries to be present so that it can compile on your dev machine. We therefore chose to install the Cloudera Hadoop client libraries on our dev machine, where we develop Pydoop scripts in Eclipse.
First, install the CDH4 repository RPM so that CentOS can find the Hadoop client packages. Then install hadoop-client with yum:
# rpm -ivh cdh4-repository-1-0.noarch.rpm
# yum install hadoop-client

For the pip installation, set JAVA_HOME and HADOOP_HOME so that the Pydoop package can be compiled:
# export JAVA_HOME=/usr/lib/jvm/java-1.6.0
# export HADOOP_HOME=/usr/lib/hadoop
# pip install pydoop==0.10.0
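
Since the build depends on both variables pointing at real directories, a quick sanity check before running pip can save a failed compile. The following is only a minimal sketch using the Python standard library; the helper name check_build_env is made up for this post and is not part of Pydoop or pip.

```python
import os


def check_build_env(env=None, required=("JAVA_HOME", "HADOOP_HOME")):
    """Return a list of problems found in the build-time variables.

    check_build_env is a hypothetical helper for this post, not a Pydoop API.
    It only checks that each variable is set and names an existing directory.
    """
    env = os.environ if env is None else env
    problems = []
    for name in required:
        value = env.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif not os.path.isdir(value):
            problems.append("%s points to a missing directory: %s" % (name, value))
    return problems


if __name__ == "__main__":
    issues = check_build_env()
    for issue in issues:
        print(issue)
    if not issues:
        print("JAVA_HOME and HADOOP_HOME look usable; run pip install next.")
```

Run it in the same shell where you exported the variables. A mistyped path such as /urs/lib/jvm/java-1.6.0 shows up as a missing directory. It does not verify that the directories contain a working JDK or Hadoop client, only that they exist.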
