Wednesday, March 14, 2012

Hadoop + Amazon EC2 - An updated tutorial

There is an old tutorial on Hadoop's wiki page: http://wiki.apache.org/hadoop/AmazonEC2, but I recently had to follow it and noticed that it doesn't cover some newer Amazon functionality.

To follow this tutorial, it is recommended that you are already familiar with the basics of Hadoop; a very useful getting-started tutorial can be found on Hadoop's homepage: http://hadoop.apache.org/. You should also be familiar with Amazon EC2 internals and instance definitions.

When you register an account at Amazon AWS you receive 750 free hours to run
t1.micro instances, but unfortunately, you can't successfully run Hadoop on such small machines.

In the following steps, a command starting with $ should be executed on your local machine, and one starting with # on the EC2 instance.

Create an X.509 Certificate


Since we are going to use the ec2-tools, our AWS account needs a valid X.509 certificate:
  • Create .ec2 folder:
  • $ mkdir ~/.ec2
  • Log in to AWS
    • Select “Security Credentials” and at "Access Credentials" click on "X.509 Certificates";
    • You have two options:
      • Create the certificate using the command line:
      • $ cd ~/.ec2; openssl genrsa -des3 -out my-pk.pem 2048
        $ openssl rsa -in my-pk.pem -out my-pk-unencrypt.pem
        $ openssl req -new -x509 -key my-pk.pem -out my-cert.pem -days 1095
        • This only works if your machine's clock is set correctly (the certificate's validity dates are derived from it).
      • Create the certificate using the site and download the private key (remember to put it in ~/.ec2).
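Whichever option you choose, you can confirm the certificate is usable before moving on. This is a quick sanity check with openssl, assuming the file names from the steps above in ~/.ec2:

```shell
# Print the subject and validity window of the generated certificate;
# notBefore/notAfter should bracket today's date.
openssl x509 -in ~/.ec2/my-cert.pem -noout -subject -dates

# Confirm the private key and the certificate actually belong together
# by hashing the modulus of each; the two digests must be identical.
openssl x509 -in ~/.ec2/my-cert.pem -noout -modulus | openssl md5
openssl rsa  -in ~/.ec2/my-pk-unencrypt.pem -noout -modulus | openssl md5
```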

Setting up Amazon EC2-Tools

  • Download and unpack ec2-tools;
  • Edit your ~/.profile to export all variables needed by the ec2-tools, so you don't have to do it every time you open a prompt:
    • Here is an example of what should be appended to the ~/.profile file (the ec2-tools also expect the path to your private key, via EC2_PRIVATE_KEY):
      • export JAVA_HOME=/usr/lib/jvm/java-6-sun
      • export EC2_HOME=~/ec2-api-tools-*
      • export PATH=$PATH:$EC2_HOME/bin
      • export EC2_PRIVATE_KEY=~/.ec2/my-pk.pem
      • export EC2_CERT=~/.ec2/my-cert.pem
    • To access an instance, you need to be authenticated (for obvious security reasons), so you have to create a Key Pair (a public and a private key):
      • At https://console.aws.amazon.com/ec2/home, click on "Key Pairs", or
      • You can run the following commands:
      • $ ec2-add-keypair my-keypair | grep -v KEYPAIR > ~/.ec2/id_rsa-keypair
        $ chmod 600 ~/.ec2/id_rsa-keypair
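The saved file is a plain RSA private key. A quick way to check that it was written out intact (assuming the path used above) is to derive its public half with ssh-keygen, which exits with an error if the file is truncated or malformed:

```shell
# Derive the public key from the saved private key; valid output
# starts with "ssh-rsa", while a corrupted file makes ssh-keygen fail.
ssh-keygen -y -f ~/.ec2/id_rsa-keypair
```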

Setting up Hadoop


After downloading and unpacking Hadoop, you have to edit the EC2 configuration script at src/contrib/ec2/bin/hadoop-ec2-env.sh.
  • AWS variables
    • These variables are related to your AWS account (AWS_ACCOUNT_ID, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY); they can be found by logging in to your account, under Security Credentials;
      • The AWS_ACCOUNT_ID is your 12 digit account number.
  • Security variables
    • The security variables (EC2_KEYDIR, KEY_NAME, PRIVATE_KEY_PATH) are the ones related to launching and accessing an EC2 instance;
    • You have to save the private key in your EC2_KEYDIR path.
  • Select an AMI
    • Depending on the Hadoop version you want to run (HADOOP_VERSION) and the instance type (INSTANCE_TYPE), you should use a suitable image to deploy your instance:
    • There are many public AMI images you can use (they should suit the needs of most users); to list them, type
    • $ ec2-describe-images -x all | grep hadoop
    • Or you can build your own image and upload it to an Amazon S3 bucket;
    • After selecting the AMI you will use, there are basically three variables to edit in hadoop-ec2-env.sh:
      • S3_BUCKET: the bucket where the image you will use is placed, for example hadoop-images,
      • ARCH: the architecture of the AMI image you have chosen (i386 or x86_64) and
      • BASE_AMI_IMAGE: the unique code that identifies an AMI image, for example ami-2b5fba42.
    • Another configurable variable is JAVA_VERSION, which defines the Java version installed on the instance:
      • You can also provide a link to where the binary is located (JAVA_BINARY_URL); for instance, if you have JAVA_VERSION=1.6.0_29, an option is JAVA_BINARY_URL=http://download.oracle.com/otn-pub/java/jdk/6u29-b11/jdk-6u29-linux-i586.bin.
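Putting the pieces together, an edited hadoop-ec2-env.sh might contain entries like the following. All values here are illustrative placeholders (the account ID, keys, bucket name, and Hadoop version are not real), so substitute your own:

```shell
# hadoop-ec2-env.sh -- illustrative fragment, not working credentials

# AWS account variables (from Security Credentials)
AWS_ACCOUNT_ID=123456789012            # your 12-digit account number
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY

# Security variables (must match the keypair created earlier)
EC2_KEYDIR=~/.ec2
KEY_NAME=my-keypair
PRIVATE_KEY_PATH=~/.ec2/id_rsa-keypair

# AMI selection
HADOOP_VERSION=0.20.2                  # must match the AMI you picked
INSTANCE_TYPE=m1.small
S3_BUCKET=hadoop-images
ARCH=i386
BASE_AMI_IMAGE=ami-2b5fba42

# Java installed on the instance at boot
JAVA_VERSION=1.6.0_29
JAVA_BINARY_URL=http://download.oracle.com/otn-pub/java/jdk/6u29-b11/jdk-6u29-linux-i586.bin
```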

Running!

  • You can add src/contrib/ec2/bin to your PATH variable so you will be able to run the commands regardless of where the prompt is opened;
  • To launch an EC2 cluster and start Hadoop, use the following command. The arguments are the cluster name (hadoop-test) and the number of slaves (2). When the cluster boots, the public DNS name will be printed to the console.
  • $ hadoop-ec2 launch-cluster hadoop-test 2
  • To log in to the master node of your cluster, type:
  • $ hadoop-ec2 login hadoop-test
  • Once you are logged into the master node, you can start a job:
    • For example, to test your cluster, you can run a pi calculation that is already provided by the hadoop*-examples.jar:
    • # cd /usr/local/hadoop-*
      # bin/hadoop jar hadoop-*-examples.jar pi 10 10000000
  • You can check your job's progress at http://MASTER_HOST:50030/, where MASTER_HOST is the host name returned when the cluster started.
  • After your job has finished, the cluster remains alive. To shut it down, use the following command:
  • $ hadoop-ec2 terminate-cluster hadoop-test
  • Remember that Amazon EC2 instances are charged by the hour, so if you only wanted to run tests, you may as well play with the cluster for a few more minutes before terminating it.
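Since charges accrue for any instance left running, it is worth double-checking after terminate-cluster that nothing survived. A sketch using the same ec2-tools (the grep pattern assumes the default multi-column describe-instances output, where each instance row starts with INSTANCE and includes its state):

```shell
# List anything still in the 'running' state after termination;
# empty output means no instances are left accruing charges.
ec2-describe-instances | grep INSTANCE | grep running
```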
