CESWP Tech Blog

Technical notes from the CESWP Project

New Homes

without comments

We’ve decided to wind down the CESWP Tech Blog, but we’ll continue posting on two related sites: one is project-oriented, and the other is a more general cloud technology blog. Here they are:

www.ceswp.ca: This is the official blog for the Cloud-Enabled Space Weather Platform project.

www.cybera.ca/tech-radar: Cybera’s technology blog.

Written by ceswp

June 24th, 2011 at 9:27 am

Posted in Uncategorized

OpenStack Diablo Design Summit

without comments

It’s been a remarkable first day at the OpenStack Diablo Design Summit.  Here are just a few highlights of my day:

Devin Carlen: developer of the OpenStack dashboard.  He’s looking for contributors, so I volunteered us.  I’ll set up my development environment here at the conference, so we’ll be ready to roll once I return.

Jake Dahn: awsomesauce.  Another dashboard.  GUI only, nothing under the hood yet.  His intention is that any module developed for Devin’s dashboard will plug into this new interface, so we can go ahead and add to Devin’s without too much risk.

Rob Hirschfeld: crowbar.  (Love the bunny.)  Turns out, this is the guy that put together the dair hardware specs for us.  He has a whole whack of Chef scripts to do a push-button install of OpenStack, Swift, etc.  These scripts are designed to work with the Dell 6100 servers.  Check out Matt Ray’s Opscode blog, too.

Check out Scalr.  It’s described as ‘the open source RightScale’.  We met Sebastian in San Antonio.  Scalr runs a commercial business, so at three years old it is already mature, and it has just been introduced into the OpenStack ecosystem incubator.

I’ve arranged to get together with one of the star developers from NTT to talk about IPv6.  I’d love to see if we can set up a small test cloud running IPv6 during the conference, but that is a lofty goal, and might have to wait until I return.  In any event, a short discussion will give us a big boost.

I’ve been getting high-fives from everyone when I mention we’re building a cloud, and it seems that everyone at Rackspace knows about our project: ‘Oh, you’re those guys up in Canada!  How are things going, do you need anything from us?’  I attribute this profile to Soo Choi, who moves through the crowd like a good hostess, introducing people to each other, then stepping out of the conversation.

 

Written by Barton Satchwill

April 27th, 2011 at 8:44 am

Posted in Uncategorized

An Improved rc.local for Loading User-Data

without comments

Earlier, I wrote an article about how we used rc.local to load user data into our cloud virtual machines. Since that time, we’ve gotten a little more experience with the technique, and we are now able to recognise several weaknesses in the original script:

  • the attempt to inject the ssh key came last.  This means that if the script failed for any reason, the key wouldn’t get injected.  It’s smarter to do this first, so if something goes wrong, you can still log into the vm and fix it.
  • we were forcing the userdata script to run with the ‘sh’ interpreter.  Instead, we should not specify the shell, and let whatever the author of the userdata declared with shebang (#!) take precedence.  The consequence of this is that scripts that relied on the idioms of a particular shell would fail when forced to run in ‘sh’
  • the script directed wget to overwrite the log file, rather than appending to it.
  • we cheerfully ignored the fact that our user data or our ssh key might have a length of zero
  • we weren’t getting enough diagnostic information when scripts failed.
  • we weren’t capturing stderr

I’m not sure you can write a good bash script.  You can only write one that’s less awful than others.  So with that in mind, here’s our new version:

#!/bin/bash
#---------------------------------------------------------------
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#----------------------------------------------------------------------------
set -o nounset # exit if any variable is not set

BASE_URL="http://169.254.169.254/latest"
USERDATA_URL="${BASE_URL}/user-data"
METADATA_URL="${BASE_URL}/meta-data"
SSH_KEY_URL="${METADATA_URL}/public-keys/0/openssh-key"
USERDATA="/tmp/userdata.sh"
INIT="/root/.userdata-init"
LOG="/root/init.log"
AUTHORIZED_KEYS="/root/.ssh/authorized_keys"

if [ `whoami` != root ]; then
 echo "---- Please run this as the 'root' user";
 exit 1
fi

log() {
 echo $(date -R): $1 >> $LOG 2>&1
}

inject_ssh_key()
{
 # simple attempt to get the user ssh key using the meta-data service
 CURL_RESULT=$(curl -m 20 -s $SSH_KEY_URL)
 RETURN_CODE=$?
 LENGTH="${#CURL_RESULT}"
 if [ $RETURN_CODE -ne 0 ] || [ $LENGTH = 0 ]; then
   log "error retrieving ssh key"
   log "curl return code: $RETURN_CODE"
   log "curl length: $LENGTH"
 else
   KEY=$(echo $CURL_RESULT | grep 'ssh-rsa')
   grep -s -q "$KEY" "$AUTHORIZED_KEYS"
   RETURN_CODE=$?
   log " grep return code: $RETURN_CODE"
   if [ $RETURN_CODE -eq 0 ]; then
     log "ssh key found"
   else
     log "Injecting ssh key"
     mkdir -p /root/.ssh
     echo $KEY >> "$AUTHORIZED_KEYS"
   fi
 fi
}

get_userdata ()
{
 wget $USERDATA_URL -O $USERDATA --append-output $LOG --no-clobber --retry-connrefused --tries=20
 RETURN_CODE=$?

 log "wget return code: $RETURN_CODE"
 if [ "$RETURN_CODE" -gt 1 ]; then
   log "error retrieving user data"
 fi

 if [ ! -f $USERDATA ]; then
   log "userdata file $USERDATA not found"
 fi

 if [ ! -s "$USERDATA" ]; then
   log "userdata file $USERDATA is zero length"
 fi
}

run_userdata()
{
 if [ ! -f $INIT ]; then
   log "running for the first time"
   RUN="true"
 else
   log "checking for run always"
   RUN=$(grep -E "^#.*CESWP_RUN_ALWAYS" $USERDATA)
 fi

 if [ -z "$RUN" ] ; then
   log "do not execute userdata"
 else
   touch $INIT
   chmod +x $USERDATA
   log "initialized on $(date -R)"
   log "-----------------------------------------------"
   log "--------------- userdata start ----------------"
   $USERDATA >> $LOG 2>&1
   log "---------------- userdata end -----------------"
   log "-----------------------------------------------"
 fi

 rm -f $USERDATA
}

log "-----------------------------------------------"
log "------ starting cloud user-data execution -----"
log "-----------------------------------------------"
inject_ssh_key
get_userdata
run_userdata
log "------ cloud user-data execution complete -----"
exit 0
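
The run-once versus run-always decision inside run_userdata can be looked at in isolation.  What follows is only a sketch under a few assumptions: should_run is a hypothetical helper that is not part of the rc.local above, and the files it inspects are throwaway files created just for the demonstration.  It mirrors the same two checks, the marker file and the CESWP_RUN_ALWAYS comment.

```shell
#!/bin/sh
# Sketch only: should_run is a hypothetical helper, not part of rc.local.
# Prints "yes" when the userdata script should execute, "no" otherwise.
should_run() {
  init_marker="$1"   # file created after the first run
  userdata="$2"      # downloaded user-data script

  # First boot: no marker file yet, so always run.
  if [ ! -f "$init_marker" ]; then
    echo "yes"
    return 0
  fi

  # Later boots: run only if a comment line carries CESWP_RUN_ALWAYS.
  if grep -q -E "^#.*CESWP_RUN_ALWAYS" "$userdata"; then
    echo "yes"
  else
    echo "no"
  fi
}

# Demonstration with throwaway files.
workdir=$(mktemp -d)
printf '#!/bin/bash\n# CESWP_RUN_ALWAYS\necho hello\n' > "$workdir/always.sh"
printf '#!/bin/bash\necho hello\n' > "$workdir/once.sh"
touch "$workdir/init"

first_boot=$(should_run "$workdir/no-such-marker" "$workdir/once.sh")
reboot_once=$(should_run "$workdir/init" "$workdir/once.sh")
reboot_always=$(should_run "$workdir/init" "$workdir/always.sh")
echo "first boot: $first_boot, reboot: $reboot_once, reboot with marker: $reboot_always"
rm -rf "$workdir"
```

Keeping the decision in one small function like this makes it easy to try out against sample userdata files before baking a new image.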

Written by Barton Satchwill

April 1st, 2011 at 5:20 pm

Posted in Uncategorized

MultiTail

without comments

One of the tools we use on the CESWP project regularly is MultiTail.  MultiTail allows you to view multiple files in multiple windows (with ncurses) on a single terminal.  It’s interactive so you can filter, merge, colour, etc. the different files while MultiTail is running.  This is especially useful when tailing a lot of log files at once.

On CESWP we use it to view the logs of all of our cloud components at once.  This gives us a real-time view into the state of our cloud and helps us debug problems.

Installation

Ubuntu

sudo apt-get install multitail

CentOS

Via RPMForge

  1. Follow these instructions for CentOS 5
  2. sudo yum install multitail

Via rpm

  1. Go here and copy the link of the latest rpm for RHEL5 and CentOS-5 x86 64bit
  2. wget [link]
  3. rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
  4. rpm -i [multitail rpm]

Configuration

No configuration needs to be done out of the box but the config file is located at:

/etc/multitail.conf

The easiest way to get started with monitoring is to use a script that does most of the work for you.  For example,

#!/bin/bash

multitail -s 2 \
 -t ceswp-cc.log -ev ".*refresh_resources\(\): called" -l "ssh -i private.key user@xxx.xxx.xxx.xxx tail -f /var/log/eucalyptus/cc.log" \
 -t ceswp-01-nc.log -l "ssh -i private.key user@xxx.xxx.xxx.xxx tail -f /var/log/eucalyptus/nc.log" \
 -t ceswp-02-nc.log -l "ssh -i private.key user@xxx.xxx.xxx.xxx tail -f /var/log/eucalyptus/nc.log" \
 -t ceswp-03-nc.log -l "ssh -i private.key user@xxx.xxx.xxx.xxx tail -f /var/log/eucalyptus/nc.log" \
 -t ceswp-04-nc.log -l "ssh -i private.key user@xxx.xxx.xxx.xxx tail -f /var/log/eucalyptus/nc.log"

We’ve set up public/private key security on our servers so we can ssh in non-interactively.

Run

This script makes use of the following parameters:

  • -s x    vertical split screen (in ‘x’ columns)
  • -t x    display ‘x’ in the window-title (when MultiTail runs in an xterm)
  • -ev    print only when NOT matching with this regexp
  • -l x    parameter is a command to be executed
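
When the list of node controllers grows, the repeated -t and -l pairs can be generated rather than typed.  The following is a rough sketch, not the script we actually run: build_multitail_command is a hypothetical helper, and the host names, key file and user name are placeholders.  It only prints the command line so it can be checked before use.

```shell
#!/bin/sh
# Sketch only: host names, key path and user name are placeholders.
KEY="private.key"
USER_NAME="user"
NC_LOG="/var/log/eucalyptus/nc.log"

# Print a multitail command with one "-t <title> -l <command>" pair per host.
build_multitail_command() {
  cmd="multitail -s 2"
  index=1
  for host in "$@"; do
    title=$(printf 'ceswp-%02d-nc.log' "$index")
    cmd="$cmd -t $title -l \"ssh -i $KEY $USER_NAME@$host tail -f $NC_LOG\""
    index=$((index + 1))
  done
  echo "$cmd"
}

generated=$(build_multitail_command node1.example node2.example)
echo "$generated"
```

The printed line can be reviewed and then pasted into a terminal, which keeps the quoting of each ssh command visible.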

Commands

These are some useful in-app commands you can use:

  • q    Quit
  • Ctrl+h    Help
  • Ctrl+g    Exit help
  • /    Search in all windows
  • b    scroll back
  • B    scroll back in ALL windows merged into one window
  • f    enter/edit in line filter regexps
  • 0..9    set a mark in a window so you can see what changed since you last looked
  • o    clear a window
  • O    clear ALL windows
  • z    hide/unhide a window
  • u    hide ALL windows but the one selected
  • U    unhide all windows
  • p    pause ALL windows
  • P    (un)pause one window

Reference

MultiTail
Manual
Examples

Written by Everett Toews

January 28th, 2011 at 11:15 am

Posted in Uncategorized

Creating an Application Wide Configuration Domain Object in Grails

without comments

The Grails application we’re creating for CESWP is maturing and we’re in need of application configuration.  To this end I thought it would be a good idea to create an application wide configuration domain object.  It would have been easier to put the configuration in a properties file or one of the Grails configuration Groovy files, but changing those requires access to the source code and potentially even a restart of the server.  I wanted something that would be accessible to anyone administering the CESWP application.

To start with I created a domain object in the domain/ceswp directory.

package ceswp

/**
 * Do not create instances of this class!  Use it via the CeswpConfigService instead.
 */
class CeswpConfig {
  public static final String CONFIG_PARAM_1 = "1"
  public static final String CONFIG_PARAM_2 = "2"

  String myConfigParam = CONFIG_PARAM_1

  static constraints = {
    myConfigParam(inList: [CONFIG_PARAM_1, CONFIG_PARAM_2], blank: false)
  }

  String toString() {
    return "[CeswpConfig: myConfigParam=${myConfigParam}]"
  }
}

There is only ever supposed to be one domain object of this type (hence the class comment).  However, trying to make a domain object in Grails a singleton doesn’t seem to be a particularly good idea (see How to support a singleton domain class in grails).

So I went the service route.  That works fine but anyone could still instantiate and save the domain object since the constructor is public.  I’d be interested to hear from anyone with a suggestion for a way around this.

The code for the service is in the service/ceswp directory.

package ceswp

import org.springframework.beans.factory.InitializingBean

class CeswpConfigService implements InitializingBean {
  static transactional = true

  private static CeswpConfig instance

  void afterPropertiesSet() {
    def ceswpConfigList = CeswpConfig.findAll()

    if (ceswpConfigList.size() == 1) {
      instance = ceswpConfigList.get(0)
    }
    else {
      throw new RuntimeException("There can only be one CeswpConfig.")
    }
  }
}

This service is implemented according to the Grails recommended way in the Services reference documentation.  The configuration is going to be referenced often in the source code so I don’t want to load the CeswpConfig from the database every time.

Instead, in afterPropertiesSet() (see the Initialization section of Services), I load the CeswpConfig once.  But this means that every time the CeswpConfig object changes, this service will need to be notified so it can reload its instance of CeswpConfig.

To do that I created a custom event listener in the src/groovy/ceswp directory.

package ceswp

import org.hibernate.event.PostUpdateEventListener
import org.hibernate.event.PostUpdateEvent
import com.smokejumperit.sublog.WithLog
import org.springframework.context.ApplicationContext
import org.codehaus.groovy.grails.commons.ApplicationHolder
import org.springframework.beans.factory.InitializingBean

@WithLog
class CeswpConfigListener implements PostUpdateEventListener {
  void onPostUpdate(PostUpdateEvent postUpdateEvent) {
    if (postUpdateEvent.getEntity() instanceof CeswpConfig) {
      ApplicationContext ctx = (ApplicationContext) ApplicationHolder.getApplication().getMainContext();
      InitializingBean ceswpConfigService = (InitializingBean) ctx.getBean("ceswpConfigService");

      ceswpConfigService.afterPropertiesSet()
    }
  }
}

This class will receive PostUpdateEvents from all domain objects.  To narrow it down to only what we’re interested in I simply check whether the entity carried by the PostUpdateEvent is an instance of CeswpConfig.  To get the service object I used the technique from the “From a servlet or other non-artifact class in src/groovy or src/java” section in the Services reference documentation.  I managed to avoid some extra code by using the fact that the CeswpConfigService implements the InitializingBean interface, which already has the method that I’m interested in calling.

To wire up the listener to the event all that we need to do is edit conf/spring/resources.groovy like so.

import org.codehaus.groovy.grails.orm.hibernate.HibernateEventListeners

// Place your Spring DSL code here
beans = {
  ceswpConfigListener(ceswp.CeswpConfigListener)

  hibernateEventListeners(HibernateEventListeners) {
    listenerMap = ['post-update':ceswpConfigListener]
  }
}

The only other thing I did was to create a CeswpConfig object in conf/BootStrap.  After that you have an application wide configuration domain object that you can change from the standard Grails editing screens.

Written by Everett Toews

December 10th, 2010 at 3:59 pm

Posted in Uncategorized


Dark Cloud

without comments

We haven’t been entirely happy with Eucalyptus.

The truth of the matter is that even now, nearly a year after starting with Eucalyptus, we still find ourselves spending most of our time futzing around with it, trying to fix yet another problem, trying to recover from yet another failure.   In short, we have lost too much valuable time working on the cloud, rather than working in the cloud.  Things are at the point where we feel we have an obligation to the community to be honest about our experiences with Eucalyptus, and list some of the problems we’ve had.  In case this appears as though we are bashing Eucalyptus, it must be pointed out that we have discussed everything here with Eucalyptus, either through email, forums, or telephone conversations.

  • Poor stability.  We have suffered every kind of cloud failure you can imagine: running vm instances vanish, instances lose their IP addresses, volumes disappear, clusters lose nodes, web consoles fall down; the list has been endless.
  • Brutal recovery from problem states.  When trying to recover from some problem, we spend a lot of time trying to recover gently without doing any additional damage to our cloud.  These attempts typically fail, and we are reduced to a full restart of our cloud servers, resulting in the total loss of our cloud.  To be fair, the situation has improved slightly with the latest 2.0.1 patch to Eucalyptus, but recovery is still unreliable, and rarely complete.  The thought of going into a real production environment in this state is terrifying.
  • Low quality information in log files.  Eucalyptus produces enormous log files, filled with useless, repetitive messages.  Worse, they are filled with misleading information, such as exceptions thrown as a part of normal operations, and error messages that are not actually errors.  These are cardinal sins in any application log.  Log files like these certainly complicate the work in reporting a defect, as it is extremely difficult to describe the problem coherently.  We were amused to discover that not even the Eucalyptus engineers can use the log files to diagnose a problem.
  • Information vacuum.  While the installation and administration guides are sufficient to get a cloud up and running, there is no detailed technical information available.  Understanding exactly how the Eucalyptus network functions, or how Eucalyptus uses information in its database must come from black box experimentation or rolling up your sleeves and reading the source code.
  • No user community.  It feels awfully lonely out here, using Eucalyptus, and there’s no sign of Eucalyptus doing anything to build a community.  We can see that individual users post questions, but I get the impression that these people are just taking Eucalyptus for a test drive, or using it experimentally.  I have often found myself wondering why Eucalyptus doesn’t have a prominent page listing projects currently using Eucalyptus.  We would happily add our project to such a list.
  • Expensive support arrangements. The lowest priced plan they offered us had rather lukewarm support for $1,500.00 per server per annum!  In the words of our principal investigator, ‘that’s too rich for science!’
  • No development roadmap.  In all of our repeated attempts to find out where Eucalyptus is going, or what features might be added in the future, Eucalyptus has been evasive in answering.  The impression we get is not that they don’t want to share the information, it’s more like they simply don’t have a plan for the open source version.
  • User forums of dubious value.  The typical thread is noisy, unfocused, and inconclusive.  We rarely see any solutions posted, even as a conclusion to the reported problem.  To be fair, this isn’t entirely the fault of Eucalyptus, they can’t be held responsible for what users submit.  But Eucalyptus could certainly help out by properly moderating the forums, organising and pruning some of the threads, and promoting solutions to common problems.
  • No response to reported issues or defects.  In our early days, we dutifully reported defects as we discovered them, but as the weeks and months passed with no response of any kind, we became disappointed with the process.  We’ve pretty much given up trying, and suspect other users would feel the same way.  (Heh, I just noticed that the official bug tracking page in Eucalyptus apparently doesn’t accept defect reports from the current 2.0 release.  Only the previous 1.6 release, and the nightly builds!  Why bother?)

In the end, it feels like Eucalyptus is a project that was not ready to become open sourced.  The Eucalyptus software is too immature to be used for any purpose outside a lab environment.  The Eucalyptus company lacks the organisation and processes to foster a community and support its software, and is either uninterested or unable to move forward.

Written by Barton Satchwill

December 1st, 2010 at 9:44 am

Posted in Uncategorized

CloudCamp Edmonton

without comments

On Monday night I attended CloudCamp Edmonton.  From the link,

“CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place where we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing.”

There was a good turnout of about 70 people.  After the welcome and introductions they moved on to the Lightning Talks (5 minute presentations).  A number of the talks were quite interesting.

  • Costs, Billing and Chargebacks for Cloud Computing – Rob Bisset, 6fusion.com: Rob talked about 6fusion’s Infrastructure as a Service (IaaS) offering that they were delivering to Canada through their partnership with e-ternity.ca.  It was nice to see an IaaS provider setting up in Canada but it was unclear how to take advantage of their cloud offering or take it for a test drive.
  • Cloud Computing:  Some Perspectives and Initiatives from Academia – Prof. Paul Lu: Paul highlighted some of the cloud initiatives going on at the University of Alberta.  For Software as a Service (SaaS) he talked about the U of A’s switch to Gmail, for Platform as a Service (PaaS) he talked about some of his students’ work with Google Fusion Tables, and for IaaS he talked about the Nahanni project, which is a shared memory interface for KVM.
  • Microsoft’s Cloud Computing Strategy – Barnaby Jeans: Barnaby talked about Microsoft’s PaaS offering Azure.
  • Scaling an App to Multiple Instances – Sean Ouimet: Sean discussed his company’s efforts to scale their application across multiple instances on Amazon Web Services (AWS).

However, it was clear that most presenters brought a 15-20 minute presentation and merely tried to condense their talks down to 5 minutes.  I think they really missed an opportunity to take advantage of the Lightning Talk format and focus their presentations instead of trying to cram as much information into 5 minutes as they could.

After that I participated in the unpanel, where 5 of us sat in front of the audience and took questions.  A lot of people were concerned about data and privacy in the cloud, specifically about hosting data in the cloud with companies based in America and about the Patriot Act.  Barnaby from Microsoft replied that there are a lot of things to take into consideration when a Canadian company wants to host data in an American based cloud and that Canadians shouldn’t disregard American clouds just because of the Patriot Act.  To which Rob from 6fusion retorted that Canadians should just host their data in a Canadian based cloud with 6fusion and not have to worry about the Patriot Act.

I tackled the question about the disadvantages of the cloud.  Regarding private clouds, I talked about the difficulty of providing the “elastic” in elastic computing.  One of the reasons AWS excels is that it is very easy and quick to acquire a lot of computing and storage resources.  While the CESWP cloud can do the “very easy and quick” part, we could easily have a problem with the “a lot” part.  It will be feasible for one person to come in and consume practically all of the resources of the cloud in one shot, leaving nothing for anyone else.  This is basically the problem we outlined in If We Succeed, We Fail.

We then moved on to the break out sessions.  I attended the more technical of the two sessions and we discussed a lot of the issues with the cloud.  Among the topics were:

  • dynamically scaling applications using tools such as scalr
  • deployment and configuration options for virtual machines (VMs) in AWS such as bundling and user-data scripting
  • monitoring your VMs
  • some of the success stories like the NY Times using Hadoop on AWS to convert 11 million articles in just 24 hours

I also had a chance to meet some interesting people during the conference.  There was a Drupal developer from Auckland, NZ who was just trying to wrap his head around all things cloud computing.  I talked to a .NET developer who was in the habit of putting 4+ GB heap dumps on Amazon’s Simple Storage Service to easily share them with his colleagues.  The cost of sharing such large files was so cheap that he didn’t even bother to expense it.  Finally, I chatted with Sean about his work scaling applications in the cloud.

All in all it was an interesting evening and a great opportunity to get to know what the software development community in Edmonton is thinking about the cloud.

Written by Everett Toews

October 7th, 2010 at 10:32 am

Posted in Uncategorized

Eucalyptus User-data

without comments

We thought it would be great if we could pass a script into a vm instance the first time it was started, and have that script execute automatically.  In this way, a more-or-less generic machine image could be customised to a user’s needs. User accounts could be created, new packages could be installed, and services could be configured.

Happily, this is exactly the way user-data works in the Alestic Amazon machine images.  Eucalyptus also describes this ability, suggesting rc.local be modified to fetch and execute the user-data.  Unfortunately, this mechanism did not work for us, until we first solved a couple of problems.

The first problem was that the rc.local script in the Ubuntu 10.04 instance we were using passed the ‘-e’ flag to the shell command interpreter.  This meant that the script terminated (silently!) on the first error encountered, before we got a chance to do any error handling, and it was difficult to know whether or not the script was even executing.  We chose not to use this flag, and to rely on our own error handling instead.
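
The effect of the flag is easy to reproduce outside of rc.local.  This is a small illustration rather than anything taken from our images: it runs the same two commands in a child shell with and without -e and captures what each one prints.

```shell
#!/bin/sh
# With -e: the child shell stops at the first failing command, without any message.
with_e=$(sh -e -c 'false; echo "reached the end"' || true)

# Without -e: the failure is ignored unless the script checks $? itself.
without_e=$(sh -c 'false; echo "reached the end"')

echo "with -e:    '$with_e'"
echo "without -e: '$without_e'"
```

With the flag set nothing is printed at all, which matches the silent exit described above.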

The next problem we discovered was that curl or wget was failing because the network was not ready when the rc.local ran.  This problem was resolved by configuring wget to re-try if an error is encountered, even if the connection is refused.  The graphic below shows rc.local completing before the network interfaces have come up.

[Figure: Ubuntu boot process]

With those problems out of the way, the mechanism seems to be working well.  Our version of rc.local, seen below, allows the option of the user-data script to be run just once, on the initial creation of the vm instance, or to be run each time the vm boots.  Notice also the --retry-connrefused and --tries flags, which are critical to getting wget working reliably.

#----------------------------------------------------------------------------
#!/bin/sh
#
# rc.local
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution bits.
#---------------------------------------------------------------------------- 

# get and execute the Eucalyptus user-data
USERDATA="/tmp/userdata.sh"
INIT="/root/.userdata-init"
LOG="/root/init.log"

echo "---------------------  $(date -R) -----------------" > $LOG
wget http://169.254.169.254/latest/user-data -O $USERDATA -o $LOG \
      --no-clobber --retry-connrefused --tries=10
# exit if wget failed
if [ $? -ne 0 ]; then
  echo "$(date -R): error retrieving user data" >> $LOG
fi 

# exit if user data file not found
if [ ! -e $USERDATA ] ; then
  echo "$(date -R): $USERDATA not found" >> $LOG
  exit
fi 

if [ ! -f $INIT ]; then
  RUN="true"
  echo "$(date -R): running for the first time" >> $LOG
else
  RUN=$(grep -E "^#.*CESWP_RUN_ALWAYS" $USERDATA)
  echo "$(date -R): checking for run always" >> $LOG
fi 

if [ -n "$RUN" ] ; then
  touch $INIT
  chmod +x $USERDATA
  echo "initialized on $(date -R)" >> $LOG
  echo "-------------------------------------------------" >> $LOG
  sh $USERDATA | tee >> $LOG
fi
exit 0
#----------------------------------------------------------------------------
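
For commands that have no built-in equivalent of --retry-connrefused and --tries, the same idea can be approximated with a plain shell loop.  This is a sketch under stated assumptions: retry and flaky are hypothetical helpers that are not part of wget or of the rc.local above, and the delay is set to zero only so the demonstration finishes immediately.

```shell
#!/bin/sh
# Sketch only: retry is a hypothetical helper, not part of wget or rc.local.
# Usage: retry <max_tries> <delay_seconds> <command> [args...]
retry() {
  max_tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_tries" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Demonstration: a stand-in command that fails twice, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
  count=$(cat "$counter_file")
  count=$((count + 1))
  echo "$count" > "$counter_file"
  [ "$count" -ge 3 ]
}

if retry 10 0 flaky; then result="ok"; else result="failed"; fi
attempts=$(cat "$counter_file")
echo "result: $result after $attempts attempts"
rm -f "$counter_file"
```

In a boot script the stand-in would be replaced by the real network call and the delay by a few seconds, giving the interfaces time to come up.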

Written by Barton Satchwill

September 28th, 2010 at 1:38 pm

Posted in Uncategorized


Control the Cloud Programmatically with Java

with 7 comments

Summary

The cloud is rapidly becoming an essential tool that allows organizations, entrepreneurs and scientists to innovate without a large, upfront cost for computing and storage infrastructure.  Anyone can start virtual machines (VMs) in the cloud and use them to create the next critical business application, killer web application or simulation to predict hurricanes.  To support that innovation this tutorial is for Java developers looking to control both a Eucalyptus cloud and the Amazon Elastic Compute Cloud and access the virtual machines within them programmatically using Java APIs.

Objective

At the end of this tutorial you should be able to:

  1. start a Linux virtual machine
  2. create storage and attach it to the VM
  3. make a filesystem on the storage and mount it in the VM
  4. do something interesting within the VM
  5. detach the storage and delete it
  6. terminate the VM

using Java code.

Overview

When developers want to harness the full capability of the cloud (in the IaaS sense of the word) they need to be able to control it programmatically from within their applications.  When it comes to Java there are a number of options available to developers for choosing an API that will allow them to control multiple cloud providers.  The ones considered here are typica, jclouds, Deltacloud, Dasein and libcloud.

Our requirements for the CESWP project are to support a Eucalyptus cloud and the Amazon Elastic Compute Cloud (EC2).  typica has initially been selected because it supports both clouds out-of-the-box.

jclouds support for Eucalyptus is only in the nightly snapshot builds.  That is a little too bleeding edge for me, and there is no documentation on using jclouds with Eucalyptus.  Neither Deltacloud nor Dasein explicitly supports Eucalyptus.  Although Eucalyptus claims to be compatible with the EC2 API, I’ve found that unless it’s documented, tested and officially supported by the API then it probably won’t work at all.  libcloud (a Java port of the Python libcloud) is still in the Apache incubator but it looks promising.

To this end I’ve begun a Java Cloud (Agnostic) API Survey that we may add to, if we evaluate and experiment with other APIs.

What follows is a tutorial on using typica to control virtual machines and their associated storage in the cloud.  One final piece of the puzzle for controlling VMs in the cloud is an SSH library to run commands directly on the VMs.  For this I’ve used the excellent sshj.  The relevant versions used in this tutorial are:

  • typica 1.7.2
  • sshj 0.3.0
  • Eucalyptus 2.0.0

The code in this tutorial is not production ready.  The code is only here to illustrate control of the cloud.  Most error handling has been omitted to make the code clearer and is left as an exercise for the reader.

Before You Start

Account(s) needed for the tutorial:

One or both of the accounts below will be needed for the tutorial.

Eucalyptus

Cost: Free

Most people won’t have the time, resources or inclination to build their own Eucalyptus cloud.  Instead I suggest you apply for an account on the Eucalyptus Community Cloud (ECC).  Follow the instructions to get an account.  It may take a day or longer to get your account approved so start as early as possible.  Once you have your account follow the Getting Started Using Eucalyptus 2.0 guide.  Important! Make a note of the key name and where you saved the private key when creating your key pair.  Do not skip the euca-authorize step!

Note that you cannot create storage and attach it to the VM in ECC.  It’s a limitation Eucalyptus put into this particular cloud to prevent abuse because the accounts are free.

Amazon Elastic Compute Cloud

Cost: Get your credit card out (but prices are very low for short lived VMs like we’ll be using)

Simply follow the directions in the Get Started with EC2 guide.  Important! Make a note of the key name and where you saved the private key when creating your key pair.  As part of this tutorial when you’re in the AWS Management Console click on Security Groups.  Choose default and in the window below add the Connection Method SSH and click Save.

Software needed to run through the tutorial:

Tutorial

Get and Compile the Code

The code for this tutorial is hosted at http://code.google.com/p/typica-tutorial.  You can browse the code online starting at http://code.google.com/p/typica-tutorial/source/browse/#svn/trunk/src/org/cybera.

Eclipse

It’s done as an Eclipse project so for those of you using Eclipse the quickest way to get started is:

  1. Fire up Eclipse
  2. File > New > Other > type “svn” > Checkout Projects from SVN > Next
  3. Create a new repository location > Next
  4. Select the URL > Finish

This will check out and compile the code for you.  Done.

Non-Eclipse

To get and compile the source code run the following commands:

  1. svn co http://typica-tutorial.googlecode.com/svn/trunk/ typica-tutorial
  2. cd typica-tutorial
  3. mkdir bin
  4. javac -d bin -cp "lib/*" src/org/cybera/ssh/* (on Windows change the / to \)
  5. javac -d bin -cp bin:"lib/*" src/org/cybera/TypicaTutorial.java (on Windows change the / to \ and : to ; )

Configuration

Now we need to set up the configuration files that contain the cloud-specific parameters for Eucalyptus and EC2.  Open both typica-tutorial/conf/aws.properties and typica-tutorial/conf/euca.properties.

aws.properties

  • cloud.accessId – Look at your AWS Security Credentials and it’s the one labelled Access Key ID
  • cloud.secretKey – Look at your AWS Security Credentials and it’s the one labelled Secret Access Key
  • cloud.username – Look at your AWS Security Credentials and it’s the one labelled AWS Account ID (near the bottom of the page).  Important! Remove the hyphens!
  • launchConfig.imageID – The ID of the Amazon Machine Image (AMI) you want to run.  You can use ami-ff6c8596, it’s a CentOS 5.5 image.  A good place to find more images is the cloud market.
  • launchConfig.availabilityZone – The zone where your AMI will run.  You can set this to “us-east-1d” (no quotes).
  • launchConfig.keyName – The key name from when you got started with your Amazon account in the Before You Start section.
  • launchConfig.minCount=1 – the minimum number of VMs to run.  Leave it as 1.
  • launchConfig.maxCount=1 – the maximum number of VMs to run.  Leave it as 1.
  • attachVolume.deviceName=/dev/sdf – This is the Linux device you’ll attach your storage to.  Leave it as /dev/sdf.
  • vmAuth.username – The username you’ll use to log in to your VM.  If you use the suggested AMI above, set this to “root” (no quotes).  If it was an Ubuntu image you’d most likely use “ubuntu”.
  • vmAuth.privateKeyLocation – The absolute path to the private key from when you got started with your Amazon account in the Before You Start section.
  • vmAuth.sudo – Set this property to “sudo” (no quotes) if your username is anything other than root.

euca.properties

  • cloud.accessId – Look at your ECC Credentials and it’s the one labelled Query ID
  • cloud.secretKey – Look at your ECC Credentials and it’s the one labelled Secret Key
  • cloud.URL – Set this property to “ecc.eucalyptus.com” (no quotes)
  • cloud.username – The username you used to sign up with ECC
  • launchConfig.imageID – The ID of the Eucalyptus Machine Image (EMI) you want to run.  You can use emi-9ACB1363, it’s a CentOS 5.3 image.  The place to find more images is the ECC Images.
  • launchConfig.availabilityZone – The zone where your EMI will run.  Set this to “open” (no quotes)
  • launchConfig.keyName – The key name from when you got started with your Eucalyptus account in the Before You Start section.
  • launchConfig.minCount=1 – the minimum number of VMs to run.  Leave it as 1.
  • launchConfig.maxCount=1 – the maximum number of VMs to run.  Leave it as 1.
  • attachVolume.deviceName=/dev/vdb – This is the Linux device you’ll attach your storage to.  Leave it as /dev/vdb.
  • vmAuth.username – The username you’ll use to log in to your VM.  If you use the suggested EMI above, set this to “root” (no quotes).  If it was an Ubuntu image you’d most likely use “ubuntu”.
  • vmAuth.privateKeyLocation – The absolute path to the private key from when you got started with your Eucalyptus account in the Before You Start section.
  • vmAuth.sudo – Set this property to “sudo” (no quotes) if your username is anything other than root.
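
As a rough illustration of how settings like these can be read, here is a minimal sketch using java.util.Properties.  The class name CloudConfig and its methods are hypothetical and are not taken from the tutorial code; only the property keys come from the lists above.  It reads from an in-memory string so it can be run without any files.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of typica-tutorial.
final class CloudConfig {
    private final Properties props = new Properties();

    CloudConfig(Reader source) {
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the trimmed value, failing fast when the key is missing or blank.
    String require(String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value.trim();
    }

    // Returns the trimmed value, or the fallback when the key is missing or blank.
    String optional(String key, String fallback) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }

    public static void main(String[] args) {
        // Placeholder values; real files would hold your own credentials.
        String sample = "cloud.accessId=EXAMPLE_ID\n"
                + "launchConfig.minCount=1\n"
                + "vmAuth.username=root\n";
        CloudConfig config = new CloudConfig(new StringReader(sample));
        System.out.println(config.require("vmAuth.username"));
        System.out.println(Integer.parseInt(config.require("launchConfig.minCount")));
        // vmAuth.sudo is absent here, so the fallback (empty string) is used.
        System.out.println("[" + config.optional("vmAuth.sudo", "") + "]");
    }
}
```

One thing to watch if a file is read this way: java.util.Properties treats a backslash as an escape character, so a Windows path in vmAuth.privateKeyLocation would need doubled backslashes or forward slashes.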

Explore the Code

Note that the source and Javadoc of the primary dependencies typica and sshj are included in typica-tutorial/lib/sources so you can see what’s going on under the hood.

TypicaTutorial.TypicaTutorial()

Sets up all of the properties.

TypicaTutorial.start()

Top level method to start all of the other methods.

TypicaTutorial.describeImages()

Calls Jec2.describeImagesByOwner() which returns all of the AMIs/EMIs that the owner has created.  This call won’t actually return anything for you because you haven’t created any images but it’s a simple call to make that verifies connectivity to the cloud.

TypicaTutorial.runInstance()

Calls Jec2.runInstances(), which runs an instance of a virtual machine based on the launch configuration specified by the properties you changed in the Configuration section above.  It also tests the state of the instance to determine when it’s running and waits an extra 30 seconds to give the SSH server on the VM a chance to start up too.  This way the method won’t return until you have a fully functioning VM.
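
The wait-until-running idea can be sketched as a small polling loop.  This is not the tutorial’s actual code: StateWaiter and waitForState are hypothetical names, and the state source is a plain Supplier standing in for whatever cloud call reports the instance state.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical polling helper for illustration only.
final class StateWaiter {

    // Polls the supplier until it reports the wanted state.
    // Returns false if the attempts run out or the thread is interrupted.
    static boolean waitForState(Supplier<String> currentState, String wanted,
                                int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (wanted.equals(currentState.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake state source standing in for a real describe-instances call.
        Iterator<String> states =
                Arrays.asList("pending", "pending", "running").iterator();
        boolean ready = waitForState(states::next, "running", 10, 10);
        System.out.println("Instance ready: " + ready);
    }
}
```

Against a real cloud the pause would be several seconds rather than milliseconds, and a fixed sleep after the loop succeeds (30 seconds in the tutorial) gives the SSH server time to come up.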

TypicaTutorial.createAttachAndMountVolume()

Calls Jec2.createVolume() to create some storage for the VM (unfortunately this doesn’t work on the Eucalyptus Community Cloud).  Waits for the storage to be ready and then calls Jec2.attachVolume() to attach that storage to the VM.  Waits for the storage to be attached and then runs the commands via SSH to make a filesystem and mount the storage in the VM.

TypicaTutorial.doInterestingStuff()

This is where you come in.  Right now the tutorial just runs a simple ping command over SSH.  Change the command and recompile!  Experiment with what you can do with a complete virtual machine at your disposal.  Run multiple commands over one connection, as is done when the storage is mounted.  For example, install subversion (yum install -y subversion), download some code from the web, compile and execute it.
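
One way to send several commands over a single connection is to join them into one shell line.  The sketch below is an assumption about how that could be done, not the tutorial’s implementation: RemoteCommands and chain are hypothetical names, and the prefix argument mirrors the vmAuth.sudo property (empty for root).  Sending the resulting string over SSH is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class RemoteCommands {

    // Prefixes every command with the sudo setting (if any) and joins them
    // with "&&" so the shell stops at the first command that fails.
    static String chain(String sudoPrefix, List<String> commands) {
        String prefix = (sudoPrefix == null || sudoPrefix.trim().isEmpty())
                ? ""
                : sudoPrefix.trim() + " ";
        return commands.stream()
                .map(command -> prefix + command)
                .collect(Collectors.joining(" && "));
    }

    public static void main(String[] args) {
        List<String> steps = Arrays.asList(
                "yum install -y subversion",
                "svn --version");
        System.out.println(chain("", steps));      // logged in as root
        System.out.println(chain("sudo", steps));  // logged in as another user
    }
}
```

Joining with && rather than ; means a failed install will not be followed by commands that depend on it.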

TypicaTutorial.unmountDetachAndDeleteVolume()

First unmounts the storage so it can be detached without corrupting it (even though we’re just going to delete it anyway).  Calls Jec2.detachVolume() to detach the storage from the VM and waits for it to detach.  Calls Jec2.deleteVolume() to delete the volume but doesn’t wait for it to be deleted because this can take a while and we’re not dependent on the storage being deleted.

TypicaTutorial.terminateInstance()

Calls Jec2.terminateInstances() to kill the running VM but doesn’t wait for it to die because this can take a while and we’re not dependent on the VM being terminated.  One thing to note is that the call to terminate the instance goes through, but it looks like Eucalyptus returns a null value and typica chokes on it, so we ignore the Exception (see http://code.google.com/p/typica/issues/detail?id=105).
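
Ignoring an exception on purpose is easier to live with when it is at least logged.  A minimal sketch of that pattern follows; BestEffort, CloudCall and attempt are hypothetical names, and the failing lambda simply simulates a client library choking on a response.

```java
// Hypothetical helper for illustration only.
final class BestEffort {

    // A call that may throw any exception, such as a cloud API request.
    interface CloudCall {
        void run() throws Exception;
    }

    // Runs the call; on failure, logs the exception and carries on.
    // Returns true when the call completed without throwing.
    static boolean attempt(String description, CloudCall call) {
        try {
            call.run();
            return true;
        } catch (Exception e) {
            System.err.println("Ignoring failure during " + description + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated failure standing in for the terminate call.
        boolean clean = attempt("terminate instance", () -> {
            throw new IllegalStateException("simulated bad response");
        });
        System.out.println("Completed without error: " + clean);
    }
}
```

This only makes sense for calls like terminate, where the request is known to go through despite the exception; swallowing exceptions elsewhere would hide real problems.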

Run the Code

Note that it can take a long time for VMs to start running or for storage to be created so be patient with the process.

Eclipse

  1. Open the file typica-tutorial/src/org/cybera/TypicaTutorial.java
  2. Run > Run Configurations…
  3. Select Java Application and press the New launch configuration button
  4. The Name (TypicaTutorial), Project (typica-tutorial) and Main class (org.cybera.TypicaTutorial) should all be filled in since your editor was on the TypicaTutorial.java file
  5. Switch to the Arguments tab and type “euca” (no quotes) in the Program arguments textfield
  6. If you’re ready to go click Run, otherwise click Apply and Close and you can run it later.
  7. For EC2 do steps 1-6 above but in the Arguments tab type “aws” (no quotes)

Non-Eclipse

  1. cd typica-tutorial
  2. For Eucalyptus: java -cp conf:bin:"lib/*" org.cybera.TypicaTutorial euca (on Windows change the / to \ and : to ; )
  3. For EC2: java -cp conf:bin:"lib/*" org.cybera.TypicaTutorial aws (on Windows change the / to \ and : to ; )

Future Work

Building a library on top of typica for coarser grained functionality (e.g. a single method to start a VM and attach and mount X GBs of storage).

Try different Java Cloud APIs and add our findings to Java Cloud (Agnostic) API Survey.

Try different cloud management software (e.g. OpenStack).

Experiment with Java APIs that control storage in the cloud (e.g. S3 on AWS and Walrus on Eucalyptus) and write a similar tutorial for that.

Written by Everett Toews

September 17th, 2010 at 5:28 pm

Posted in Uncategorized

Troubleshooting Eucalyptus

We’ve run into a number of problems while installing, configuring and using Eucalyptus.  So we’ve decided to start our own Eucalyptus Troubleshooting Guide.  It’s a Google Doc we plan to keep updating every time we encounter (and solve) a problem.

Written by Everett Toews

June 21st, 2010 at 11:18 am

Posted in Uncategorized