Countdown is on: CESWP to open to users

The CESWP project, managed by Cybera, has completed its test phase and will be accepting user applications in January. Launched in beta form in October 2009, the CESWP was designed to simplify access to powerful compute resources for space weather modelling and simulations. The platform shields users from all of the associated complexities by integrating compute resources into a common platform through the use of open-source cloud computing software, and a remotely accessible web portal.

A virtual demonstration of the CESWP cloud was held on November 29 at the University of Alberta and webcast to remote participants. The demo allowed researchers from across North America and beyond, as well as future users of the platform, to learn about the user-friendly capabilities of the platform’s advanced collaboration, modelling and simulation tools.

“I was pleased that my expectations for the cloud were confirmed; that I would be able to use a resource for modelling that allowed me to choose my operating system, my compiler, and even the configuration of the machines,” said Clare Watt, Research Associate in the Department of Physics at the University of Alberta, who attended the virtual demo. Watt also contributed to the development and testing of the platform.

“I was also pleased that it would be a fairly straightforward process. For example, I would be able to read about modelling instances that already existed, I would be able to create my own models, and I would be able to choose from a variety of ways to set up a machine that ranges in complexity. We run many different simulations that range from very simple models, which can be done in three seconds, to large-scale simulations that take weeks, and all of these models and simulations require different computer setups. The strength of the cloud is that each scientist can actually tailor their machine depending on the type, or complexity, of the simulation.”

Researchers interested in using the CESWP are asked to apply for an account here.

To view the CESWP close-out media release, click here.

Apply for a free CESWP account

For those of you interested in using the platform to collaborate with other space scientists or run specific models or simulations, please click here to apply for a free account. Applicants will be notified after December 31, 2011, at which time you will be able to access the portal.

CESWP Presents at the AGU

Robert Rankin will give an oral presentation at the Fall AGU Meeting 2011 in San Francisco.

Date: Monday, December 5, 2011
Presentation Time: 2:14 PM – 2:29 PM (PST)
Location: Moscone South, Room 308

Title:
SM13E-03. Federated and Cloud Enabled Resources for Data Management and Utilization
Robert Rankin; Mark Gordon; Roddi G. Potter; Barton Satchwill

Abstract:
The emergence of cloud computing over the past three years has led to a paradigm shift in how data can be managed, processed and made accessible. Building on the federated data management system offered through the Canadian Space Science Data Portal (www.cssdp.ca), we demonstrate how heterogeneous and geographically distributed data sets and modeling tools have been integrated to form a virtual data center and computational modeling platform that has services for data processing and visualization embedded within it. We also discuss positive and negative experiences in utilizing Eucalyptus and OpenStack cloud applications, and job scheduling facilitated by Condor and Star Cluster. We summarize our findings by demonstrating use of these technologies in the Cloud Enabled Space Weather Data Assimilation and Modeling Platform CESWP (www.ceswp.ca), which is funded through Canarie’s (canarie.ca) Network Enabled Platforms program in Canada.

Session Title: SM13E. Discovery, Access, and Analysis Tools in Solar and Space Physics Research I
Session Time: 1:40 PM – 3:40 PM (PST)

If you are at the AGU, please come and listen to Dr. Rankin’s presentation.

Virtual Demo Logistic Details

If you have not RSVP’d yet, please click on this link to register.

If you have already registered, please note the connection details below.

Date:
Tuesday, November 29, 2011
Time: 10:30 AM – 12:00 PM MST

Location:
In person:
Room 315, General Services Building, University of Alberta, Edmonton, AB

Remotely:
To connect remotely you will need to log on to both WebEx and Teleconference (see the details below).
Please ensure that you are using either a Windows or Mac OS (WebEx does not work on a Linux machine).

WebEx

  1. Go to https://cybera.webex.com/cybera/j.php?ED=165497002&UID=1229562932&PW=NOTBiMTMwMzRj&RT=NCM2
  2. If requested, enter your name and email address.
  3. If a password is required, enter the meeting password: cloud
  4. Click “Join”.

Teleconference
Toll-free: 1-866-518-0791
Conference ID: 215559

Virtual Demo

Researchers, students and R&D teams studying space science or space weather are invited to join us for a virtual demonstration of the CESWP cloud. You will:

  • Receive a general overview of the CESWP
  • Learn how to collaborate over distance with space scientists around the world
  • Learn about the specific modelling and simulation capabilities of the portal
  • Learn how to develop your own models using the CESWP cloud

When: November 29, 2011
Time: 10:30 AM – 12:00 PM MST

Click here to register

Thinking About Clusters in the Cloud

We’ve been thinking for some time about the cloud, and trying to get beyond all the hyperbole to discover what the cloud is really good for, and how it is different from things like grid computing and high performance computing. So I thought I would share some of our thoughts on how we might be able to use the cloud to create a high-throughput compute cluster, with both an interactive interface and a batch interface.

Now, I said ‘batch’, and that might strike some of you as a bit odd. It was certainly odd to us. A cloud is all about ‘on-demand’ services; what in the world is an ‘on-demand batch’? For a long time, we dismissed such notions as fuzzy-headed thinking, a confusion of ideas.

But requests for this sort of service were persistent, and could not be ignored.  The stated motivation for a batch interface was simply that it was what people were accustomed to using, but of course, it goes a bit deeper than that.  The models our scientists use were designed to run on batch systems, and to convert them to interactive systems would involve significant work for very little gain.  Thus, not having a batch interface would present a significant barrier to adoption of cloud computing.

At some point, we realised that a strictly on-demand cloud presented another serious problem: in an on-demand system, if resources are not available when you want them, you’re out of luck. If your demand cannot be satisfied immediately, it’s simply rejected, and you get nothing at all. But a batch interface would allow the system to do a large job in small chunks, as resources become available. This is especially important when your cloud is small, with limited resources.
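To make the contrast concrete, here is a minimal sketch of the two behaviours. It is only an illustration, not how the CESWP cloud is implemented: the class names `OnDemandCloud` and `BatchQueue` are hypothetical, and we assume every job needs exactly one virtual machine.

```python
from collections import deque


class OnDemandCloud:
    """Hypothetical model: a request that cannot be met at once is rejected."""

    def __init__(self, total_vms):
        self.free_vms = total_vms

    def request(self, vms_needed):
        if vms_needed > self.free_vms:
            return False  # rejected outright; the user gets nothing
        self.free_vms -= vms_needed
        return True


class BatchQueue:
    """Hypothetical model: jobs wait in line and start as VMs become free."""

    def __init__(self, total_vms):
        self.free_vms = total_vms
        self.waiting = deque()
        self.running = []

    def submit(self, job_name):
        self.waiting.append(job_name)
        self._dispatch()

    def finish(self, job_name):
        self.running.remove(job_name)
        self.free_vms += 1
        self._dispatch()

    def _dispatch(self):
        # Assumption: one VM per job. Start as many waiting jobs as will fit.
        while self.waiting and self.free_vms > 0:
            self.running.append(self.waiting.popleft())
            self.free_vms -= 1


# A small cloud with 2 VMs and a user who wants 5 at once.
cloud = OnDemandCloud(total_vms=2)
granted = cloud.request(5)

# The same work submitted as 5 single-VM jobs to a batch queue.
queue = BatchQueue(total_vms=2)
for n in range(5):
    queue.submit(f"job-{n}")
started_first = list(queue.running)
queue.finish("job-0")
started_after = list(queue.running)
```

In the on-demand case the request for 5 machines fails, while the queue starts two jobs straight away and picks up the next one as soon as a machine is released.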

The ability to have jobs –not users– wait in line for service became very valuable to us, and fit nicely with one of our guiding principles: we value the scarce resources of the user more than the scarce resources of the computer.

Very well, our cloud should have a batch interface, it must have a batch interface.  What should it look like?  What should it do?  Well, what does a batch system in a grid computing centre look like?

  • it’s a batch system, with terminal access
  • it’s asynchronous: users submit their work, then walk away; they don’t expect instant results
  • its resources are shared by many users

And what are the essential characteristics of a cloud?

  • it’s on demand
  • it’s interactive
  • it’s synchronous.  I push a button, I expect to get something back
  • aside from the user, there are no humans involved
  • resources are owned by the user

And how do we combine the two? Let’s imagine an easy case: a scientist wants to do a parameter sweep, whereby the same model is executed many times with slightly different parameters. Each instance is entirely independent, and they can be run in any order. Well, this is pretty easy, because we realise that we only need 1 virtual machine at a time. We simply run the first job, and when it completes, we run the next, and so on. Of course, if we happen to have 2 machines, then the whole process will go twice as fast, and if we happen to have 10 machines, then we’ll really be flying. In any event, sooner or later, all of the jobs will be completed, and the scientist can be informed via email.
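The arithmetic behind that claim can be sketched in a few lines. This is a toy scheduler under stated assumptions, not our actual batch system: the function name `sweep_makespan` is hypothetical, the job durations are made up, and each job simply goes to whichever machine is free first.

```python
import heapq


def sweep_makespan(job_durations, machine_count):
    """Return the time at which the last job of the sweep finishes.

    Jobs are independent and are taken in order; each one is given to
    whichever machine becomes free first.
    """
    # One entry per machine: the time at which that machine is next free.
    free_at = [0.0] * machine_count
    heapq.heapify(free_at)
    finish = 0.0
    for duration in job_durations:
        start = heapq.heappop(free_at)
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Ten runs of the same model, each taking 3 time units (a made-up figure).
durations = [3.0] * 10
one = sweep_makespan(durations, 1)   # 30.0
two = sweep_makespan(durations, 2)   # 15.0
ten = sweep_makespan(durations, 10)  # 3.0
```

With equal-length jobs the speed-up is exactly proportional to the number of machines; with uneven jobs it would be somewhat less tidy, but the sweep still completes with whatever machines happen to be available.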

Impatient scientists might also want a web interface to check on the progress of their batch at any time, so we’ll need to include that.

And what about the more complicated case, where the jobs in the batch are inter-dependent? We’re in a tight corner here, because all of the jobs have to run at the same time. But we still have a little wiggle-room. If we don’t have enough resources to satisfy our scientist’s demands, we can offer the opportunity to scale down the simulation until it will fit in our available resources. The important point that makes this option viable is that it is interactive: the user has the information they need to make an informed choice. They know right away what they’re getting.
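As a rough sketch of that negotiation, the function below suggests the largest configuration that fits right now. Everything here is an assumption for illustration: the name `offer_for` is hypothetical, and `allowed_sizes` stands for whatever machine counts a given model can sensibly be run at.

```python
def offer_for(requested_nodes, available_nodes, allowed_sizes):
    """Suggest a cluster size the user can have immediately.

    Returns (size, scaled_down). If nothing fits at all, size is None.
    """
    if requested_nodes <= available_nodes:
        # The full request fits; no need to scale down.
        return requested_nodes, False
    fitting = [size for size in allowed_sizes if size <= available_nodes]
    if not fitting:
        return None, True
    return max(fitting), True


# Hypothetical sizes at which a model can be decomposed.
sizes = [4, 8, 16, 32, 64]
full = offer_for(8, 20, sizes)      # fits as requested
reduced = offer_for(64, 20, sizes)  # offered a smaller run instead
nothing = offer_for(64, 2, sizes)   # not even the smallest size fits
```

The point is that the answer comes back immediately, so the user can decide on the spot whether a smaller run is worth having.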

But what’s the point of using the cloud for my MPI model if I can’t get the vast amount of resources a full-scale, high-resolution simulation requires? The truth of the matter is that most of the time during the development of a model, simulations aren’t done at high resolution; they’re only done at low resolution, or on just a few dimensions. Vast resources aren’t required. Using the cloud for small- or mid-sized simulations frees up the scarce resources of the grid centres for the mammoth-scale models that cannot be run elsewhere.

Cloud is grid’s best friend!

So, it seems like batch jobs and clusters in the cloud aren’t so crazy after all.

Clusters in the Cloud

In an earlier post, we talked about a very simple form of clustering, little more than a collection of machines working separately on independent tasks. This form of clustering is suitable for work that is trivially parallel, like a parameter sweep. Our initial work to support this form of clustering provided very simple mechanisms for deploying the parametrised model to the group of machines, and for gathering the results together.  While this pilot work provided us with a much better understanding of the problem space, it was not very flexible, and was missing many of the convenient features scientists enjoy in a mature grid computing facility.

For example, the ability to monitor the progress of a simulation, and email notification of when a simulation was complete were sorely missed.

The more we worked with the pilot version of our cluster, the more we thought about what could be added. Monitoring the simulations led us to think about the need to abort a simulation that had gone off the rails. Thinking about aborting one –or all– of the simulations made us think about a way to easily re-launch a parameter sweep after the initial inputs had been corrected. That led us to think about the ability to re-launch a parameter sweep that had been performed in the past, which led us to think about an historical archive of past simulations and their results.
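One way to picture how these features hang together is a small in-memory record of sweeps. This is a sketch only, assuming a sweep is fully described by a dictionary of inputs; the names `Sweep`, `SweepArchive`, `timestep` and `grid` are hypothetical and do not describe our actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Sweep:
    sweep_id: int
    inputs: dict
    status: str = "running"  # "running", "aborted" or "complete"
    results: dict = field(default_factory=dict)


class SweepArchive:
    """Hypothetical archive: every launch is kept, so any past sweep can be re-run."""

    def __init__(self):
        self._sweeps = []

    def launch(self, inputs):
        sweep = Sweep(sweep_id=len(self._sweeps) + 1, inputs=dict(inputs))
        self._sweeps.append(sweep)
        return sweep

    def get(self, sweep_id):
        return self._sweeps[sweep_id - 1]

    def abort(self, sweep_id):
        self.get(sweep_id).status = "aborted"

    def relaunch(self, sweep_id, corrections=None):
        # Start a new sweep from an old one's inputs, with optional fixes.
        old = self.get(sweep_id)
        return self.launch({**old.inputs, **(corrections or {})})

    def history(self):
        return [(s.sweep_id, s.status) for s in self._sweeps]


archive = SweepArchive()
first = archive.launch({"timestep": 10.0, "grid": 64})
archive.abort(first.sweep_id)  # it had gone off the rails
second = archive.relaunch(first.sweep_id, {"timestep": 0.1})
```

Because the aborted sweep stays in the archive with its original inputs, the same mechanism serves both the quick fix-and-retry case and the re-run of something done long ago.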

The historical archive was especially interesting.

Our sister project, the Canadian Space Science Data Portal, catalogues data products from instruments and observatories, and allows that catalogue to be searched.  We realised that we could easily treat the data products of a simulation in exactly the same way,  allowing researchers from around the world to discover computational models that explored physics relevant to their own work.

A Bunch of Machines

Until now, all of our work with virtual machines in the cloud has been with single machines working independently. Now we have started to explore groups of machines, working together. In contrast to systems like MapReduce, where a single problem is broken down into separate tasks and distributed across several machines, our recent work involves machines that work independently on variations of the same, large problem. In this way, the response of a model across a large parameter space can be studied.

We have been careful to avoid the use of the word ‘cluster’ to describe these groups of machines. A ‘cluster’ of computers usually implies tightly-coupled machines, with dedicated high-speed interconnects. Our machines are only loosely coupled by rather ordinary networks. It is better to think of our groups as ‘Just a Bunch of Machines’, or as we like to refer to them, JaBoM.

Much of the machinery used to provision single machines has been reused to provision groups of machines. New work has been done to establish convenient mechanisms for the machines to communicate securely with each other, and to distribute initialising parameters to each machine. This work has been interesting, as it forces us to deal with several issues of identity and security in the cloud. The basic sticking point is that in order for our system to do work on behalf of a cloud user, it needs to assume the identity of that user. How to do this in a way that balances security and convenience will be the subject of future work.

Now that the raw, basic machinery is in place for running models on groups of machines, more work must be done to make it convenient. Such work will need to include a way of building a matrix of parameters, distributing those parameters to each machine, monitoring the progress of the simulations and reporting back to the user, and gathering results together. This automation will make it easier for scientists to conduct a parameter sweep, but these features are not mere frills and luxuries. When dealing with dozens, or even hundreds, of machines in a group, even the simplest operation becomes too onerous to be done manually. We will need to have automation in place. Our scientists have been extremely helpful in guiding our understanding of the problems, and explaining their needs.
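The first two items on that list, building the matrix and handing it out, might look something like the following. This is a minimal sketch, assuming a plain round-robin assignment; the function names, the parameter names `density` and `field_strength`, and the machine names are all made up for illustration.

```python
from itertools import product


def build_matrix(parameter_ranges):
    """Expand {name: [values]} into one dict per combination of values."""
    names = list(parameter_ranges)
    value_lists = [parameter_ranges[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def distribute(runs, machines):
    """Assign runs to machines in round-robin order."""
    assignment = {machine: [] for machine in machines}
    for index, run in enumerate(runs):
        assignment[machines[index % len(machines)]].append(run)
    return assignment


# Hypothetical model parameters: 3 x 2 = 6 runs in total.
ranges = {"density": [0.1, 0.5, 1.0], "field_strength": [10, 20]}
runs = build_matrix(ranges)

machines = ["vm-0", "vm-1", "vm-2", "vm-3"]
plan = distribute(runs, machines)
```

Here six runs are spread over four machines, so two machines get two runs each and the other two get one. Round-robin is the simplest possible choice; a real system would more likely hand out runs as machines finish.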

Early results of this work are encouraging. There have been no difficulties running Clare Watt’s model software on the cloud machines, the model runs nearly twice as fast on the cloud machine as on the local hardware used by Clare, and we are able to support a broader sweep of the parameter space than was previously possible. With proper support and automation in place, we are optimistic that these results will scale well with larger groups of machines. It’s exciting to imagine getting a year’s worth of experimental results in just a few weeks.

Then the difficult task of analysing those results will begin.

Back at the AGU

This was our second year at the American Geophysical Union Fall meeting, and we’ve come a long way. Last year we were just getting started on the project: we were trying to meet all of the Virtual Organization members, trying to understand the Use Cases that would really make a difference for scientists, trying to learn enough about Space Physics to be able to better understand our users’ needs, and trying to see what other projects were doing along these lines. We even brought along a small demonstration of a space physics simulation running in various modes in the cloud on Amazon Web Services (AWS) to help get the idea across to scientists when we talked to them. It was an incredibly valuable experience, and much of what we learned at the AGU last year has fueled our work this year.

Okay, flash forward to this year’s AGU meeting: same place (Moscone Center, San Francisco), many of the same people, but a massive difference in our project and activities. Let’s walk through some of them:

  1. Demonstrating the CESWP App on our iPads. As we walked around the conference, we were able to show people how to use the CESWP cloud for collaboration right there on our iPads. This is something that Barton and Everett managed to get in place for the CANARIE Users’ Workshop a month or so ago, and it has proven to be extremely valuable for showing people how CESWP works. Just open up a browser to cssdp.ca (the Canadian Space Science Data Portal, our sister project), sign in, click on the CESWP link, and there’s the app. Then select ‘Collaborate’, tap through the steps in the wizard, hit ‘Start VM’ and voila, you and each of your collaborators get an email telling you how to access your new Virtual Machine in the CESWP cloud. Then start up iTeleport on the iPad, copy in the address of your new VM, and it teleports you through to the desktop of your virtual machine, full graphical user interface and all. Now you can get to work!
  2. Co-chairing and presenting Research Clouds sessions. Robert Rankin and John Shillington teamed up with Todd King (UCLA) and Bob Weigel (George Mason University) to run both an oral and a poster session on the use of cloud computing in scientific research. The session was well attended and we had interesting presentations from David DeRoure (Oxford eResearch Centre), Robin Winsor (Cybera) and others. Barton Satchwill, one of our senior developers, gave an illuminating (both in the sense of enlightening and in the sense of pictorially illustrated) presentation on CESWP, and Everett Toews (our other senior developer) told the audience about our experiences at the OpenStack Design Summit.
  3. Face-to-face Virtual Organization Meeting. On Tuesday evening we held a Virtual Organization meeting together with the CSSDP team. Sixteen people attended the meeting. Rather than recapping our CESWP report, you can see for yourself. At the end of the meeting, Todd King said, “Very impressive–you’ve made great progress.” Some of our progress has been made thanks to Todd and Ray Walker (UCLA), and also to Professor Q.-G. Zong (Peking University) and others who have helped us get our national and international cloud footprint well underway, not to mention all the help we’ve received from Robert Rankin and his scientists at the UofA and others.
  4. Cybera Booth and Reception. Cybera had a booth in the AGU exhibit hall to showcase the CANARIE Network Enabled Platform projects that they are running, of which CESWP is one. Other projects include CSSDP, GeoChronos and GeoCENS. Rick Clark (Chief Architect, OpenStack) was able to attend part of the VO meeting and the reception, so our developers took full advantage of the opportunity to talk to Rick about OpenStack and where open source cloud middleware is going. The Cybera booth had videos of the various projects looping on a display throughout the conference, and everyone took turns manning the booth and telling people about their projects.
  5. Various informal meetings with project stakeholders. Throughout the week we were able to meet with scientists and help them set up their environments in the CESWP cloud. We also met up with Professor Zong and Yongfu Wang, one of Professor Zong’s graduate students, about the CESWP cloud zone at Peking University. The hardware is all in place in Beijing, the cloud middleware is installed, and we just have some debugging to do before we hook it up to the CESWP cloud. We talked about future plans with Peking U, including attending a workshop in Gansu, China, in May, 2011. [Professor Zong graciously hosted Robert Rankin and John Shillington at Peking University in mid-October to meet with scientists and to get the groundwork completed to set up the cloud zone there. More about that to follow in a future post.]

Overall it’s been a great week–and a great year–for the project. We are excited, re-energized, and ready to move the project forward so scientists can start to reap the benefits.

OpenStack

While the CESWP cloud is currently based on Eucalyptus, the emerging field of cloud computing platforms continues to grow and evolve quickly.  With that in mind we have been keeping an eye on new developments and new projects in this space.  Recently the OpenStack project came to our attention.

OpenStack is a collection of open source technology products delivering a scalable, secure, standards-based cloud computing software solution.  They’re currently developing two interrelated technologies: OpenStack Compute (aka Nova) and OpenStack Object Storage (aka Swift).  Compute is the internal fabric of the cloud creating and managing large groups of virtual machines.  Object Storage is software for creating redundant, scalable object storage using clusters of commodity servers to store terabytes or even petabytes of data.  While OpenStack is pushing forward with their own API they’re also maintaining compatibility with the EC2 API.

OpenStack is backed by a large global software community of technologists, developers, researchers and corporations.  At the moment the primary participants are Rackspace, NASA and Citrix.  They (along with others) are sharing resources, technology, and ideas to create a massively scalable, secure open source cloud infrastructure software package. They propose that with this open technology, any organization can create and offer cloud computing services running on standard hardware.

To get a better feel for OpenStack and to get some insight on the inner workings of the project we attended their Design Summit.  It was a fascinating look into how a well-funded open source project could be run.  The focus of the summit was the development of the platform itself.  Most of the sessions consisted of the OpenStack developers sitting in concentric circles (the fishbowl format) discussing the blueprints to be added to the next release of OpenStack (aka Bexar).

Besides the developer sessions there were business sessions, a documentation sprint and the install fest.  The business sessions were geared towards the use cases for the cloud, the experiences people have had deploying OpenStack and the governance of the project.  The documentation sprint was a core group of brave souls willing to contribute documentation for all aspects of OpenStack.  Finally, the install fest was a chance to get some assistance from the developers while installing OpenStack Compute and Object Storage.

Here are some highlights from the summit.

Day 1

  • Mark Collier, the VP of Marketing for OpenStack (employed by Rackspace), had some very interesting things to say:
    • OpenStack will be ready for production with the January (Bexar) release or the subsequent release 6 months later
    • Rackspace is betting the farm on OpenStack, once they put it into production mid-next year
    • Rackspace will be offering support for OpenStack, although they don’t have anything formal in place yet
  • OpenStack will support IPv6 in one of next year’s releases
  • Support for bursting, enabling a hybrid cloud
  • There can be multiple clusters in a region
  • A web based cloud management system will be integrated
  • Apps for iOS (iPhone, iPad, iTouch)

Day 2

  • There was a great discussion on ways to improve the community around OpenStack
    • On that note, we suggested that if OpenStack must use mailing lists then integration with a service like Nabble would help visibility
    • Preferably, we suggested that for Q&A-style forums StackExchange is a better model than old-school discussion forums
    • If piggybacking on StackExchange wasn’t an option then OSQA: The Open Source Q&A System may be more palatable
  • Development of a deployment tool for OpenStack Compute (Nova)
  • The OpenStack logo and trademark are owned by OpenStack LLC, a subsidiary of Rackspace

Day 3

  • Support for live migration of VMs, starting with KVM in Ubuntu 10.04
  • Some of the developers from Eucalyptus were in attendance at the summit too.  We took the opportunity to corner one of them and get some assistance with some Eucalyptus problems we’ve been having and give some feedback.
    • We suggested the same improvements to their forums using tools like Nabble, StackExchange or OSQA
    • We discussed improvements to the Eucalyptus logs
    • We talked about how we sometimes have trouble recovering VMs when we restart Eucalyptus, and learned that how long the restart takes affects what you lose, because services give up on waiting and decide the component is gone for good
    • We asked for more transparency into the project using tools like blogs, blueprints and roadmaps

Day 4

  • We gave a lightning talk in front of the OpenStack developers about CESWP. We stressed the point that the CESWP cloud is not even close to the scale of Rackspace or NASA and asked that the OpenStack developers continue to give consideration to projects of our size as well.
  • Installed OpenStack Object Storage (Swift) development environment using the Swift All In One instructions on our sandbox cloud
  • Installed OpenStack Compute (Nova) development environment using this script (see the long form instructions) on our sandbox cloud
  • There wasn’t much time remaining to play around with Swift and Nova but we did confirm that we could put/get objects and start/stop VMs

It was an interesting and extremely educational 4 days in San Antonio, TX.  Thanks to everyone who made the event possible!