CESWP is hosting a session at the 2010 AGU Fall Meeting this December together with Robert Rankin (University of Alberta), Todd King (UCLA), and Bob Weigel (George Mason University). Last year the Fall AGU Meeting drew over 16,000 attendees, so it will be an excellent opportunity to meet other researchers, discover new ideas, and share insights. The title of our session (IN21) is Research Clouds: Virtualization of Infrastructure, Tools and Services. It’s part of the Earth and Space Science Informatics stream:
Infrastructure as a Service has led to the advent of private and hybrid clouds, which allow low-cost entry to developing Virtual Appliances that support collaboration, processing data, and running application codes. Benefits of virtualization include access to potentially unlimited resources through the use of Virtual Machines on, e.g., Amazon Web Services. On-demand resources avoid up-front infrastructure investments as users incur cost only when they use cloud resources. This session seeks submissions that demonstrate uses of cloud computing in data-intensive applications. The session provides a forum to share experiences in use of clouds, and to identify types of usage that can be shared across disciplines.
We have a great lineup of invited speakers for the session:
It should be a really interesting session, and we’d like to encourage you, dear reader, to submit an abstract. The deadline is 2 September at 23:59 EDT (03:59 GMT the following day), and it’s a hard deadline: miss it by a minute (literally) and it won’t be accepted. So better get writing now!

Einstein in front of the National Science Foundation building in Washington, DC.
Last week a representative from CESWP attended the Earth and Space Science Informatics (ESSI) conference. ESSI is a focus group of the American Geophysical Union that serves to facilitate communications and coordinate activities related to issues of data management and analysis, large-scale computational experimentation and modeling, and the hardware and software infrastructure needs to span the range of scientific topics of interest to the Union. The conference was held at George Mason University just outside of Washington, DC and it focused on 7 topics:
- Software Reuse/Open Development
- Provenance
- Visualization
- Data Mining, Machine Learning and Data Analysis
- Semantic Web
- Community Organization and Governance
- Informatics in Education
During the Software Reuse/Open Development topic, CESWP gave a presentation on clouds, the CESWP cloud in particular, and how clouds could be used to store SPASE data. We also discovered myexperiment.org, a website that makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities; it could potentially be of use to our sister project, CSSDP. While discussing software reuse, the conversation veered towards the annual AGU conference and an important point was brought up:
The informatics sessions at the AGU are not attended by the scientists we want to reach but by other IT people.
The importance of this statement was echoed by all in attendance. The question was how to bring the scientists we’re trying to reach into the informatics sessions. One idea that was floated was to join other, more science-oriented sessions and take the last 10-20% of the time to talk about how the science relates to informatics and to point people at the appropriate informatics sessions.
The topic of Provenance was discussed and one of the points made that hit close to home for CESWP was what to do about the use of web services when doing provenance. How do you reproduce a call to a web service if that web service no longer exists or is now a different version? Is it even possible to tell if it’s a new version of that particular web service? What if you just get subtly different results? There were no clear answers but these are issues we’ll have to consider.
Visualization is a very active field and there are lots of tools that can be used such as CISM_DX, SciPy, pylab, SpacePy, autoplot, Algorithms for analysis of magnetic fields in CISM-DX and CCMC. It was also noted that visualization is shifting away from a multi-location process, i.e. one where you compute remotely and then move the data to a local device to do the visualization. Instead, it is now possible to compute and visualize in the same location.
Having a chance to talk directly to the developers from CCMC was valuable, as running simulations is also one of CESWP’s use cases. It was interesting to learn that the web UI to their simulations is hand coded. When we mentioned that we were considering generating a web UI from a file written by the author of the simulation, the CCMC developers said they were considering doing the same thing. The author of a simulation would create a file (in a particular format) that specifies all of the parameters of the simulation and their valid ranges; CESWP would then take that file and turn it into a web UI that anyone could use to run a simulation.
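As a rough illustration of the idea, here is a minimal sketch in Python. Everything specific in it is an assumption made for the example: the JSON file format, the field names (`name`, `type`, `min`, `max`, `default`), the model name and the `render_form` and `validate` helpers are all hypothetical, since no actual format had been chosen by either CESWP or CCMC.

```python
import json
from html import escape

# Hypothetical parameter file written by the author of a simulation.
SPEC = """
{
  "simulation": "example_model",
  "parameters": [
    {"name": "grid_size", "type": "int", "min": 16, "max": 512, "default": 64},
    {"name": "time_step", "type": "float", "min": 0.001, "max": 1.0, "default": 0.01}
  ]
}
"""


def render_form(spec_text):
    """Turn a parameter specification into a minimal HTML form."""
    spec = json.loads(spec_text)
    rows = []
    for p in spec["parameters"]:
        # Floats need step="any" so browsers accept non-integer values.
        step = "1" if p["type"] == "int" else "any"
        name = escape(str(p["name"]), quote=True)
        rows.append(
            f'<label>{name} '
            f'<input type="number" name="{name}" '
            f'min="{p["min"]}" max="{p["max"]}" step="{step}" value="{p["default"]}">'
            f'</label>'
        )
    title = escape(spec["simulation"])
    return (
        f'<form method="post"><h1>{title}</h1>'
        + "".join(rows)
        + "<button>Run</button></form>"
    )


def validate(spec_text, values):
    """Return error messages for submitted values outside the valid ranges."""
    spec = json.loads(spec_text)
    errors = []
    for p in spec["parameters"]:
        # A parameter that was not submitted falls back to its default.
        v = values.get(p["name"], p["default"])
        if not (p["min"] <= v <= p["max"]):
            errors.append(f'{p["name"]}={v} outside [{p["min"]}, {p["max"]}]')
    return errors


if __name__ == "__main__":
    print(render_form(SPEC))
    print(validate(SPEC, {"grid_size": 1024}))
```

The point of the design is that the same file drives both the form and the server-side range check, so the simulation author only describes the parameters once.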
The remainder of the conference was interesting and informative but of less relevance to CESWP so to keep a long post short we’ll leave the notes for those topics aside. All in all it was a worthwhile conference in a very hot and humid Washington, DC.
On many levels, the CESWP project is about collaboration, and the last few weeks have exemplified that:
- collaboration with the CSSDP team to help them establish a new (virtualized) test environment;
- collaboration with scientists to help get their models running in the cloud;
- collaboration with various systems groups as we plan to build out the cloud;
- collaboration with other CANARIE NEP projects to share common ideas and infrastructure;
- building a prototype of the “Collaborate in the Cloud” (aka “Clare’s Collaboration”) use case.
Here’s a bit more detail on each of these activities:
- The CSSDP team has been using a small server in their office for testing CSSDP, but it has become painfully slow. After consulting with the team, we have built a virtualized test environment for them that runs on one of the HP Labs servers that the University of Calgary has graciously lent us. In August, we will help the CSSDP team move their test environment into the CESWP cloud.
- We are actively engaged with a number of scientists as we start to move Space Weather models into the cloud. For example, Andree Susanto, Hans De Sterck’s student, has managed to get a simplified version of his MHD code running on 16 processors in the cloud with a 10-times speedup. We have also been working with Robert Rankin and Konstantin Kabin (University of Alberta) to move the Space Weather Modelling Framework (SWMF) into the cloud, and Aaron Ridley and Darren De Zeeuw (CSEM, University of Michigan) have started to move GITM2 and SWMF into the cloud.
- We have been working more closely with Luke Tymowski, part of P.T. Jayachandran’s lab at the University of New Brunswick, as we plan for the UNB CESWP Availability Zone this fall. Luke has been our proxy at UNB and is working out all the logistical details on the ground. Luke has also proved to be a great source of interesting ideas and good common sense. Our project is much richer as a result of his participation. Meanwhile, we continue to work with Cybera, the University of Calgary, the University of Alberta, the University of Waterloo and SHARCNet, as well as UCLA and Peking University, to prepare for the roll-out of the various cloud zones.
- Russ Taylor (ISIS – Radio Astronomy, Physics, University of Calgary) and Cam Kiddle (Grid Research Centre at the University of Calgary) had the good sense to spot commonalities among their CANARIE NEP2 project–CyberSKA (Square Kilometer Array)–and three other projects: CSSDP, CESWP and CANFAR, a NEP1 project at the University of Victoria. Russ and Cam hosted members of our team, as well as members of the CANFAR and CyberSKA teams, for a workshop at the UofC on Monday, June 28. The meetings were very productive, and we now have members of the CANFAR team using the CESWP cloud to prove the cross-cloud compatibility of their work. In return, there are a number of things–ideas, experiences, and even source code–that the CANFAR team has offered to share with us. It is still early days for the CyberSKA project, but we fully expect to continue our contact with them as well and to collaborate as our projects proceed.
- Finally, we have been building a prototype of the “Collaborate in the Cloud” use case. We now have a prototype that leads a user through the basic decisions necessary to instantiate a Virtual Machine that can be used by a group of collaborating scientists to review, analyze and visualize the results of simulation runs.
CESWP is about building software and cyberinfrastructure, but it is also about building relationships and collaborating with a wide range of people across disciplines, regions, and countries. And so we collectively forge on.
We had our second Virtual Organization (VO) meeting on Wednesday, June 16, 2010. Slides from the meeting can be found here. The meeting had three main objectives: update the VO on our progress since the last meeting; review the types of uses we have planned for the cloud; and discuss our approach to managing cloud resources.
We have been making steady progress since our last meeting in January, and have kept on track with the project milestones. Specifically, we completed Milestone 1 (virtualizing CSSDP) on time and on budget; we are half-way through Milestone 2 (initial cloud cluster), as scheduled, and have already established a first iteration of the University of Alberta cluster; and we have started Milestone 3 (initial simulations in the cloud), also on schedule, and already have four use cases in progress.
The initial version of the CESWP cloud consists of a Eucalyptus Cloud Controller (CLC), Walrus (like Amazon’s S3), a Cluster Controller (CC), a Storage Controller (SC), and two Node Controllers (NC). The NCs are ultimately where the VMs will run, under control of the CC. Requests to start VMs are given to the CLC, which passes the requests on to the CC to execute. The CC then determines which NCs have the capacity to instantiate the VMs and then passes the requests on to the appropriate NCs. We have also created a second cluster (also known as an “Availability Zone”, or “Zone”) at the University of Calgary, controlled by the CLC at the UofA. The UofC zone is temporary and for experimental purposes only, but it has allowed us to prove that we can construct a cloud with zones located in multiple regions.
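The request flow described above (CLC to CC to NC) can be sketched as a toy model. This is not Eucalyptus code and uses none of its APIs: the class names, the VM naming scheme and the "first node with a free slot" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeController:
    """A physical host that actually runs VMs."""
    name: str
    free_slots: int
    vms: list = field(default_factory=list)

    def start_vm(self, vm_id):
        self.free_slots -= 1
        self.vms.append(vm_id)


@dataclass
class ClusterController:
    """Manages the nodes of one Availability Zone."""
    zone: str
    nodes: list

    def run_instances(self, vm_ids):
        # Refuse the whole request if the zone cannot hold all of it.
        if sum(n.free_slots for n in self.nodes) < len(vm_ids):
            raise RuntimeError(f"zone {self.zone}: insufficient capacity")
        placement = {}
        for vm_id in vm_ids:
            # Assumed policy: use the first node that still has a free slot.
            node = next(n for n in self.nodes if n.free_slots > 0)
            node.start_vm(vm_id)
            placement[vm_id] = node.name
        return placement


@dataclass
class CloudController:
    """Front door of the cloud: forwards requests to the right zone."""
    clusters: dict  # zone name -> ClusterController
    next_id: int = 0

    def run_instances(self, zone, count):
        vm_ids = [f"vm-{self.next_id + i}" for i in range(count)]
        placement = self.clusters[zone].run_instances(vm_ids)
        self.next_id += count
        return placement


if __name__ == "__main__":
    clc = CloudController({
        "uofa": ClusterController("uofa", [NodeController("nc1", 2),
                                           NodeController("nc2", 2)]),
        "uofc": ClusterController("uofc", [NodeController("nc3", 1)]),
    })
    print(clc.run_instances("uofa", 3))
```

A single CloudController holding several ClusterControllers mirrors the setup in which the CLC at the UofA controls zones at both the UofA and the UofC.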
To make use of the cloud, we have four initiatives underway. First, we are working with Clare Watt, one of our Science Team members, to build a set of tools to make it easy for scientists to collaborate in the cloud. Second, Andree Susanto, one of Hans De Sterck’s PhD candidates at Waterloo, is moving an MHD (Magnetohydrodynamic) code called CFFC into the cloud. He is following a similar path to the one we envisioned for GITM (which has been pushed back in the schedule but will still be done): get a 2D version of the model working on a single VM without MPI, then try to get a version working on multiple VMs with MPI. At the time of the VO meeting, Andree had managed to get a 2D version running on one 4-core VM. Our third initiative is to work with Konstantin Kabin, another one of our Science Team members, to get his Space Weather Modelling Framework (SWMF) running in the cloud. SWMF is an MPI-based MHD code, so we hope to use our experience with Andree to speed up the process with Konstantin. Ultimately our goal is to extract the common tasks and make it as easy as possible for scientists to move their models into the cloud and run them. Our fourth initiative is to get the Magneto-seismology service running in the CESWP cloud (in December we set it up for demonstration purposes in AWS). Subsequent to the VO meeting, we have talked with Robert Rankin and the CSSDP team and have decided to also use the Magneto-seismology code as an example workflow to exercise the CSSDP workflow engine.
The final topic for the VO meeting was how to manage cloud resources. An initial discussion of how cloud resources can be managed can be found in our previous blog post (and comment). It was agreed that we would have a follow-up discussion of this subject with Rob Simmonds and Hans De Sterck.
Since the VO meeting we have been in touch with Aaron Ridley. Aaron’s GITM model was originally planned to be our first target for the cloud. In the VO meeting Robert Rankin affirmed the importance of getting GITM into the cloud, so we will work with Aaron to do so as soon as we complete our current work. We hope to take the lessons learned from Andree and Konstantin’s work to make the process as smooth and fast as possible for Aaron.
…or “The Economics of a Research Cloud”
We are in the process of building a cloud that will provide on-demand computing resources for the Space Physics community. The key idea is that the resources are available on-demand. In a High Performance Computing (HPC) environment, it is typical for researchers to submit a batch job that will then be queued for hours, days or sometimes even weeks before it is executed. In the CESWP cloud, the goal is for researchers to have access to computing resources (typically in the form of Virtual Machines, or VMs) immediately when they request them. With limited resources, how will this ever work? Won’t all the resources quickly be consumed and held by eager researchers? This is the challenge that we face: if the CESWP cloud is popular, the resources will be consumed and it will become unpopular (“Oh, that &@!% thing. Every time I try to use it, it tells me that there are no VMs available, so I quit trying.”) In other words, if we succeed, we fail.
What about Amazon Web Services (AWS), though? That seems to work. As far as a user of AWS is concerned, the Amazon cloud is effectively infinite: if I have a credit card, I can request as many VMs as I want and I will get them. Why can’t we be like AWS? There are two key things that allow AWS to work: first, Amazon does in fact have vast resources; second, users are charged for the resources that they consume. The former means that there is always spare capacity. The latter means that I will always limit my use of the resources to what I can afford, and I will release my resources back into the commons as soon as I can. Meanwhile, Amazon can use revenues to increase the capacity of the cloud as required.
Now let’s look at our research cloud: First, do we have vast resources? No, in fact we have very limited resources initially–perhaps only enough, some have said, for a “toy” implementation. Second, what incentive do scientists have to use resources parsimoniously? None, aside from the altruistic goal of leaving resources available for other scientists. But however altruistic some scientists may be, it’s virtually certain that the tragedy of the commons will strike as soon as the cloud gains any popularity. Both of the factors that allow AWS to succeed are stacked against CESWP.
Let’s step back for a moment and compare two views of capacity planning: the cloud model versus the HPC centre model. In the HPC centre model, the use of expensive computing resources is maximized by overbooking and queuing jobs (i.e. compute time is valued highly). What is sacrificed? The researcher’s time: he must plan his work around wait-time in the queue. In contrast, the cloud model offers the promise of immediate access (i.e. the scientist’s time is of highest value), but at the cost of relatively inefficient resource usage: if you want people to be able to have resources on-demand, there must be resources sitting around waiting to be used. Thus, in order for the cloud to work, you must operate at a fraction of your resource capacity.
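To put rough numbers on this trade-off, one option is the Erlang B formula from queueing theory. The post itself does not use it; it is brought in here purely as an illustration, and it rests on simplifying assumptions: requests arrive at random (Poisson arrivals), each request is for one VM, and a request that finds no free VM is turned away rather than queued. The function names below are hypothetical.

```python
def blocking_probability(offered_load, capacity):
    """Erlang B: chance that a request finds all `capacity` VMs busy.

    `offered_load` is the average number of VMs that would be in use
    if capacity were unlimited (arrival rate times mean holding time).
    Uses the standard recurrence B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    """
    b = 1.0
    for n in range(1, capacity + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def utilisation(offered_load, capacity):
    """Average fraction of VMs in use, counting only accepted requests."""
    accepted = offered_load * (1.0 - blocking_probability(offered_load, capacity))
    return accepted / capacity


if __name__ == "__main__":
    capacity = 32  # hypothetical number of VMs in the cloud
    for load in (16, 24, 32):
        print(f"load={load:2d}  "
              f"refused={blocking_probability(load, capacity):.3f}  "
              f"utilisation={utilisation(load, capacity):.2f}")
```

Under these assumptions, the share of refused requests rises as the average load approaches capacity, which is the sense in which an on-demand cloud has to run below full utilisation.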
If we succeed, the limited pool of resources in our cloud will quickly be used up, so scientists will have to wait. But unlike the HPC centre model, they won’t get placed in a queue, they’ll just have to try again later. It’s like trying to be the fifth caller on a radio call-in show. How can we solve this? One option is to add accounting to the CESWP cloud and charge researchers for using it, but that goes against the spirit of the project, not to mention the logistical and administrative difficulty of doing so. Another option is to throttle each scientist’s use of the cloud: for example, “you’re only allowed to have 2 VMs and 20GB of disk at any time, and you can only keep them for a maximum of two days.” This may be part of a solution, but it doesn’t provide the flexibility that we’d like (i.e. to either use a few resources for a longer time, or to use a lot of resources for a short time). A third option is to add a resource management system and have people queue up for resources. But if we’re going to do this, why even bother building the cloud? HPC centres have been doing this for a long time and will be able to do a much better job than we ever could. And it defeats the whole point of providing a cloud: making resources available on-demand.
Looking at the factors that make a cloud work (i.e. large capacity and parsimonious usage), we need a way to dynamically increase the capacity of the CESWP cloud when necessary, and we also need a way to make users really care about sharing common resources. One possible solution is to provide users with limited resources within the CESWP cloud, and to then offer them the option to “cloudburst” into AWS if they want more resources and are willing to pay for them.
Here’s an example of how it might work: Dr. Gauss wants to run a small simulation that requires 64 VMs for ten minutes or so. He asks the CESWP cloud for the resources and is told that he can have 16 VMs now for free, and if he is willing to pay for the remaining 48 VMs, he can provide his credit card and have them now too. Dr. Gauss needs the results for a paper he has to submit tomorrow, so he accepts the offer. The CESWP cloud marshals the resources that Dr. Gauss has requested (16 VMs in CESWP, 48 VMs in AWS), runs the job, returns the results and releases the resources. At the end of the month, Dr. Gauss receives a bill from Amazon for his use of their services. The bill is 25% smaller than it would have been if he hadn’t been able to use CESWP, and he managed to get his paper submitted on time. Alternatively, Dr. Gauss may decide that he can run the simulation with only 16 VMs, in which case he carries on without paying anything.
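The arithmetic behind this example can be sketched as follows. The function name and its parameters are hypothetical, and the savings figure assumes that every VM runs for the same length of time at the same price, so that cost is proportional to the number of VMs.

```python
def plan_burst(requested, free_quota, local_available):
    """Split a request between the free local cloud and a paid public cloud.

    Returns (local_vms, burst_vms, savings_fraction), where the savings
    fraction is the share of the bill avoided compared with running
    every VM in the paid cloud.
    """
    # The user gets at most their quota, and never more than is free locally.
    local = min(requested, free_quota, local_available)
    burst = requested - local
    savings = local / requested if requested else 0.0
    return local, burst, savings


if __name__ == "__main__":
    # Dr. Gauss: wants 64 VMs, is entitled to 16 for free.
    print(plan_burst(requested=64, free_quota=16, local_available=100))
    # Alternatively, he settles for 16 VMs and pays nothing.
    print(plan_burst(requested=16, free_quota=16, local_available=100))
```

With 16 of 64 VMs provided for free, the paid portion is 48 VMs and the bill is 16/64 = 25% smaller, which matches the figure in the example.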
Although a number of technical and logistical hurdles remain to be worked out, our current plan is to try this “throttle and burst” approach (i.e. set resource limits within the CESWP cloud and burst to the Amazon cloud if and when a user wants to do so). We hope that this will allow scientists to access the resources that they need, when they need them. In that case, if we succeed…we succeed!
Postscript: It is important to note that the CESWP cloud will not replace scientists’ need for HPC centres. They will still need HPC for many things, including running large scale simulations. Instead, we hope that the cloud will serve as a helpful adjunct to HPC by providing an environment where scientists can collaboratively develop models, access and run common utilities, and run small scale simulations on-demand, among other things.

The last two weeks have been exciting and productive on the cloud-infrastructure front. On Monday, May 3, Rich Wolski, CTO and founder of Eucalyptus Systems, visited the Computing Science Department at the University of Alberta as a Distinguished Lecture Speaker. Paul Lu hosted Rich and graciously arranged for us to spend some time with him while he was on campus. We had a chance to tell Rich a bit about the CESWP project and our plans for the cloud, and he was able to give us some advice on the feasibility of various approaches (e.g. single cloud with many regional zones vs. multiple regional clouds). He also told us about some of the things that will be coming up in the Open Source version of Eucalyptus over the next year. All in all, it was a real privilege to spend time with him and get the information straight from the source.
Then on May 11-13 we assembled a group of twelve of our collaborators from far and wide (Edmonton, Calgary, Waterloo, Fredericton, Los Angeles) and received three days of in-depth Eucalyptus Training from Tim Gerla of Eucalyptus Systems. Our goals for the training were simple and concrete:
- Learn how to set up a multi-cluster, multi-region cloud.
- Learn Eucalyptus best practices and how to troubleshoot problems when they arise.
- Pull together representatives from the regions where CESWP will eventually have a cloud Availability Zone and establish a working rapport and common basic knowledge of Eucalyptus.
All of the goals were met, although perhaps not exactly in the way that we had first imagined. During the training we were able to set up an experimental Cloud Controller at the University of Alberta with a single local Availability Zone, and an experimental remote Availability Zone at the University of Calgary. Participants were then able to create Virtual Machines in each of the Availability Zones (i.e. on both the UofC cluster and on the UofA cluster). The process of setting up the cloud was simple in principle but required some tricks that occasionally proved challenging even for the Eucalyptus staff to figure out (Tim had good back-up support from the Eucalyptus engineers in Santa Barbara whenever it was needed). But in some ways it was fortunate that everything wasn’t textbook-smooth: it allowed us to see how the Eucalyptus engineers went about troubleshooting problems, and gave us some good insight into how we could do so ourselves as we proceed.
One interesting observation we made was that the creation of new machine images which can be used across Availability Zones is considerably more difficult than we had initially imagined. This isn’t specific to Eucalyptus; it holds true for all virtualization environments, and it is a good thing for us to know now. In practice, we will need to carefully structure, test and limit the machine images that are on offer to users of the CESWP cloud in order to make it feasible to support. Without controlling the images, it is possible for incompatibilities between images and regional clusters to creep in, resulting in a management nightmare and frustration for users.

Our third goal (i.e. assembling our collaborators and establishing a baseline knowledge and working rapport) was perhaps one of the most valuable outcomes of the training. We were very fortunate to have attendees with a broad range of experience and interests, all of whom combined to provide a really interesting and dynamic group:
- Todd King, UCLA. Todd works with Ray Walker, one of our Virtual Organization (VO) members. They will be hosting our Availability Zone in the US.
- Chen Zhang, University of Waterloo. Chen is doing his PhD with Hans De Sterck, another one of our VO members. We will have an Availability Zone at UWaterloo.
- Luke Tymowski, University of New Brunswick. Luke is a Systems Analyst with Dr. P.T. Jayachandran, the PI for the CHAIN (Canadian High Arctic Ionospheric Network) and a CSSDP VO member (our sister project). UNB will be our other Availability Zone in Canada.
- Tingxi Tan is a member of the Grid Research Centre (GRC) in Calgary, led by Rob Simmonds, who is on the CESWP VO. Tingxi has deep knowledge of virtualization and cloud computing, and did some early research into Eucalyptus. He is working on GeoChronos, another CANARIE Network Enabled Platform project. We hope to be able to leverage the GRC’s knowledge and experience as we move forward, and to share what we learn with them.
- Judy Yang is a virtualization expert with AICT at the UofA. Judy is assisting us in repurposing the CSSDP hardware for the cloud. AICT has a long working relationship with our PI, Robert Rankin, and is the lead contractor on our sister project, CSSDP.
- Cam MacDonell, Jeremy Nickurak, and Adam Wolfe Gordon are researchers in Paul Lu’s software systems research group in the Computing Science Department at the UofA. They are experts in virtualization and networking. With Paul’s generous cooperation, we have regularly benefitted from their knowledge and experience.
- Long Li is a System Administrator with Cybera Inc. Long is helping us and the CSSDP team with the ongoing operation and systems support of the virtualized version of CSSDP. Long and the Cybera Network and Operations team, led by Patrick Mann, CTO, will play a very important role in the implementation of the CESWP cloud and its ongoing operation.
- Last but not least, Barton Satchwill, Everett Toews, and myself (John Shillington) form the core of the CESWP project team. We are all employees of Cybera Inc.
In between the training sessions we were able to discuss the logistics of the CESWP cloud with our collaborators and get to know them a little better. We are now poised to start implementing the CESWP Cloud infrastructure, and will start to do so over the coming weeks by setting up the Cloud Controller and our first Availability Zone at the University of Alberta.
The title of this post is taken from the whiteboard in our project office. We put it on the board as a way to step back and focus on the critical (and concrete) steps that we need to take over the coming weeks. Here is the list, with annotations:
1. Build prototypes with scientists.
Technically, we aren’t scheduled to start work on simulations in the cloud until Milestone 3 (commencing June 1), but we want to start building some prototypes as soon as possible to get feedback from the scientists and make sure we build the right thing. We are starting by prototyping ‘Clare’s collaboration’ (see below), and later will move on to the other scenarios.
a) Clare’s collaboration
This is the “collaboration in the cloud” scenario that we have talked to several scientists about. It is named for Clare Watt, a Space Physics Research Associate at the University of Alberta and a member of our Science Team and Virtual Organization. Our prototypes and initial implementation will be used by Clare for collaborative research she is doing with scientists in Europe.
b) Canned simulations (e.g. refinements of Magneto-seismology demonstration)
In December 2009, we set up a demonstration of how simulations might be run in the cloud. We used Amazon Web Services as the cloud, and used Konstantin Kabin’s Magneto-seismology simulation as the canned simulation. For this scenario, we will pick a small number of models that are appropriate to run in the CESWP cloud and refine our prototypes of how to use them.
c) Model development
This is similar to the ‘Clare’s collaboration’ scenario, but instead of using the cloud as a place to share results of simulation runs, the cloud is used to actually develop and evolve space physics models.
d) Workbench and toolkit (e.g. TDAS, SWMF)
In this scenario, a Virtual Machine is instantiated with a set of tools for doing space physics research. Examples are the THEMIS Data Analysis Software suite or the Space Weather Modeling Framework. But rather than downloading the tools to a user’s local machine and configuring them locally, a Virtual Machine is instantiated in the cloud with the appropriate tools and environment installed, pre-configured and ready to run.
2. Move into the bullpen.
‘The bullpen’ is the open area office where most of the Science Team reside. This item is simple and very concrete: the CESWP development team now have two desks in the bullpen and we have started to spend part of our time working in close physical proximity to the scientists. Doing so increases our interaction with them and is another way to increase our chances of getting things right–or discovering our mistakes–early.
3. Build cloud (sandbox at UofC; real).
We now have the hardware in place for three small ‘sandbox’ clouds: one at the University of Calgary (3 machines); one at the University of Alberta (2 machines); and one at the University of Guelph (2 machines). We are using these sandboxes for planning and experimentation in preparation for deploying the ‘real’ CESWP cloud. The core hardware for the real cloud should be available (from the CSSDP project) by the beginning of June. Eucalyptus Systems will provide training for the CESWP team and our remote system administrators in mid-May.
4. Regular contact with VO members. Show them stuff via 1 and 5.
Our goal here is to ensure that we make effective use of the CESWP Virtual Organization. We have decided not to have frequent VO meetings, which are hard to schedule and are not necessarily the most effective way of eliciting input from VO members. Instead, we will make an effort to have regular contact with VO members on a one-on-one basis and show them (and let them try out) our prototypes as they progress. This is another way to try to get early and regular feedback.
5. Use the environment to build the environment (“Eat our own dogfood.”)
This is in fact something that we have been practicing since the beginning of the project, but it is an important reminder to continue to do so. As a simple example, this blog is running on a VM in the cloud. A more complex example is CSSDP, our sister project, which now runs as a series of VMs on a single machine (it used to require 3 physical machines). As we develop the prototypes, we will use our cloud to develop and run them.
6. Regular contact with CSSDP.
Although we have worked in parallel with the Canadian Space Science Data Portal (CSSDP) project since the start of CESWP, there are opportunities to improve our collaborative work. In particular, we would like to find ways to facilitate scientists’ use of the cloud through the CSSDP collaboration wiki (Confluence) and by seeking opportunities to support or use CSSDP’s Workflow facilities in the cloud.
We have already leaped into the breach and are making good progress on all six points.
There is nothing easy about understanding space weather: the area of space is vast, covering a volume that begins 10 Earth radii toward the Sun and is stretched out beyond the orbit of the Moon by the solar wind; the physical scales of the phenomena are very different, as are the temporal scales; and with the exception of ground-based observatories, the only data we have is from a handful of spacecraft travelling their distant orbits. The amount of data we are able to gather is so very small compared to the vast areas and distances involved. By analogy, it’s like trying to understand the entire ocean, complete with all its tides and currents, given just a few drops of water.
Last week, we were privileged to hear some of the best thinking about an especially difficult aspect of space weather being discussed at the 10th International Conference on Substorms.
This was a rich opportunity for us to meet with researchers and modellers from around the world, and test our understanding of how computational modelling and data analysis come together to move the science forward. We spent much of our time discussing how physics-based models provide a context for understanding the observational data, how that data is used to anchor the computational models in reality, and how statistical analysis of data can reveal the sequence of events and relationships that need further explanation. Theory, modelling, and observation are each used to advance the understanding of the others.
It was encouraging to find that almost everything we learned at this conference echoed and amplified what we have already learned from working with the research team at the University of Alberta. This gives us confidence that our understanding of what is needed in a useful system is reasonably complete, and that solutions built for our team will be useful for others.
We also had the opportunity to learn more about the excellent work done by the THEMIS project to develop a suite of tools to share and analyse the data gathered by these doughty satellites. The tools are powerful, yet easy to use, and the entire system is designed in such a way that it can be extended to work with the data products from other missions. The system includes thorough documentation and test suites, and a developer’s guide is in progress. This highly evolved system, representing man-years of effort, is being offered freely for use by other missions. This project creates a world where spacecraft and ground observatories engaged in complementary science might gather, distribute, and analyse their data in a common framework.
In addition to the above, it looks like we have found a home for the American node of our cloud. Ray Walker has kindly offered us sufficient hardware and support for us to add a powerful node to our cloud. Ray and his team have looked at cloud computing with interest in the past and, like us, they are wondering how it can be used. We hope that we will have the opportunity to reciprocate Ray’s generosity by sharing whatever we learn with him.

With Milestone 1 behind us, we are now looking ahead to the rest of the project. Rather than diving straight into Milestone 2 (i.e. building the first nodes of the CESWP cloud at the University of Alberta), we are taking a step back and looking at the remaining milestones and identifying what we need to put in place now. There are two main threads that run through the remaining milestones:
- Build and extend the CESWP cloud. This includes Milestones 2, 4 and 6. In Milestone 2, we establish the initial cloud nodes at the University of Alberta. In Milestone 4, we extend the cloud to include nodes at the University of Waterloo and at the University of New Brunswick. In Milestone 6, we include a node in the US and a node in China.
- Use the CESWP cloud for Space Weather modelling and simulation. Milestones 3, 5 and 7 are about allowing scientists to use the CESWP cloud to develop and run Space Weather models. Milestone 3 uses the initial cloud nodes, Milestone 5 uses the geographically distributed cloud nodes built during Milestone 4, and Milestone 7 experiments with simulations using Hadoop (an open-source implementation of Google’s MapReduce).
Since completing Milestone 1, we have taken the following steps:
- Planned out the key inch-pebbles for Milestone 2.
- Met with researchers at the IBM Watson Research Center at Yorktown Heights, NY to discuss ideas around an HPC (High Performance Computing) Cloud (among other things).
- Met with Ray Walker (CESWP VO Member), Todd King, and Steven Joy at UCLA to discuss cloud simulation candidates and the possibility of hosting a CESWP cloud node at UCLA.
- Met with members of the THEMIS Data Analysis Software (TDAS) team, the ERG team, and the ORBITALS team at UCLA to discuss (among other things) the possibility of virtualizing TDAS software and making it available as a virtual appliance (VA) in the CESWP cloud.
- Met with Space Physicists at the International Conference on Substorms (ICS) to identify and confirm CESWP user stories.
- Initiated discussions with WestGrid at the University of Alberta and with SHARCNET at the University of Waterloo regarding the CESWP cloud and possible opportunities for cooperation.
- Planned and ordered the hardware for a project sandbox cloud that will allow us to experiment with different cloud configurations throughout the course of the project without impacting production systems. At the end of the project, we will move the sandbox hardware into the CESWP cloud.
We will discuss each of these threads over the coming weeks as they develop.
The first milestone of the CESWP project, to virtualize the Canadian Space Science Data Portal (CSSDP), was completed on Thursday, February 25, 2010, three days ahead of schedule. The transition was relatively painless. When you visit the (new) CSSDP site you are visiting the virtualized version. Our goal was to make the move completely transparent from a user’s point of view: in other words, if a user doesn’t notice the change, we’ve succeeded.
In a previous post we wrote about what is involved in virtualizing CSSDP. Here is the basic idea: take CSSDP, which was running on three physical machines, and move it to three virtual machines (VMs). Now that they are running on VMs, we can decide where and how we want to deploy them in the future. To start with, we deployed them on a single server whose capacity is approximately equal to the combined capacity of the three physical servers they were running on before. If for some reason–performance, for example–we want to move them to a different machine or to separate machines, the process will be quite simple: take a snapshot of each VM, move the VMs to their new home, and start them up. Naturally there will be other details to look after, but that is the essence of it. Having carried out the process several times now, we are confident that it is indeed relatively simple to do.
As well as virtualizing CSSDP, we changed its physical location. Prior to virtualization, CSSDP ran at the University of Alberta under the care of AICT (the UofA systems group). The virtualized version is now housed in Cybera’s Calgary offices. The most obvious step in the move was to identify any location-dependent aspects of the installation (e.g. IP addresses, domain names, software licenses) and reset them for the new location. Doing so was relatively painless, but it did expose a few places where hidden (or undocumented) assumptions were made about the location of services, and we hunted down and corrected them.
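Hunting for location-dependent settings of this kind can be partly automated. The following is only a minimal sketch, not the procedure we actually used: it assumes the settings live in plain-text configuration files and looks only for strings that resemble IPv4 addresses. The function name `find_hardcoded_addresses` and the sample configuration are hypothetical, made up for illustration.

```python
import re

# Assumption: only dotted-quad IPv4-looking strings are flagged. A real audit
# would also look for domain names, licence keys, and other site-specific values.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_addresses(config_text):
    """Return (line_number, address) pairs for each IPv4-looking string."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for match in IPV4.finditer(line):
            hits.append((number, match.group()))
    return hits


# Hypothetical configuration snippet using documentation-only addresses.
sample = """\
db_host = 192.0.2.10
portal_name = portal.example.org
cache = 192.0.2.11:11211
"""

for number, address in find_hardcoded_addresses(sample):
    print(f"line {number}: {address}")
```

A scan like this only catches the obvious cases. The hidden assumptions mentioned above tend to live in scripts and documentation, and still need to be tracked down by hand.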
There were two additional aspects of the move that required more attention than we expected. One was ensuring that all system administration procedures were properly replicated in the new environment. This is largely an administrative exercise, but it’s very important nonetheless. The other was a surprise: we expected that moving the new (physical) server from our staging ground in Edmonton to its home in Calgary would take about an hour (aside from travel time). But we were prudent and decided to schedule the move two weeks in advance. It was fortunate that we did, because it ended up taking about three days to get everything working properly in the new location. The cause turned out to be an obscure incompatibility between an older network switch and the new Intel chipset on the server’s network card. The good news is that the problem wasn’t unique to our server and, once identified, was straightforward to resolve.
Our next milestone is to build a multi-node cloud at one physical site. We have already built an experimental cloud with a single node, so we understand the mechanics of the job. It is now a matter of acquiring the hardware and putting the real thing in place. Once the CESWP cloud is built, we should be able to move CSSDP with very little additional work. We hope that doing so will remain transparent to CSSDP users, just as the virtualization of CSSDP was.