
HCP Public Pages

Resources for the Human Connectome Projects

The goal of this tutorial is for the reader to gain experience with running the HCP pipelines in the “Amazon Cloud”.

Table of Contents

Terms and Acronyms

AWS - Amazon Web Services
EC2 – Elastic Compute Cloud
S3 – Simple Storage Service
S3 Bucket
AMI – Amazon Machine Image – The Software
Amazon EC2 Instance Types – The available hardware
Amazon EBS – Elastic Block Storage    
NITRC

Step 1: Getting Credentials to access HCP S3 Data

Step 2: Getting Started with AWS

Step 2a: Login to AWS
Step 2b: Create an Instance
Step 2c: Configure Your Machine Instance
Step 2d: Connect to Your Running Machine Instance
Step 2e: Make a Terminal Connection using SSH 

Step 3: Take Note of the Pre-installed Software

Step 3a: Note FSL Installation
Step 3b: Note FreeSurfer Installation
Step 3c: Note Connectome Workbench Installation
Step 3d: Note the HCP Pipelines Installation
Step 3e: Note All Available Pre-installed Software

Step 4: Take Note of Available HCP data

Step 5: Create directory structure on which HCP Pipelines can be run

Step 6: Editing files to run a pipeline stage

Step 7: Starting up a set of PreFreeSurfer Pipeline jobs

Step 8: Shutdown and Restart of an instance

Step 8a: Shutdown of a running machine instance
Step 8b: Restart of a machine instance
Important Notes about Stopping and Restarting machine instances:

Step 9: Installing StarCluster

Step 10: Create an AWS Access Key Pair

Step 11: Setup a cluster for running HCP Pipelines

Step 11a: Supply StarCluster with your AWS credentials
Step 11b: Creating an Amazon EC2 key pair
Step 11c: Start an example cluster
Step 11d: Navigate your example cluster
    Example StarCluster Commands
Step 11e: Terminate your small cluster
Step 11f: Create an instance to use as a model for your pipeline cluster nodes
Step 11g: Further prepare your new instance for StarCluster use
    Turn off the software firewall on your PipelineNodeTemplate instance
    Delete gridengine software from your PipelineNodeTemplate instance
    Delete the sgeadmin account and group.
    Remove the SGE_ROOT setting in the /etc/profile file
Step 11h: Install SGE files
    Create a compressed tar file containing what StarCluster needs
    Copy the compressed tar file you just made to your local machine
    Copy the compressed tar file from your local machine to your PipelineNodeTemplate instance
    Unpack the compressed tar file and copy its contents to where StarCluster expects it
    Terminate the instance you just created based on the StarCluster AMI
Step 11i: Create an EBS volume to hold data to be shared across your cluster
Step 11j: Create an AMI for cluster nodes
Step 11k: Configure and Start a Pipeline Cluster 

Step 12: Getting the HCP OpenAccess data available to your cluster

Step 12a: Setting up s3cmd on your master node
Step 12b: Retrieving data to process from the HCP OpenAccess S3 Bucket

Step 13: Editing files to run a pipeline stage

Step 14: Starting up a set of PreFreeSurfer Pipeline jobs

Step 15: Using the StarCluster load balancer

Step 16: Using spot instances as worker nodes

Links and references

Terms and Acronyms

The goal of this tutorial is for the reader to gain experience with running the HCP pipelines in the “Amazon Cloud”. In order for this to make sense, it is important that you start out with a basic understanding of the following terms.

AWS - Amazon Web Services

A collection of remote computing services that make up a cloud computing platform. Two of the central services are Amazon EC2 (the service that provides compute power, i.e. “machines” that are remotely available) and Amazon S3 (the service that provides storage space for your data).

EC2 – Elastic Compute Cloud

Amazon service that allows users to rent virtual machines (VMs) on which to run their applications. Users can create, launch, and terminate VMs as needed, paying an hourly fee only for VMs that are currently active (this is the “elastic” nature of the service).

S3 – Simple Storage Service

Amazon online data storage service. Not a traditional file system. Stores large “objects” instead of files. These objects are accessible virtually anywhere on the web. Multiple running EC2 instances can access an S3 object simultaneously. Intended for large, shared pools of data. Conceptually similar to a shared, web-accessible drive.

S3 Bucket

Data in S3 is stored in buckets. For our purposes, a bucket is simply a named container for the files that we store and share via Amazon S3. HCP’s data is made available publicly in a bucket named hcp-openaccess.
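For example, once you have obtained credentials (Step 1) and configured a command line tool such as s3cmd (this is done on the cluster’s master node in Step 12a), the contents of the bucket can be listed from a terminal. A small illustration, assuming s3cmd has already been configured:

$ s3cmd ls s3://hcp-openaccess/HCP/
                       DIR s3://hcp-openaccess/HCP/100307/
                       DIR s3://hcp-openaccess/HCP/100408/
. . .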

AMI – Amazon Machine Image – The Software

A read-only image of a file system that includes an Operating System (OS) and additional installed software. Conceptually, this is comparable to a CD/DVD that contains an OS and other software that is installed on a “machine” for you. The creator of the AMI chooses which OS to include and then installs and configures other software. For example, an AMI creator might choose to start with CentOS Linux or Ubuntu Linux and then pre-install a set of tools that are useful for a particular purpose.

An AMI might be created for Photo Editing which would contain a pre-installed suite of software that the AMI creator deems is useful for Photo Editing.

An AMI might be created for Neuroimaging with a chosen OS (e.g. Ubuntu 12.04.1 LTS) and a pre-installed suite of software for Neuroimaging (e.g. FSL, AFNI, FreeSurfer, the HCP Pipelines, Workbench, etc.)

The AMI is the software distribution that will be installed and run on your virtual machine instance (see below.)

Amazon EC2 Instance Types – The available hardware

An EC2 Instance Type is a particular combination of CPU, memory (RAM), storage, and networking capacity optimized for a particular purpose. There are instance types defined for different kinds of use, for example general purpose, compute optimized, memory optimized, storage optimized, and GPU instances.

An Instance Type is a virtual hardware configuration.

Amazon EBS – Elastic Block Storage

Online data storage service that provides a more traditional file system. An EBS volume is attached to a running EC2 instance. From the EC2 instance’s point of view, an EBS volume is a “local drive”.

EBS volumes can be configured such that the data continues to exist after the EC2 instance is shut down. By default, however, they are configured such that the volume is deleted upon instance shut down.
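From inside a running Linux instance, an attached EBS volume simply appears as another block device. Two standard Linux commands (not specific to the HCP_NITRC image) show what is attached and mounted:

$ lsblk    # list block devices and where they are mounted
$ df -h    # show mounted file systems and their free space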

NITRC

Neuroimaging Informatics Tools and Resources Clearinghouse

 


Return to Table of Contents


Step 1: Getting Credentials to access HCP S3 Data

 

 

 

 


Return to Table of Contents


Step 2: Getting Started with AWS

Step 2a: Login to AWS

 


Return to Table of Contents


Step 2b: Create an Instance

 

 

 

 

Important: 

  1. Different browsers may display different responses and messages when they cannot make a valid secure connection.
  2. Choose whichever option allows you to continue to connect to your instance (e.g. Advanced, Proceed, Continue, Connect, Accept, etc.)

 

 

 


Return to Table of Contents


Step 2c: Configure Your Machine Instance

Important:

  1. The following is the FreeSurfer license that we are using for this course. It is only intended for your use during this course. If you want to continue using FreeSurfer after the course, please get your own FreeSurfer license and install it on any machine instances you use.
  2. In the FreeSurfer license information below, please carefully note that there are single space characters before lines 3 and 4.

 

Since the course is over, this content has been redacted. Contact FreeSurfer to obtain a license for your use.


Return to Table of Contents


Step 2d: Connect to Your Running Machine Instance


Return to Table of Contents


Step 2e: Make a Terminal Connection using SSH

Important:

  1. If your private key file was downloaded to somewhere different than your ~/Downloads directory, you will need to substitute the location of your private key file for ~/Downloads in the below commands.
  2. If your private key file was named something other than MyHcpKeyPair.pem, you will need to substitute the name of your private key file for MyHcpKeyPair.pem in the below commands.
$ cd ~/Downloads
$ chmod 400 MyHcpKeyPair.pem

 

Important:

  1. You will need to substitute your machine instance’s public DNS for ec2-52-4-26-132.compute-1.amazonaws.com in the below commands.

 

$ ssh -X hcpuser@ec2-52-4-26-132.compute-1.amazonaws.com
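If you will be connecting often, an entry in the SSH client configuration file on your local machine can save retyping the user name and DNS each time. A minimal sketch (the alias myhcp is just an example; note, as discussed in Step 8, that the Public DNS changes every time the instance is stopped and restarted, so the HostName line must be kept up to date):

$ cat >> ~/.ssh/config <<'EOF'
Host myhcp
    HostName ec2-52-4-26-132.compute-1.amazonaws.com
    User hcpuser
    ForwardX11 yes
EOF
$ ssh myhcp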


Return to Table of Contents


Step 3: Take Note of the Pre-installed Software

Step 3a: Note FSL Installation

 

$ which fslview
/usr/share/fsl/5.0/bin/fslview
$ fslmerge

Usage: fslmerge <-x/y/z/t/a/tr> <output> <file1 file2 .......> 
 -t : concatenate images in time
 -x : concatenate images in the x direction
 -y : concatenate images in the y direction
 ...
 
$ flirt -version
FLIRT version 6.0

$ fsl

 


Return to Table of Contents


Step 3b: Note FreeSurfer Installation

$ which freesurfer
/usr/local/freesurfer/bin/freesurfer
$ freesurfer

FreeSurfer is a set of tools for analysis and visualization
of structural and functional brain imaging data. FreeSurfer
also refers to the structural imaging stream within the 
FreeSurfer suite.

Users should consult ...

 


Return to Table of Contents


Step 3c: Note Connectome Workbench Installation

$ wb_command -version
Connectome Workbench
Version: 1.0
Qt Compiled Version: 4.8.1
Qt Runtime Version: 4.8.1
commit: unknown (NeuroDebian build from source)
commit date: unknown
Compiler: c++ (/usr/bin)
Compiler Version:
Compiled Debug: NO
Operating System: Linux

$ wb_view


Return to Table of Contents


Step 3d: Note the HCP Pipelines Installation

 

$ cd ~/tools/Pipelines
$ ls
DiffusionPreprocessing  fMRISurface  FreeSurfer  LICENSE.md ..
...
$ more version.txt
V3.6.0-RCd

Return to Table of Contents


Step 3e: Note All Available Pre-installed Software


Return to Table of Contents


Step 4: Take Note of Available HCP data

$ cd /s3/hcp
$ ls

Return to Table of Contents


Step 5: Create directory structure on which HCP Pipelines can be run

 

$ cd ~/tools
$ wget https://github.com/Washington-University/access_hcp_data/archive/v3.0.0.tar.gz
...(output from wget)...
$ tar xvf v3.0.0.tar.gz
...(output from tar)...
$ ln -s access_hcp_data-3.0.0 access_hcp_data
$ cd

Important:

  1. The last command in the code block below should all be entered on one line (or wrapped only by the width of the terminal). Do not press Enter until you’ve typed the entire command.
$ cd 
$ ./tools/access_hcp_data/link_hcp_data --source=/s3/hcp --dest=${HOME}/data --subjlist=${HOME}/tools/access_hcp_data/example_subject_list.txt --stage=unproc
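After link_hcp_data finishes, it is worth a quick check that the expected directory structure was created. For example, for subject 100307 (one of the subjects in the example subject list) you should see an unprocessed directory populated with symbolic links into /s3/hcp:

$ ls ${HOME}/data
$ ls ${HOME}/data/100307/unprocessed/3T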

Return to Table of Contents


Step 6: Editing files to run a pipeline stage

This step should be familiar to you as it is very similar to the modifications you made to the PreFreeSurferPipelineBatch.sh script and the SetUpHCPPipeline.sh script in a previous practical. The point is to make similar modifications to adapt these scripts to the configuration of your running EC2 instance.

Important:

  1. Be sure to take note of and include the .mine part in the file names for the files that are being created by the cp commands below.
$ cd ~/tools/Pipelines/Examples/Scripts
$ cp PreFreeSurferPipelineBatch.sh PreFreeSurferPipelineBatch.mine.sh
$ cp SetUpHCPPipeline.sh SetUpHCPPipeline.mine.sh

 

Important:

  1. Make sure that the setting of the EnvironmentScript variable includes .mine in the name of the setup file.
StudyFolder="${HOME}/data"
Subjlist="100307 111413"
EnvironmentScript="${HOME}/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"

 

Important:

  1. Commented-out commands (beginning with #) that are very similar to the following are already in the setup script file. If you want to modify those commands instead of entering new ones yourself, you will need to add the keyword export in the appropriate places, remove the comment marker (#) from the beginning of the appropriate lines, and carefully check that the values set for the FSLDIR, FREESURFER_HOME, HCPPIPEDIR, and CARET7DIR variables are as shown below.
  2. Be careful to note and check your placement of double quote characters (“) so they match what is shown below.
  3. Note that FREESURFER_HOME should not have bin in its set value. Before you edit this line, it may have bin in the variable setting. Be sure to remove this.
  4. Note that HCPPIPEDIR should not have projects in its set value. Before you edit this line, it may have projects in the variable setting. Be sure to replace this with tools.
  5. The setting for CARET7DIR is completely different from the value in the file before you edit it.
# Set up FSL (if not already done so in the running environment)
export FSLDIR="/usr/share/fsl/5.0"
. ${FSLDIR}/etc/fslconf/fsl.sh

# Set up FreeSurfer (if not already done so in the running environment)
export FREESURFER_HOME="/usr/local/freesurfer"
. ${FREESURFER_HOME}/SetUpFreeSurfer.sh > /dev/null 2>&1

# Set up specific environment variables for the HCP Pipeline
export HCPPIPEDIR="${HOME}/tools/Pipelines"
export CARET7DIR="/usr/bin"
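Before moving on, a quick, optional sanity check is to source your edited setup script in the current shell and confirm that the variables point where you expect (the paths here match the values shown above):

$ source ~/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh
$ echo ${HCPPIPEDIR}
/home/hcpuser/tools/Pipelines
$ ${CARET7DIR}/wb_command -version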

Return to Table of Contents


Step 7: Starting up a set of PreFreeSurfer Pipeline jobs

Again, this step should be familiar as it is essentially the same as the test run you did of the PreFreeSurferPipelineBatch.mine.sh script in a previous practical.

Important:

  1. There are 2 (two) hyphens in front of runlocal.
$ cd ~/tools/Pipelines/Examples/Scripts
$ ./PreFreeSurferPipelineBatch.mine.sh --runlocal
This script must be SOURCED to correctly setup the environment prior to running any of the other HCP scripts contained here
--runlocal
100307
START: ACPCAlignment
Final FOV is:
0.000000 ...

 

$ cd ~/data/100307
$ ls
MNINonLinear  release-notes  T1w  T2w  unprocessed
$ cd T1w
$ ls -l
total 62808
drwxrwxr-x 2 hcpuser hcpuser 4096     May 8 17:08 ACPCAlignment
-r-xr-xr-x 1 hcpuser hcpuser 32150559 May 8 17:08 T1w1_gdc.nii.gz
-r-xr-xr-x 1 hcpuser hcpuser 32150559 May 8 17:08 T1w.nii.gz
drwxrwxr-x 2 hcpuser hcpuser 4096     May 8 17:08 xfms
 
$ cd ~/data/100307/unprocessed/3T/T1w_MPR1
$ ls -l
total 20
lrwxrwxrwx 1 hcpuser hcpuser 59 May 8 16:51 100307_3T_AFI.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_AFI.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 65 May 8 16:51 100307_3T_BIAS_32CH.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_BIAS_32CH.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 63 May 8 16:51 100307_3T_BIAS_BC.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_BIAS_BC.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 74 May 8 16:51 100307_3T_FieldMap_Magnitude.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_FieldMap_Magnitude.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 70 May 8 16:51 100307_3T_FieldMap_Phase.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_FieldMap_Phase.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 64 May 8 16:51 100307_3T_T1w_MPR1.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_T1w_MPR1.nii.gz

Return to Table of Contents


Step 8: Shutdown and Restart of an instance

Step 8a: Shutdown of a running machine instance

Important:

  1. The Terminate option is equivalent to deleting the machine instance for good. Only use this option if you really want the machine instance to be deleted, not just stopped.
  2. The data on the “local” EBS drive connected to an instance generated from the HCP_NITRC AMI is not “ephemeral storage”. It will persist while the machine instance is stopped. It will not persist if the machine is terminated.

 


Return to Table of Contents


Step 8b: Restart of a machine instance


Return to Table of Contents


Important Notes about Stopping and Restarting machine instances:

It is important to stop your machine instance when it is not in use. Amazon charges you for the instance while it is active/running (whether you are actually using it or not). You are not charged for the instance during the time that it is stopped, although you are still charged a monthly fee for the provisioned EBS storage.

When you do restart an instance that has been stopped, you’ll find that it has a new Public IP address and a new Public DNS entry. You will have to modify the commands you use to connect to the running instance to take into account this new DNS entry.

Similarly, each time you shut down your running instance, the VNC Server session will be shut down. After you start up the instance again, if you want to run another VNC Server session, you will need to visit your HCP_NITRC control panel at http://…, login, and then press the Start Session button to get a VNC Server session restarted.

After restarting your instance, you may find that you no longer have access to HCP S3 bucket at the /s3/hcp mount point. If you try to use a command like:

$ cd /s3/hcp

and receive an error message similar to:

-bash: cd: /s3/hcp: Transport endpoint is not connected

then you will need to remount the S3 bucket. This can be done using the following steps:
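A rough sketch of a remount, assuming the HCP_NITRC image uses the FUSE-based s3fs tool for the /s3/hcp mount (an assumption not confirmed here) and assuming a hypothetical credentials file at ~/.passwd-s3fs containing the Step 1 keys in ACCESSKEY:SECRETKEY form:

$ sudo fusermount -u /s3/hcp                # detach the stale FUSE mount
$ sudo s3fs hcp-openaccess:/HCP /s3/hcp -o passwd_file=${HOME}/.passwd-s3fs -o ro -o allow_other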

 


Return to Table of Contents


For the Exploring the Human Connectome Course (Summer 2015), the following steps are optional.

Step 9: Installing StarCluster

At this point you have created an example Amazon EC2 instance that you can use to run the HCP Pipelines. However, it does not have adequate disk space to store the output from many pipeline runs, so it is just an example. To actually run the HCP Pipelines at scale, you would need to create an instance with significantly more disk space (EBS volume space).

If we were to allow the PreFreeSurfer pipeline processing that we started a couple steps back to continue, it would run the PreFreeSurfer processing to completion for subject 100307 before moving on to running the PreFreeSurfer processing for the next subject in our list, 111413. To do this for very many subjects would be very time consuming as the processing would be happening serially (one subject, then the next, then the next, etc.) on this single machine instance.

To make this processing less time consuming and more cost efficient, we can, instead of just running the pipelines on this one Amazon EC2 instance, distribute the jobs across a cluster of EC2 instances.

StarCluster (http://star.mit.edu/cluster) is available from the STAR (Software Tools for Academics and Researchers) program at MIT. StarCluster is a cluster-computing toolkit specifically designed for Amazon’s EC2. Installation documentation for StarCluster can be found at http://star.mit.edu/cluster/docs/latest/installation.html.

StarCluster is written in the Python programming language, and your HCP_NITRC instance already has a Python installation tool (easy_install) that allows for easy installation of Python packages. Therefore, installing StarCluster is as simple as entering the following command in a terminal connected to your running HCP_NITRC instance (followed by the password for your hcpuser account when prompted).

Important:

  1. If you have stopped and restarted your MyHCP_NITRC instance, you will need to reconnect to that instance either via a Guacamole-based GUI in a web browser interface or via SSH in a terminal.
  2. Keep in mind that, upon restarting, your MyHCP_NITRC instance will have a different Public DNS.
  3. The following commands are to be entered within a terminal connected to your MyHCP_NITRC instance (in the Guacamole-based GUI or connected via SSH).
$ sudo easy_install StarCluster
(enter your password e.g. hcppassword when prompted)

The installation process will display a number of messages about installing prerequisite software and should end by returning you to the $ prompt. Note that whenever the $ prompt is used in the remainder of this document, your actual prompt will likely not be just a $. For example, it may include a user name (e.g. hcpuser), a node name (e.g. nitrcce), and your current working directory before the $. So your actual $ prompt might look like: hcpuser@nitrcce:~$

You can verify that the installation was successful by asking for the StarCluster version number with a command like:

$ starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
 
0.95.6
$

 


Return to Table of Contents


Step 10: Create an AWS Access Key Pair

In order to configure and use StarCluster, you will need an AWS Access Key ID and AWS Secret Access Key for your AWS account. This is a different AWS access key pair from the one you created for accessing the HCP S3 data. That previous access key pair is associated with your HCP ConnectomeDB account. The pair that you create as part of this step is for access to your Amazon AWS account, which the StarCluster software will need.

To create the necessary AWS key pair, do the following:

 


Return to Table of Contents


 

Step 11: Setup a cluster for running HCP Pipelines

Step 11a: Supply StarCluster with your AWS credentials

Next, you will need to begin the process of creating and editing a StarCluster configuration file.

$ starcluster help
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
 
!!! ERROR - config file /home/hcpuser/.starcluster/config does not exist

Options:
--------
[1] Show the StarCluster config template
[2] Write config template to /home/hcpuser/.starcluster/config
[q] Quit
 
Please enter your selection:
Please enter your selection: 2
 
>>> Config template written to /home/hcpuser/.starcluster/config
>>> Please customize the config template

 

$ cd ~/.starcluster
$ gedit config

 

[aws info]
# This is the AWS credentials section (required).
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = #your_aws_access_key_id
AWS_SECRET_ACCESS_KEY = #your_secret_access_key
# replace this with your account number
AWS_USER_ID= #your userid

 


Return to Table of Contents


Step 11b: Creating an Amazon EC2 key pair

StarCluster will be creating and configuring a number of machine instances for you. To do this, in addition to needing access to your account, StarCluster will also need an EC2 key pair to use to connect to and configure EC2 instances on your behalf. Therefore, you must create at least one EC2 key pair to supply to StarCluster via its configuration file.

You can have multiple EC2 key pairs. Each cluster that you create will be associated with one of your key pairs. For now, we will just create a single key pair.

StarCluster itself has a convenient mechanism built in (once it has your AWS account credentials) for creating an EC2 key pair.

$ cd
$ mkdir .ssh
$ starcluster createkey mykey -o ~/.ssh/mykey.rsa
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
 
>>> Successfully created keypair: mykey
>>> fingerprint: e9:9a:a8:f6:7f:63:cb:87:40:2e:14:6d:1a:3e:14:e4:9f:9b:f4:43
>>> keypair written to /home/hcpuser/.ssh/mykey.rsa
$
[key mykey]
key_location = ~/.ssh/mykey.rsa
...
[cluster smallcluster]
keyname = mykey

 


Return to Table of Contents


Step 11c: Start an example cluster

Important:

  1. The output supplied while creating the cluster is somewhat long and is not all included below. To confirm the success of this operation, look for text that reads “The cluster is now ready to use” in the output.
$ starcluster start mysmallcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group @sc-mysmallcluster...
>>> Waiting for security group @sc-mysmallcluster...
Reservation:r-77fcd49b
>>> Waiting for instances to propagate...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
.
.
.
>>> Configuring cluster took 1.801 mins
>>> Starting cluster took 3.970 mins

The cluster is now ready to use. To login to the master node
as root, run:
.
.
You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:

    $ starcluster start -x mysmallcluster

This will start all 'stopped' nodes and reconfigure the
cluster.
$

 


Return to Table of Contents


Step 11d: Navigate your example cluster

Important:

  1. Instances which are StarCluster cluster nodes, such as the instances named master and node001, should be left under the control of StarCluster. You should not start, stop, terminate, or reboot such nodes using the Actions button at the top of your instance table. Doing so can potentially make your cluster unusable.
  2. For example, if you want to stop the nodes master and node001 in the cluster you’ve just created (named mysmallcluster), you should do so by logging on to your MyHCP_NITRC instance and issuing the appropriate StarCluster command as shown in the example StarCluster Commands below (e.g. $ starcluster stop mysmallcluster).

Example StarCluster Commands

Now is a good time to become familiar with some basic StarCluster commands. You will issue such StarCluster commands on a terminal connected to your HCP_NITRC instance. In the below examples, text enclosed in angle brackets, < >, should be replaced by names that you provide.

Important:

  1. There is no need to enter the example commands below now. These examples are provided here just to familiarize you with some of the available StarCluster commands and concepts. After the example commands, we will return to steps that you should carry out.

    • To see what clusters you have currently in existence:
starcluster listclusters

 

    • To start a new cluster based on a named cluster template:
starcluster start -c <template-name> <new-cluster-name>

 

    • To restart or stop a running cluster:
starcluster restart <running-cluster-name>
starcluster stop <running-cluster-name>

 

    • To terminate a cluster (delete its instances entirely):
starcluster terminate <cluster-name>

NOTE: Stopping a cluster is analogous to turning off the machines. Terminating a cluster is analogous to throwing away the machines. When you terminate, the instances go away and cannot be restarted; they are gone.

 

    • To start up a previously stopped cluster without creating new instances:
starcluster start -x <cluster-name>

 

    • To login to the master node of a cluster:
starcluster sshmaster <cluster-name>

 

    • To login to a particular node of a cluster:
starcluster sshnode <cluster-name> <node-name>

 

Important:

  1. The following steps are those you should start carrying out again.
    • From your MyHCP_NITRC instance, use the starcluster sshmaster command to login to the master node of your cluster named mysmallcluster and place a file in the /home directory. Then use starcluster sshnode to login to node001 and confirm that the same file is visible there (the /home directory is NFS-shared across the cluster nodes).
$ starcluster sshmaster mysmallcluster
# cd /home
# ls
sgeadmin  ubuntu
# echo "hello there" > hello.txt
# ls
hello.txt  sgeadmin  ubuntu
# more hello.txt
hello there
# exit
$ starcluster sshnode mysmallcluster node001
# cd /home
# ls
hello.txt  sgeadmin  ubuntu
# cat hello.txt
hello there
# exit

 


Return to Table of Contents


Step 11e: Terminate your small cluster

$ starcluster terminate mysmallcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
 
Terminate EBS cluster mysmallcluster (y/n)? y
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Terminating node: master (i-5bd7a38d)
>>> Terminating node: node001 (i-5ad7a38c)
>>> Waiting for cluster to terminate... 
>>> Removing security group: @sc-mycluster 
$

 


Return to Table of Contents


Step 11f: Create an instance to use as a model for your pipeline cluster nodes

We now need to create an Amazon EC2 instance that will be used as a “template” for creating the nodes in a cluster that can run HCP pipelines. We’ll start by creating another instance that is based on the HCP_NITRC AMI.

Note that the process of creating an instance that can be used as a model for pipeline cluster nodes is currently somewhat complicated. This is due to expectations that StarCluster has for AMIs that it can use as StarCluster nodes. There are ongoing efforts between HCP and NITRC to simplify this process.

$ cd <directory-containing-your-PipelineNodeTemplate.pem-file>
$ mkdir -p ~/.ssh
$ cp PipelineNodeTemplate.pem ~/.ssh
$ chmod 400 ~/.ssh/PipelineNodeTemplate.pem
$ sftp hcpuser@<your-HCP_NITRC-instance-public-dns>.compute-1.amazonaws.com
Enter your password (e.g. hcppassword) when prompted
sftp> cd .ssh
sftp> pwd
Remote working directory: /home/hcpuser/.ssh
sftp> put PipelineNodeTemplate.pem
Uploading PipelineNodeTemplate.pem to /home/user/.ssh/PipelineNodeTemplate.pem
PipelineNodeTemplate.pem                                                        100% 1692 1.7KB/s 00.00
sftp> exit
$
# You can of course have multiple key sections
# [key myotherkey]
# KEY_LOCATION=~/.ssh/myotherkey.rsa

[key PipelineNodeTemplate]
KEY_LOCATION=~/.ssh/PipelineNodeTemplate.pem

Return to Table of Contents


Step 11g: Further prepare your new instance for StarCluster use

(The FreeSurfer license information used during the course appeared here; it has been redacted. As noted in Step 2c, obtain your own FreeSurfer license and install it on any machine instances you use, including this PipelineNodeTemplate instance.)

Turn off the software firewall on your PipelineNodeTemplate instance

$ sudo ufw disable
Enter your password (e.g. hcppassword) when prompted
Firewall stopped and disabled on system startup
$

Delete gridengine software from your PipelineNodeTemplate instance

$ sudo apt-get remove gridengine-client gridengine-common gridengine-master
Enter your password (e.g. hcppassword) if/when prompted
Enter Y when asked if you want to continue

Delete the sgeadmin account and group.

Important:

  1. Double-check that the sudo rm -rf command you enter matches exactly what is written below before you press Enter.
$ sudo userdel sgeadmin
Enter your password (e.g. hcppassword) when/if prompted
$ sudo rm -rf /var/lib/gridengine
$ sudo delgroup sgeadmin

Remove the SGE_ROOT setting in the /etc/profile file

$ sudo sed -i 's/export SGE_ROOT/#export SGE_ROOT/g' /etc/profile
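To confirm the change, you can check that the SGE_ROOT line in /etc/profile now begins with a comment marker:

$ grep SGE_ROOT /etc/profile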

Return to Table of Contents


Step 11h: Install SGE files

We need to create another running instance. This instance needs to be based on an officially released StarCluster AMI. We’ll need to copy some files from that running instance to our PipelineNodeTemplate instance.

Important:

  1. In the following commands, <new-instance-DNS> = the Public DNS for the instance you just created (the t1.micro instance based on the StarCluster AMI)
  2. In the following commands, <PipelineNodeTemplate-DNS> = the Public DNS for your PipelineNodeTemplate instance

Create a compressed tar file containing what StarCluster needs

$ ssh -i ~/.ssh/PipelineNodeTemplate.pem root@<new-instance-DNS>
# cd /
# tar cavf opt_starcluster.tar.gz ./opt
...(see a log of files placed in the opt_starcluster.tar.gz file)...
# exit

Copy the compressed tar file you just made to your local machine

$ scp -i ~/.ssh/PipelineNodeTemplate.pem root@<new-instance-DNS>:/opt_starcluster.tar.gz ./
...(see scp output showing percent copied and ETA until done, etc.)...

Copy the compressed tar file from your local machine to your PipelineNodeTemplate instance

Important:

  1. Note that the root@<PipelineNodeTemplate-DNS> near the end of the following command specifies that the file is copied to your PipelineNodeTemplate instance. This is not the DNS for the new instance (the t1.micro) that you used in the previous command.
  2. Don’t forget the :. at the end of the command.
$ scp -i ~/.ssh/PipelineNodeTemplate.pem opt_starcluster.tar.gz root@<PipelineNodeTemplate-DNS>:.
...(see scp output showing percent copied and ETA until done, etc.)...

Unpack the compressed tar file and copy its contents to where StarCluster expects it

$ ssh -i ~/.ssh/PipelineNodeTemplate.pem root@<PipelineNodeTemplate-DNS>
# tar xvf opt_starcluster.tar.gz
...(see tar output showing files being unpacked from opt_starcluster.tar.gz)...
# mv opt/sge6-fresh /opt
# exit
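Before terminating the temporary instance, a quick, optional check that the SGE files are now in place on your PipelineNodeTemplate instance:

$ ssh -i ~/.ssh/PipelineNodeTemplate.pem root@<PipelineNodeTemplate-DNS> ls -d /opt/sge6-fresh
/opt/sge6-fresh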

Terminate the instance you just created based on the StarCluster AMI

Important:

  1. When you terminate the instance you just created, it is important that you verify that only the instance you just created is selected. Make sure no other instances are selected!
    • Visit your Instance table, make sure only the instance that you just created (it will be of type t1.micro) is selected.
    • Select Actions → Instance State → Terminate followed by Yes, Terminate.

Return to Table of Contents


Step 11i: Create an EBS volume to hold data to be shared across your cluster

You now need an EBS volume (think of it as a simple Hard Disk Drive) that will contain your data for processing. It would be best if this volume were independent of any particular EC2 instance (machine), whether or not that instance is part of a cluster. That way, if you terminate the instances, your data will persist. We’ll create such a volume, and then set up StarCluster so that the created volume gets mounted on all the nodes in the cluster that we create for running pipelines.

$ starcluster createvolume --name=mydata 200 us-east-1a --shutdown-volume-host
.
.
>>> Checking for required remote commands...
>>> Creating 200GB volume in zone us-east-1a
>>> New volume id: vol-4b5b480c
>>> Waiting for vol-4b5b480c to become 'available'...
.
>>> Your new 200GB volume vol-4b5b480c has been created successfully
.
.
#############################
## Configuring EBS Volumes ##
#############################
# StarCluster can attach one or more EBS volumes to the master and then
# NFS_share these volumes to all of the worker nodes. ...
[volume mydata]
VOLUME_ID = vol-4b5b480c
MOUNT_PATH = /mydata

Return to Table of Contents


Step 11j: Create an AMI for cluster nodes

Use the starcluster ebsimage command to create a new AMI from your prepared PipelineNodeTemplate instance, supplying that instance’s ID (yours will not be i-12e5e73d) and a name for the new AMI:

$ starcluster ebsimage i-12e5e73d pipelineclusterami
.
.
>>> New EBS AMI created: ami-feb7aa96
>>> Waiting for snapshot to complete: snap-d6a61fa0
Snap-d6a61fa0: | 100% ETA: --:--:-- 
 >>> Waiting for ami-feb7aa96 to become available...
>>> create_image took 7.253 mins
>>> Your new AMI id is: ami-feb7aa96
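You can confirm that the new AMI is now registered to your AWS account using StarCluster’s listimages command (the AMI ID listed will be your own, not ami-feb7aa96):

$ starcluster listimages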

 


Return to Table of Contents


Step 11k: Configure and Start a Pipeline Cluster

Next we’ll modify the StarCluster configuration file to create a template for a cluster that is appropriate for running HCP Pipelines. It will use the AMI that we just created as the starting point image (e.g. ami-feb7aa96 above, but yours will be different) for both the master and the worker nodes.

Important:

  1. Your entry for NODE_IMAGE_ID will be your AMI ID from the previous substep, not ami-feb7aa96.
  2. The NODE_INSTANCE_TYPE value is where you specify the instance type for nodes in your cluster. When running pipelines “for real”, this is where you might find it necessary to choose a different instance type to provide your cluster nodes with more processing power, more RAM, or GPU access depending upon what stage of the HCP Pipelines you are running. You can always change this value and start a new cluster using a different instance type for running different stages of the pipelines.
[cluster pipelinecluster]
KEYNAME = mykey
CLUSTER_SIZE = 5
NODE_IMAGE_ID = ami-feb7aa96
NODE_INSTANCE_TYPE = m3.medium
VOLUMES = mydata

Once this cluster template has been added to your StarCluster configuration file, start the pipeline cluster:

$ starcluster start -c pipelinecluster mypipelinecluster

Important:

  1. This is a reiteration of a previous point. The master and nodeXXX instances should be left under the control of StarCluster. Do not try to start, stop, terminate, or reboot them using the Actions button at the top of the Instance table.
  2. Control these instances by logging in to your MyHCP_NITRC instance, and issuing StarCluster commands like starcluster stop mypipelinecluster.
    • Once your cluster is configured and running and you receive the “The cluster is now ready to use.” message, use the starcluster sshmaster and starcluster sshnode commands to login to your cluster nodes and verify that the /mydata and the /home directories are shared between the cluster nodes.
$ starcluster sshmaster mypipelinecluster
# cd /home
# ls
hcpuser  sgeadmin  ubuntu
# touch afileinhome.txt
# ls
afileinhome.txt  hcpuser  sgeadmin  ubuntu
# cd /mydata
# ls
lost+found
# touch afileinmydata.txt
# ls
afileinmydata.txt  lost+found
# exit
$ starcluster sshnode mypipelinecluster node001
# cd /home
# ls
afileinhome.txt  hcpuser  sgeadmin  ubuntu
# cd /mydata
# ls
afileinmydata.txt  lost+found

 


Return to Table of Contents


Step 12: Getting the HCP OpenAccess data available to your cluster

You now have a running cluster that has the necessary software installed for running the HCP Pipelines. However, none of the nodes in the cluster (master or workers) have direct access to the HCP OpenAccess S3 data. For this exercise, we will see how to easily copy the data you would like to use for pipeline processing from the HCP OpenAccess S3 bucket to the /mydata directory that is shared between your cluster nodes.

Step 12a: Setting up s3cmd on your master node

S3cmd (http://s3tools.org/s3cmd) is a free command line tool for uploading, retrieving, and managing data in an Amazon S3 bucket. S3cmd is pre-installed in the HCP_NITRC AMI, so it is available on your cluster nodes; in particular, it is available on the master node of your cluster, which is where we will use it now.

To configure s3cmd so that it can access the HCP OpenAccess bucket, you will need the AWS Access Key ID and AWS Secret Access Key that you obtained back in Step 1 for accessing the HCP S3 data; these are associated with your ConnectomeDB account. They are not the AWS Access Key ID and Secret Access Key that you created in Step 10 for your own AWS account.

Important:

  1. Substitute the Public DNS for your HCP_NITRC instance for <your-HCP_NITRC-Public-DNS> in the command below.
  2. Enter the password for the hcpuser account (e.g. hcppassword) when prompted.
$ ssh -X hcpuser@<your-HCP_NITRC-Public-DNS>

 

$ starcluster sshmaster mypipelinecluster
#

Important:

  1. The Access Key and Secret Key you enter here are those that you got way back in step 1. They are the keys necessary for you to access the HCP OpenAccess S3 bucket.
root@master:~# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
 
Access key and Secret key are your identifiers for Amazon S3
Access Key: <your-access-key>
Secret Key: <your-secret-key>
 
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: <just-press-enter>
Path to GPG program [/usr/bin/gpg]: <just-press-enter>
 
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]: <just-press-enter>
 
On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: <just-press-enter>
 
New settings:
  Access Key: <your-access-key>
  Secret Key: <your-secret-key>
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait...
 
Success. Your access key and secret key worked fine :-)
 
Now verifying that encryption works…
Not configured. Never mind.
 
Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
root@master:~#
root@master:~# s3cmd ls
2014-05-15 18:56  s3://hcp-openaccess
2014-05-15 18:57  s3://hcp-openaccess-logs
root@master:~# s3cmd ls s3://hcp-openaccess
                       DIR s3://hcp-openaccess/HCP/
root@master:~# s3cmd ls s3://hcp-openaccess/HCP
ERROR: Access to bucket 'hcp-openaccess' was denied
root@master:~# s3cmd ls s3://hcp-openaccess/HCP/
                       DIR s3://hcp-openaccess/HCP/100307/
                       DIR s3://hcp-openaccess/HCP/100408/
                       DIR s3://hcp-openaccess/HCP/101006/
                       DIR s3://hcp-openaccess/HCP/101107/
                       DIR s3://hcp-openaccess/HCP/101309/
. . .
2015-01-24 21:34         0 s3://hcp-openaccess/HCP/
2015-05-08 08:26      3577 s3://hcp-openaccess/HCP/S500.txt
2015-01-28 08:22       700 s3://hcp-openaccess/HCP/UR100.txt
root@master:~#

Notice that the ls subcommand of s3cmd (s3cmd ls) is a bit picky with regard to whether you include the final / in the name of a directory. Without the /, you get an access denied error. With the /, you can see the subdirectories containing subject data.

 


Return to Table of Contents


Step 12b: Retrieving data to process from the HCP OpenAccess S3 Bucket

# cd /mydata
# wget https://github.com/Washington-University/access_hcp_data/archive/v3.0.0.tar.gz
# tar xvf v3.0.0.tar.gz
# /mydata/access_hcp_data-3.0.0/sync_hcp_data --subjlist=/mydata/access_hcp_data-3.0.0/example_subject_list.txt --dest=/mydata --stage=unproc
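When the sync finishes, the unprocessed data for each subject in the example subject list should be present under /mydata on the shared volume. A quick check for one subject:

# ls /mydata/100307/unprocessed/3T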

At https://sagebionetworks.jira.com/wiki/display/SCICOMP/Configuration+of+Cluster+for+Scientific+Computing you will find a simple diagram that illustrates our AWS and StarCluster configuration. With a few minor adjustments, the illustration in the Overview section of that page shows our configuration:

    • The node that is labeled admin in the illustration is equivalent to the HCP_NITRC instance we created in the early parts of this practical (a.k.a. MyHCP_NITRC).
    • The disk icon in the illustration shows that the NFS-mounted EBS volume is available at a mount point called /shared. Our NFS-mounted EBS volume is available at a mount point called /mydata instead.
    • Not shown is that the /home directory is also shared between the master and the worker nodes.
    • Our current cluster has only 4 worker nodes (node001 … node004) instead of the 999 nodes shown in the diagram.

 


Return to Table of Contents


Step 13: Editing files to run a pipeline stage

Once again, this step should be familiar as you are editing the PreFreeSurferPipelineBatch.sh script and the SetUpHCPPipeline.sh script to match your cluster configuration.

$ starcluster sshmaster -X mypipelinecluster

nano is a relatively user-friendly editor that, like vi, doesn’t need to open a separate window on your screen in which to edit files. Instead it uses your terminal window. To invoke nano, just use a command like nano PreFreeSurferPipelineBatch.mine.sh. Navigation and editing of text is straightforward. Use the arrow keys to move around in the file; use the Delete or Backspace keys for deleting text; and add new text by simply typing. Once you have made the necessary changes, press Ctrl-X to exit the editor, answer Y when prompted to save the buffer, and press Enter when asked for the name of the file to write before exiting.

# cd /home/hcpuser/tools/Pipelines/Examples/Scripts
# cp PreFreeSurferPipelineBatch.sh PreFreeSurferPipelineBatch.mine.sh
# cp SetUpHCPPipeline.sh SetUpHCPPipeline.mine.sh

In PreFreeSurferPipelineBatch.mine.sh, point StudyFolder at the shared /mydata directory and EnvironmentScript at your .mine setup script:

StudyFolder=/mydata
Subjlist="100307 111413"
EnvironmentScript="/home/hcpuser/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"

In the same batch script, the QUEUE setting will look similar to the following:

#if [ X$SGE_ROOT != X ] ; then
#    QUEUE="-q long.q"
    QUEUE="-q hcp_priority.q"
#fi

Change it so that jobs are submitted to the all.q queue provided by the cluster's SGE installation:

#if [ X$SGE_ROOT != X ] ; then
#    QUEUE="-q long.q"
    QUEUE="-q all.q"
#fi

Also in the batch script, find the block that uses fsl_sub to queue each subject's job:

if [ -n "${command_line_specified_run_local}" ] ; then
      echo "About to run ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command=""
  else
      echo "About to use fsl_sub to queue or run ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command="${FSLDIR}/bin/fsl_sub ${QUEUE}"
  fi

and change it to submit jobs with qsub instead, writing each subject's standard output and standard error to log files:

if [ -n "${command_line_specified_run_local}" ] ; then
      echo "About to run ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command=""
  else
      echo "About to use qsub to queue ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command="qsub ${QUEUE}"
      queuing_command+=" -o ${HCPPIPEDIR}/Examples/Scripts/${Subject}.PreFreeSurfer.stdout.log"
      queuing_command+=" -e ${HCPPIPEDIR}/Examples/Scripts/${Subject}.PreFreeSurfer.stderr.log"
  fi

In SetUpHCPPipeline.mine.sh, use the same settings as in Step 6; here HCPPIPEDIR is simply written out as an absolute path:

# Set up FSL (if not already done so in the running environment)
export FSLDIR="/usr/share/fsl/5.0"
. ${FSLDIR}/etc/fslconf/fsl.sh

# Set up FreeSurfer (if not already done so in the running environment)
export FREESURFER_HOME="/usr/local/freesurfer"
. ${FREESURFER_HOME}/SetUpFreeSurfer.sh > /dev/null 2>&1

# Set up specific environment variables for the HCP Pipeline
export HCPPIPEDIR="/home/hcpuser/tools/Pipelines"
export CARET7DIR="/usr/bin"

Finally, PreFreeSurferPipeline.sh itself needs to source the environment setup script, so that the environment is configured when a queued job starts on a worker node. Near the top of that script, the relevant lines look like:

set -e
EnvironmentScript="/home/hcpuser/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"
. ${EnvironmentScript}

Note that we are making these edits in order to run the PreFreeSurfer portion of Structural Preprocessing. Similar edits to the example batch files (e.g. FreeSurferPipelineBatch.sh, GenericfMRISurfaceProcessingPipelineBatch.sh, DiffusionPreprocessingBatch.sh, etc.) would be necessary in order to run those pipelines on your cluster. Edits similar to the one made to PreFreeSurferPipeline.sh would also be necessary in files like FreeSurferPipeline.sh, DiffPreprocPipeline.sh, etc. to run those pipelines on your cluster.

(If you don’t want to lose your edits to the Pipeline script files when your cluster is terminated, you should consider moving the entire /home/hcpuser/tools directory over to somewhere in the /mydata directory. This will put the scripts and your changes to them on the shared volume that persists beyond the life of any given instance. You will need to modify the paths specified in your script files accordingly.)
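A minimal sketch of such a move (the destination path is just an example):

# cp -r /home/hcpuser/tools /mydata/tools

After the copy, EnvironmentScript, HCPPIPEDIR, and any other paths in your .mine scripts that refer to /home/hcpuser/tools would need to be changed to refer to /mydata/tools instead.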

 


Return to Table of Contents


Step 14: Starting up a set of PreFreeSurfer Pipeline jobs

# cd /home/hcpuser/tools/Pipelines/Examples/Scripts
# ./PreFreeSurferPipelineBatch.mine.sh
...
Your job n ("PreFreeSurferPipeline.sh") has been submitted
...
Your job n+1 ("PreFreeSurferPipeline.sh") has been submitted
...
# ls *.log
100307.PreFreeSurfer.stderr.log  100307.PreFreeSurfer.stdout.log
111413.PreFreeSurfer.stderr.log  111413.PreFreeSurfer.stdout.log
#
# qstat
job-ID  prior   name       user         state submit/start at     queue                         slots
-------------------------------------------------------------------------------------------------------
     20 0.55500 PreFreeSur root         r     05/20/2015 16:00:44 all.q@master                      1
     21 0.55500 PreFreeSur root         r     05/20/2015 16:00:44 all.q@node004                     1
# qstat -j 20
# qdel 20
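While the jobs run, the per-subject log files named in the qsub options are a convenient way to follow progress (press Ctrl-C to stop following):

# tail -f 100307.PreFreeSurfer.stdout.log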

 


Return to Table of Contents


Step 15: Using the StarCluster load balancer

As you might imagine, there can be disadvantages to keeping the worker nodes of your cluster running even when they are not being used. In our example so far, we have created a cluster that contains one master node and 4 worker nodes, but we only have 2 jobs running, so at most we really need 2 worker nodes right now.

To lower costs, we can take advantage of the StarCluster load balancer. The StarCluster load balancer can observe the job queue for a cluster and start new worker nodes or remove worker nodes from the cluster based on demand.

The load balancer is an experimental feature of StarCluster. To allow the use of an experimental feature, you must edit the .starcluster/config file (on your HCP_NITRC instance, the one on which you have StarCluster installed and from which you started the cluster, not the master node of the cluster on which you were editing scripts in the previous step.)

In the [global] section of your .starcluster/config file include the following line

ENABLE_EXPERIMENTAL=True

You should be able to do this by simply removing the comment marker (#) from a line in the config file that already looks like:

#ENABLE_EXPERIMENTAL=True

Once you have enabled experimental features and have a cluster up and running (e.g. mypipelinecluster), you can start the load balancer for the cluster by issuing the following command:

$ nohup starcluster loadbalance -m 20 -n 3 mypipelinecluster &

 

The -m option specifies the maximum number of nodes in your cluster and the -n option specifies the minimum number of nodes in your cluster. You will need to press enter twice to return to the system prompt.

To find out the process ID of your load balancer issue a command like the following

$ ps -ef | grep loadbalance
hcpuser  24161 20520  1 18:16 pts/1    00:00:03 /usr/bin/python /usr/local/bin/starcluster loadbalance mypipelinecluster
hcpuser  24243 20520  0 18:21 pts/1    00:00:00 grep --color=auto loadbalance

The first numeric entry after the hcpuser text in the output line that ends with mypipelinecluster (in the example above, the number 24161) is the process ID of your load balancer process. To stop the load balancer, issue a command like:

$ kill -9 24161

Of course, you will need to substitute your load balancer's process ID for 24161 in the above.
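Because the load balancer was started with nohup, its console output is appended to a file named nohup.out in the directory from which you launched it. Tailing that file is a simple way to see what the load balancer is doing:

$ tail -f nohup.out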

If you allow the load balancer to continue to run and you have only the PreFreeSurfer jobs for two subjects running (as started in the previous steps), then when you visit your Instance Table in a browser you will likely see that the worker nodes not being used by your running jobs have been terminated. It can take in the neighborhood of 30 minutes before nodes are terminated. If you have more jobs queued than there are nodes available to run them (and this situation lasts for a while), the load balancer will eventually add new nodes to your cluster.

If your cluster is using spot instances for worker nodes (see the next step), the load balancer will also use spot instances for worker nodes that it adds to your cluster.


Return to Table of Contents


Step 16: Using spot instances as worker nodes

To lower costs even further, we can take advantage of the spot instance mechanism of Amazon AWS. The spot instance mechanism is a way for you to bid on Amazon EC2 instances such that instances are run only when your bid exceeds the current Spot Price for the instance type that you want to use.

Amazon’s documentation at http://aws.amazon.com/ec2/purchasing-options/spot-instances/ describes spot instances as follows:

Spot Instances are spare Amazon EC2 instances for which you can name your own price. The Spot Price is set by Amazon EC2, which fluctuates in real-time according to Spot Instances supply and demand. When your bid exceeds the Spot Price, your Spot instance is launched and your instance will run until the Spot Price exceeds your bid (a Spot interruption) or you choose to terminate them. …

To use Spot Instances, you place a Spot Instance request that specifies the instance type, the Availability Zone desired, the number of Spot Instances desired, and the maximum price you are willing to pay per instance hour (your bid).

To determine how that maximum price compares to past Spot Prices, the Spot Price history for the past 90 days is available via the Amazon EC2 API and the AWS Management Console. …

Starting a StarCluster cluster using spot instances as your worker nodes is as simple as using the -b option when starting your cluster from your HCP_NITRC instance. For example:

$ starcluster start -c pipelinecluster -b 0.50 myspotpipelinecluster

The above command would start a cluster named myspotpipelinecluster with a bid of $0.50 per hour for each worker node. (By default, StarCluster will not use spot instances for the master node of a cluster. It is unlikely that you would want your master node to be stopped if the current price exceeds your bid.) But how do you decide what to bid for your worker nodes? As is noted in the quote above, Amazon makes available spot bid history for instance types. StarCluster provides an easy command for viewing that history.

For example:

$ starcluster spothistory m3.medium
. . .
>>> Fetching spot history for m3.medium (VPC)
>>> Current price: $0.1131
>>> Max price: $0.7000
>>> Average price: $0.1570

Adding the -p option (e.g. starcluster spothistory -p m3.medium) will launch a web browser tab and supply you with a graph of the spot price over the last 30 days. (It may take a while to generate the graph and open the browser, so you may not want to do this during class.)

It is very useful to take note of the warning message that you get when starting a cluster using spot instances.

$ starcluster start -c pipelinecluster -b 0.50 myspotpipelinecluster
 .
 .
 .
 *** WARNING - ************************************************************
 *** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
 *** WARNING - 
 *** WARNING - Spot instances can take a long time to come up and may not
 *** WARNING - come up at all depending on the current AWS load and your
 *** WARNING - max spot bid price.
 *** WARNING - 
 *** WARNING - StarCluster will wait indefinitely until all instances (5)
 *** WARNING - come up. If this takes too long, you can cancel the start
 *** WARNING - command using CTRL-C. You can then resume the start command
 *** WARNING - later on using the --no-create (-x) option:
 *** WARNING - 
 *** WARNING - $ starcluster start -x myspotpipelinecluster
 *** WARNING - 
 *** WARNING - This will use the existing spot instances launched
 *** WARNING - previously and continue starting the cluster. If you don't
 *** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
 *** WARNING - simply terminate the cluster using the 'terminate' command.
 *** WARNING - ************************************************************

 


Return to Table of Contents


Links and references

Browse Amazon S3 buckets with Ubuntu Linux: http://makandracards.com/makandra/31999-browse-amzon-s3-buckets-with-ubuntu-linux

Expanding the Storage Space of an EBS Volume on Linux: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

What Is Amazon EC2?: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html

StarCluster Quick Start: http://star.mit.edu/cluster/docs/latest/quickstart.html

StarCluster Configuration File information: http://star.mit.edu/cluster/docs/latest/manual/configuration.html

Defining StarCluster Templates: http://star.mit.edu/cluster/docs/latest/manual/configuration.html#defining-cluster-templates

Configuration of a Cluster for Scientific Computing: https://sagebionetworks.jira.com/wiki/display/SCICOMP/Configuration+of+Cluster+for+Scientific+Computing

StarCluster Elastic Load Balancer: http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html

 


Return to Table of Contents

