Using CUDA-C on AWS EC2 GPU Instances

2022-05-24 / 1661 words / 8 minutes

There are lots of options when it comes to improving compute performance. Faster CPUs are one option; more CPUs, or more cores per CPU, can help too, especially if applications make use of threads. More recently, more exotic approaches such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have been offered by AWS. This article focuses on using CUDA-C on Amazon EC2 GPU instances with NVIDIA graphics cards.

CUDA-C was developed by NVIDIA to make it simpler to develop highly performant parallel compute tasks on the GPUs of their more recent graphics cards. The excellent “CUDA by Example: An Introduction to General-Purpose GPU Programming” by Jason Sanders and Edward Kandrot explains how to develop software that takes advantage of CUDA-compatible GPUs by providing code examples with detailed commentary.

Not everyone has a CUDA-compatible GPU though, and since AWS is also used to run many production workloads, an Amazon Elastic Compute Cloud (EC2) instance is a good candidate for running experiments. AWS CloudFormation can be used to start the instance quickly and to shut everything down again when finished. There were two main considerations:

  • the EC2 instance type
  • the Amazon Machine Image (AMI)

Selecting the EC2 Instance Type

Selecting the Amazon EC2 instance type took a bit of effort. Opening the AWS Management Console, selecting “EC2”, then selecting “Instance Types” from the navigation pane on the left displays a list of instance types. Filtering by GPUs equal to 1 and sorting by price narrows the options considerably. Initially, I selected the cheapest option, but as the expression goes, you get what you pay for, and it turned out that instance type had a GPU which didn’t support CUDA. Fortunately, the next option, the “g4dn.xlarge”, did have an NVIDIA GPU. The command below provides some details on the GPU from the AWS perspective:

$ aws ec2 describe-instance-types --instance-types g4dn.xlarge --query "InstanceTypes[].GpuInfo" --no-cli-pager
[
    {
        "Gpus": [
            {
                "Name": "T4",
                "Manufacturer": "NVIDIA",
                "Count": 1,
                "MemoryInfo": {
                    "SizeInMiB": 16384
                }
            }
        ],
        "TotalGpuMemoryInMiB": 16384
    }
]
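
The console filtering can also be approximated from the CLI. As a rough sketch (the JMESPath expression below is my own and the result is not price-sorted), the following lists the instance types that report exactly one GPU:

$ aws ec2 describe-instance-types \
    --query 'InstanceTypes[?GpuInfo.Gpus[0].Count==`1`].InstanceType' \
    --output text --no-cli-pager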

Selecting the Amazon Machine Image (AMI)

For the AMI, the Deep Learning AMI provides all the necessary tools pre-installed. Again, opening the console, selecting “EC2”, then selecting “AMIs” from the navigation pane on the left displays a list of AMIs. Entering “Deep Learning AMI” into the search box narrows the options. I selected version 61.3 for Amazon Linux 2 on x86_64, which is compatible with the chosen EC2 instance type, and made a note of the AMI ID (ami-0ac44af394b7d6689).
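
The AMI ID can also be looked up from the CLI. As a sketch (assuming the public image name matches the label shown in the console), something like the following should return the ID in the current region:

$ aws ec2 describe-images --owners amazon \
    --filters 'Name=name,Values=Deep Learning AMI (Amazon Linux 2) Version 61.3' \
    --query 'Images[].[ImageId,Name]' --output text --no-cli-pager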

The CloudFormation Template

The CloudFormation template is quite simple. It consists of an SSH key pair (CudaEc2KeyPair), a security group that allows inbound SSH access (SshSecurityGroup), and a g4dn.xlarge EC2 instance running the Deep Learning AMI (Amazon Linux 2) Version 61.3 (CudaEc2Instance). To make things simple, I created two outputs: one containing the CLI command required to download the SSH key, and one containing the SSH command to connect to the instance.

AWSTemplateFormatVersion: 2010-09-09
Resources:
  CudaEc2KeyPair:
    Type: AWS::EC2::KeyPair
    Properties: 
      KeyName: !Join [ "-", [ "cudakeys", !Ref "AWS::StackName" ] ]
  SshSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Join [ "-", [ "cudasshsg", !Ref "AWS::StackName"]]
      GroupDescription: !Join [ " ", [ "SSH Security Group for", !Ref "AWS::StackName" ] ]
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
  CudaEc2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0ac44af394b7d6689 # Deep Learning AMI (Amazon Linux 2) Version 61.3 in us-east-1 region
      InstanceType: g4dn.xlarge
      KeyName: !Ref CudaEc2KeyPair
      SecurityGroupIds:
        - !Ref SshSecurityGroup
      UserData:
        "Fn::Base64":
          !Sub |
            #!/bin/bash -xe
            git clone https://github.com/CodedK/CUDA-by-Example-source-code-for-the-book-s-examples-.git /home/ec2-user/cbe
            cd /home/ec2-user/cbe/chapter03
            /usr/local/cuda-11.6/bin/nvcc enum_gpu.cu -o enum_gpu
            chown -R ec2-user:ec2-user /home/ec2-user/cbe
Outputs:
  key:
    Value: !Join [ "", [ "aws ssm get-parameter --name /ec2/keypair/", !GetAtt CudaEc2KeyPair.KeyPairId, " --region us-east-1 --with-decryption --query 'Parameter.Value' --output text > ", !Ref "AWS::StackName" ,".pem"  ] ]
  ssh:
    Value: !Join [ "", [ "ssh -i ", !Ref "AWS::StackName" ,".pem ec2-user@", !GetAtt CudaEc2Instance.PublicIp ] ]

Note that the UserData script downloads the example programs from the “CUDA by Example: An Introduction to General-Purpose GPU Programming” book, changes directory into the chapter 3 example programs, compiles one of the example programs using the nvcc utility, and gives ownership of the files to the ec2-user user.
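
If the clone or compile step ever needs troubleshooting, the UserData script’s output is captured by cloud-init on Amazon Linux 2, so it can be reviewed on the instance after boot (log path assumed from the standard Amazon Linux 2 layout):

$ sudo tail -n 50 /var/log/cloud-init-output.log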

The stack is created with the following command, which creates a stack called “cuda3”. A different name could be used; the resources created and the stack outputs would change to match:

$ aws cloudformation create-stack --stack-name cuda3 --template-body file://cuda.cft.yaml --no-cli-pager
{
    "StackId": "arn:aws:cloudformation:us-east-1:187655263883:stack/cuda3/8c74d310-dd63-11ec-a287-0aa1b5993139"
}

The stack can take a moment to launch. The following command shows the output while it is in progress:

$ aws cloudformation describe-stacks --stack-name cuda3 --no-cli-pager
{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-east-1:187655263883:stack/cuda3/8c74d310-dd63-11ec-a287-0aa1b5993139",
            "StackName": "cuda3",
            "CreationTime": "2022-05-27T02:20:17.007000+00:00",
            "RollbackConfiguration": {},
            "StackStatus": "CREATE_IN_PROGRESS",
            "DisableRollback": false,
            "NotificationARNs": [],
            "Tags": [],
            "EnableTerminationProtection": false,
            "DriftInformation": {
                "StackDriftStatus": "NOT_CHECKED"
            }
        }
    ]
}
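
Rather than polling describe-stacks by hand, the CLI can also block until creation finishes (or fails):

$ aws cloudformation wait stack-create-complete --stack-name cuda3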

Once the stack has been successfully launched, describing the stack will display the outputs, including the commands to run to download the SSH keys, and connect to the instance.

$ aws cloudformation describe-stacks --stack-name cuda3 --no-cli-pager
{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-east-1:187655263883:stack/cuda3/8c74d310-dd63-11ec-a287-0aa1b5993139",
            "StackName": "cuda3",
            "CreationTime": "2022-05-27T02:20:17.007000+00:00",
            "RollbackConfiguration": {},
            "StackStatus": "CREATE_COMPLETE",
            "DisableRollback": false,
            "NotificationARNs": [],
            "Outputs": [
                {
                    "OutputKey": "ssh",
                    "OutputValue": "ssh -i cuda3.pem ec2-user@3.222.215.126"
                },
                {
                    "OutputKey": "key",
                    "OutputValue": "aws ssm get-parameter --name /ec2/keypair/key-064a1a15725ce226c --region us-east-1 --with-decryption --query 'Parameter.Value' --output text > cuda3.pem"
                }
            ],
            "Tags": [],
            "EnableTerminationProtection": false,
            "DriftInformation": {
                "StackDriftStatus": "NOT_CHECKED"
            }
        }
    ]
}
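
To print just the outputs without the rest of the stack detail, a --query expression can be added, for example:

$ aws cloudformation describe-stacks --stack-name cuda3 \
    --query 'Stacks[0].Outputs' --no-cli-pager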

Connecting and Running a CUDA-C Program

Cut and paste the OutputValue corresponding to the OutputKey of “key” to download the SSH key PEM file from the AWS Systems Manager parameter store.

$ aws ssm get-parameter --name /ec2/keypair/key-064a1a15725ce226c --region us-east-1 --with-decryption --query 'Parameter.Value' --output text > cuda3.pem

There is no output displayed, but a file called cuda3.pem should be created in the local directory. The PEM file should start with “-----BEGIN RSA PRIVATE KEY-----” and end with “-----END RSA PRIVATE KEY-----”.

NOTE: As downloaded, the PEM file is world-readable. Until you protect it, SSH will fail with the following message:

$ ssh -i cuda3.pem ec2-user@3.222.215.126
The authenticity of host '3.222.215.126 (3.222.215.126)' can't be established.
ECDSA key fingerprint is SHA256:9QMARzfFWsjX1FMXeXo48NXlQ/g6KAllL82+H8N+o3s.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '3.222.215.126' (ECDSA) to the list of known hosts.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for 'cuda3.pem' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "cuda3.pem": bad permissions
ec2-user@3.222.215.126: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

The command to prevent the error above and protect the file on Mac and Linux is:

$ chmod go-rwx cuda3.pem

Once that is done, cut and paste the OutputValue corresponding to the OutputKey of “ssh” to connect to the EC2 instance. You should be greeted with a similar message to that displayed below:

$ ssh -i cuda3.pem ec2-user@3.222.215.126
=============================================================================
       __|  __|_  )
       _|  (     /   Deep Learning AMI (Amazon Linux 2) Version 61
      ___|\___|___|
=============================================================================

Please use one of the following commands to start the required environment with the framework of your choice:
for TensorFlow 2.7 with Python3.8 (CUDA 11.2 and Intel MKL-DNN) ________________________________ source activate tensorflow2_p38
for PyTorch 1.10 with Python3.8 (CUDA 11.1 and Intel MKL) __________________________________________ source activate pytorch_p38
for AWS MX 1.8 (+Keras2) with Python3.7 (CUDA 11.0 and Intel MKL-DNN) ________________________________ source activate mxnet_p37

for AWS MX(+AWS Neuron) with Python3 ______________________________________________________ source activate aws_neuron_mxnet_p36
for TensorFlow(+AWS Neuron) with Python3 _____________________________________________ source activate aws_neuron_tensorflow_p36
for PyTorch (+AWS Neuron) with Python3 __________________________________________________ source activate aws_neuron_pytorch_p36

for TensorFlow 2(+Amazon Elastic Inference) with Python3 ______________________________ source activate amazonei_tensorflow2_p36
for PyTorch 1.5.1 (+Amazon Elastic Inference) with Python3 _________________________ source activate amazonei_pytorch_latest_p37
for AWS MX(+Amazon Elastic Inference) with Python3 __________________________________________ source activate amazonei_mxnet_p36
for base Python3 (CUDA 11.0) ___________________________________________________________________________ source activate python3

To automatically activate base conda environment upon login, run: 'conda config --set auto_activate_base true'

Official Conda User Guide: https://docs.conda.io/projects/conda/en/latest/user-guide/
AWS Deep Learning AMI Homepage: https://aws.amazon.com/machine-learning/amis/
Developer Guide and Release Notes: https://docs.aws.amazon.com/dlami/latest/devguide/what-is-dlami.html
Support: https://forums.aws.amazon.com/forum.jspa?forumID=263
For a fully managed experience, check out Amazon SageMaker at https://aws.amazon.com/sagemaker
When using INF1 type instances, please update regularly using the instructions at: https://github.com/aws/aws-neuron-sdk/tree/master/release-notes
Security scan reports for python packages are located at: /opt/aws/dlami/info/
=============================================================================

[ec2-user@ip-172-31-2-42 ~]$

When the EC2 instance was created, the UserData script in the CloudFormation template already downloaded the example code from the “CUDA by Example: An Introduction to General-Purpose GPU Programming” book and compiled one of the example programs. Running that program uses CUDA-C to interrogate the GPU and provides the details shown:

$ /home/ec2-user/cbe/chapter03/enum_gpu
   --- General Information for device 0 ---
Name:  Tesla T4
Compute capability:  7.5
Clock rate:  1590000
Device copy overlap:  Enabled
Kernel execution timeout :  Disabled
   --- Memory Information for device 0 ---
Total global mem:  15634661376
Total constant Mem:  65536
Max mem pitch:  2147483647
Texture Alignment:  512
   --- MP Information for device 0 ---
Multiprocessor count:  40
Shared mem per mp:  49152
Registers per mp:  65536
Threads in warp:  32
Max threads per block:  1024
Max thread dimensions:  (1024, 1024, 64)
Max grid dimensions:  (2147483647, 65535, 65535)
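
The same details can be cross-checked from the NVIDIA driver’s point of view using nvidia-smi, which the Deep Learning AMI ships with, for example:

$ nvidia-smi --query-gpu=name,memory.total --format=csv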

Remember earlier when the second-cheapest Amazon EC2 instance type had to be selected because the cheapest one didn’t have a GPU that supports CUDA? Running the same program on that EC2 instance resulted in the following message:

$ ./enum_gpu
no CUDA-capable device is detected in enum_gpu.cu at line 23

Conclusions

It was shockingly simple to create an Amazon EC2 instance with the necessary hardware and software to compile and run a CUDA-C application. While this application doesn’t do much, often the largest challenge when developing with a new framework is just getting the first application to compile, and we have achieved that goal!
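
Since the g4dn.xlarge instance accrues charges while it runs, remember to tear everything down when finished. Deleting the stack removes the EC2 instance, the security group, and the key pair that CloudFormation created:

$ aws cloudformation delete-stack --stack-name cuda3 --no-cli-pager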

If you’re interested in learning more, check out “CUDA by Example: An Introduction to General-Purpose GPU Programming” by Jason Sanders and Edward Kandrot.


Tags:  AWS  Elastic Compute Cloud (EC2)  GPU  CUDA  parallel  NVIDIA  AMI  Deep Learning  Deep Learning AMI  Amazon  Linux  Amazon Linux 2  ami-0ac44af394b7d6689  CloudFormation
Categories:  AWS  Amazon Elastic Compute Cloud (EC2)  GPU  CUDA  NVIDIA  Amazon CloudFormation

