Baylor University
Information Technology Services

Kodiak GPUs

To run a CUDA-enabled (i.e., NVIDIA GPU) program, specify the gpu queue when you submit your job so that it runs on one of the gpu nodes. Each gpu node has two GPUs, so request 18 processors per node (or 36 if your job needs both GPUs), even if your program runs as a single process. If you request fewer than 18 processors, multiple jobs may run on the node simultaneously, each competing for the same GPU.

$ qsub -q gpu -l nodes=1:ppn=18 my_gpu_program.sh

Note: Be sure to load the cuda##/toolkit module (where ## is the CUDA version, e.g., cuda92/toolkit/9.2.88) in your qsub-ed shell script before running your CUDA-enabled program.
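
As a rough sketch of what my_gpu_program.sh might contain, the commands below write a job script that combines the queue and processor request with the module load. The #PBS lines mirror the qsub options shown above, and the module version is the one used in the nvidia-smi example on this page. The cd "$PBS_O_WORKDIR" step and the program name ./my_cuda_program are assumptions about a typical job, not Kodiak requirements.

```shell
# Write a hypothetical job script; adjust the module version and program name.
cat > my_gpu_program.sh <<'EOF'
#!/bin/bash
#PBS -q gpu
#PBS -l nodes=1:ppn=18

# Start in the directory the job was submitted from (assumed layout).
cd "$PBS_O_WORKDIR"

# Load the CUDA toolkit before running the program.
module load cuda92/toolkit/9.2.88

./my_cuda_program
EOF

# With the directives inside the script, the queue and resource options
# do not need to be repeated on the command line:
#   qsub my_gpu_program.sh
```

Keeping the request in the script itself makes it harder to forget the ppn=18 setting when resubmitting the job.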

Each gpu node has two GPUs, designated as device 0 and device 1. So assuming your program is only going to use a single GPU, when your job starts, which device will it run on? By default, it will run on device 0 even if another process is also running on device 0.

NVIDIA provides a mechanism that allows you to specify which GPU to run on: set the environment variable CUDA_VISIBLE_DEVICES to the device number(s) that you want to use. For example, to run only on device 0, add export CUDA_VISIBLE_DEVICES=0 to your qsub-ed script somewhere before your program runs. Likewise, to run on device 1, add export CUDA_VISIBLE_DEVICES=1. To allow your program to run on either device, export CUDA_VISIBLE_DEVICES=0,1 (or just don't set the environment variable at all), but this will usually mean that the program runs on device 0.
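
The difference between exporting the variable and setting it for a single command can be seen with a stand-in for a real program. In this sketch, sh -c with an echo takes the place of a CUDA program (an assumption for illustration only); a real program inherits the variable from the script in the same way.

```shell
# Stand-in for a CUDA program: it only reports the variable it inherited.
report='echo "visible devices: ${CUDA_VISIBLE_DEVICES:-all}"'

unset CUDA_VISIBLE_DEVICES
unset_result=$(sh -c "$report")

# Set for one command only; later commands are unaffected.
one_shot_result=$(CUDA_VISIBLE_DEVICES=1 sh -c "$report")
after_one_shot=$(sh -c "$report")

# Exported: every command run afterwards in this script inherits it.
export CUDA_VISIBLE_DEVICES=0
exported_result=$(sh -c "$report")

printf '%s\n' "$unset_result" "$one_shot_result" "$after_one_shot" "$exported_result"
```

Using export near the top of the job script is the simpler choice when every program in the job should use the same device.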

You should set CUDA_VISIBLE_DEVICES to a GPU device that is currently idle and not in use by any other processes. But how can you know if a device is idle? If you are logged in to a gpu node interactively, you can run the nvidia-smi utility:

$ module load cuda92/toolkit/9.2.88

$ nvidia-smi
Tue Jul  9 16:46:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |     85%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:82:00.0 Off |                    0 |
| N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    348961      C   some_cuda_program                          15387MiB |
+-----------------------------------------------------------------------------+

$ export CUDA_VISIBLE_DEVICES=1
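
For scripted checks during an interactive session, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The sketch below filters such output for devices that report 0% utilization. The helper name idle_from_csv and the rule "idle means 0% utilization" are assumptions for illustration; a momentary 0% reading does not guarantee that no process holds the device, so check the Processes table as well.

```shell
# Print a comma-separated list of device indexes whose utilization is 0%.
# Reads lines of the form "index, utilization" on standard input.
idle_from_csv() {
    awk -F', *' '$2 == 0 { printf "%s%s", sep, $1; sep = "," } END { print "" }'
}

# Sample input matching the table above (device 0 at 85%, device 1 at 0%).
printf '0, 85\n1, 0\n' | idle_from_csv

# On a gpu node, feed it live data instead:
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits | idle_from_csv
```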

But if you submit your job normally, i.e., as a non-interactive session, the nvidia-smi utility isn't an option. Instead, Kodiak provides a simple utility, idlegpu, that outputs the device number of any GPU that is idle. If device 0 is currently in use but device 1 is not, it will output "1". Similarly, if device 1 is in use but device 0 is idle, it will output "0". If both devices are idle, it will output "0,1". And finally, if both devices are in use, it will output nothing. (This final case shouldn't happen if everyone "plays by the rules" and requests 18 or 36 ppn for their job.)

Note: You can specify a preferred device if both happen to be idle with the -p option. If you run idlegpu -p 1 and both are idle, idlegpu will output "1" instead of "0,1".

To use idlegpu in your qsub-ed shell script, set CUDA_VISIBLE_DEVICES to the output of idlegpu by enclosing the command in backticks, i.e., `idlegpu`. (The equivalent $(idlegpu) form also works.)

...
export CUDA_VISIBLE_DEVICES=`idlegpu`
echo "Program running on device $CUDA_VISIBLE_DEVICES"

./my_cuda_program
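
Because idlegpu prints nothing when both devices are busy, a job script may want to check for that case rather than start with an empty CUDA_VISIBLE_DEVICES, which generally hides every GPU from the program. A minimal sketch, assuming a hypothetical helper named select_gpu that is not part of Kodiak:

```shell
# Hypothetical helper: takes the text printed by idlegpu as its argument.
# Fails if no device is idle; otherwise exports the first idle device.
select_gpu() {
    idle="$1"
    if [ -z "$idle" ]; then
        echo "No idle GPU on this node" >&2
        return 1
    fi
    # "0,1" means both are idle; keep only the first for a single-GPU job.
    export CUDA_VISIBLE_DEVICES="${idle%%,*}"
}

# In a job script (requires idlegpu, so shown as a comment here):
#   select_gpu "$(idlegpu)" || exit 1
#   ./my_cuda_program
```

Keeping only the first device has the same effect for a single-GPU job as running idlegpu -p 0; a job that requested 36 ppn and needs both GPUs would keep the full "0,1" value instead.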

Information Technology Services

One Bear Place #97268
Waco, Texas 76798-7268

helpdesk@baylor.edu
(254) 710-4357
Copyright © Baylor® University. All rights reserved.
Baylor University • Waco, Texas 76798 • 1-800-229-5678