NanoGPT Cheat Sheet
Artificial Intelligence is taking over our world. Here is a quick guide for CS4501 to deploy this revolutionary technology and create your own Burning Man attendee.
Created with DALL-E 3
PREP
You are about to train a real, fully working artificial intelligence model, and things can (and will) probably go wrong. Don’t fret, ask for help, and utilize all of the skills you learned in this class to create a true A.I. Welcome to the uprising.
TMUX:
If you are still not using a modern terminal or a terminal multiplexer, there is no time like the present:
TABBY: Terminal emulator with a built-in SSH client (Steve’s rec.; GET THIS ONE IF YOU DON’T KNOW) => https://tabby.sh/
HYPER: Cross-platform terminal emulator, mostly popular on macOS => https://hyper.is/
SCREEN: GNU’s classic terminal multiplexer, for the VIM Evangelists that love pain => https://www.gnu.org/software/screen/
FTG:
File Transfer Guides for when we inevitably forget the command
Secure Copy Protocol (SCP): GeeksForGeeks.org
SSH File Transfer Protocol (SFTP): GeeksForGeeks.org
Acquire GPU-Enabled Machine
First, we need to get access to a GPU-Enabled Machine using the CS dept. resources. This is the recommended way of completing this assignment.
If you are feeling extra saucy, or want to use this on a personal device (because it’s fun-as-heck), you can attempt this with your OWN GPU DESKTOP. However, we do not officially support it and cannot guarantee results.
If you have a Windows 11 or Linux desktop with an NVIDIA GPU (sorry AMD) and want to attempt installing the proper drivers, talk to Steve, but note that this is considered unsupported and outside the scope of this project.
SSH into the CS portal
ssh <userid>@portal.cs.virginia.edu
Fun Fact: The CS Dept. maintains multiple portals, so if one of them seems to be slow, try ssh-ing into a different numbered portal.
SSH into a GPU-Enabled Machine
Choose one of the department’s GPU servers, replace <XX> with its number, and ssh into it using the following command:
ssh <userid>@gpusrv<XX>.cs.virginia.edu
CONFIG
Conda
Conda, the package manager that ships with Anaconda, is a Python package and environment manager that creates isolated virtual environments. Why would you want that? Because when something goes kaput, you only have to nuke your environment instead of your entire OS.
Activate the CONDA Module
The CS servers are not sudo enabled, so the only way to install packages outside of the conda environment is by using modules. To load conda on your GPU server, use the following command:
module load anaconda3
You may need to log out of portal and log back in. If you completed this step correctly, you should see a new (base) at the front of your terminal/command prompt session. This indicates that Conda has been initialized to its default environment. You can see the currently loaded modules (module avail lists everything you could load) using the command:
module list
If conda is still not appearing, you may need to run the conda init command:
conda init
Install NanoGPT
For installing NanoGPT, all instructions were taken directly from the GitHub README: https://github.com/karpathy/nanoGPT. This guide will help you start up and get the basic example running. You will have to feed the model your own data and continue training the model after this guide finishes.
CREATE YOUR CONDA environment
First, you need to create the Conda env. that you will install all of your python packages to. You can do this with the following command:
conda create -n nanoGPT python=3.10
Hit y for yes to install the new packages and dependencies.
This creates a new Conda environment. Note that the last argument specifies the Python version. This can be changed but shouldn’t be necessary for this assignment.
Activate your new conda environment with the following command:
conda activate nanoGPT
At the front of your command prompt, you should now see (nanoGPT) instead of (base).
Make sure that as you are installing new packages, you always see (nanoGPT) before your command prompt.
- IMPORTANT NOTE -
You will have to reactivate the Conda environment EVERY TIME YOU LOG IN! If you’ve logged out of your session and haven’t run the above conda activate command, then your code may not work. Reactivate your conda environment and your code should start working again.
- - - - - - - - - -
Clone the GitHub repo
Clone nanoGPT’s GitHub repo using the following command:
git clone https://github.com/karpathy/nanoGPT.git
Install the required packages
This is the step where something will most likely go wrong, especially with PyTorch. Make sure that you followed the above steps, and if necessary rebuild your conda environment. It is best to start from a fresh environment when trying new configurations, especially if an install broke.
Install nanoGPT’s dependencies using the following command:
python3 -m pip install torch numpy transformers datasets tiktoken wandb tqdm
The torch install may come out broken. If you receive a cuDNN error, other students have been able to remedy this with the following command:
python3 -m pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
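Since this is the step where installs tend to break, a quick sanity check before training can save some head-scratching. The sketch below is not part of nanoGPT; the function name environment_report is made up for illustration. It reports the active conda environment (via the CONDA_DEFAULT_ENV variable that conda sets on activation) and whether torch imports and can see a GPU:

```python
import os


def environment_report(environ=None):
    """Return a short summary of the conda env and the torch/CUDA setup."""
    environ = os.environ if environ is None else environ
    lines = []

    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    lines.append(f"conda env: {environ.get('CONDA_DEFAULT_ENV', '<none>')}")

    # A broken install can fail at import time, so catch that and report it.
    try:
        import torch
    except Exception as exc:
        lines.append(f"torch: import failed ({exc})")
        return "\n".join(lines)

    lines.append(f"torch: {torch.__version__}")
    lines.append(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        lines.append(f"GPU 0: {torch.cuda.get_device_name(0)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

If the first line does not say nanoGPT, reactivate the environment; if CUDA shows as unavailable on a GPU server, the torch install is the likely suspect.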
Test your NanoGPT installation
Next, test NanoGPT to make sure it is working properly by running the example in the quickstart guide.
cd into the nanoGPT directory:
cd nanoGPT
Prepare the Shakespeare dataset:
python3 data/shakespeare_char/prepare.py
Train the model:
python3 train.py config/train_shakespeare_char.py
Be patient after executing this command, as it may take a couple of minutes for output to appear. The model takes ~15 minutes to train. Don’t be alarmed if it takes more or less time, as this depends on both the machine being used and how many other people are using it. If all goes well, you should see something similar to the below screenshot:
- IMPORTANT NOTE -
The CS GPU Machines are being used by lots of people and are usually under heavy demand. I highly recommend not waiting until the last minute to train your model, because if you do you may receive the following error:
If you receive this error don’t panic. Switch to another GPU server, activate your conda environment, and try to train the model again. You may have to switch around to multiple machines before you find one with enough available memory.
- - - - - - - - - -
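Rather than hopping between servers blind, you can look at how much GPU memory is free before you start training. This is a minimal sketch, assuming nvidia-smi is installed on the GPU server; the helper name parse_gpu_memory is made up for illustration, and the parser expects plain numbers in every column:

```python
import subprocess

# Ask nvidia-smi for one CSV row per GPU: index, free MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.free,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows into (index, free_mib, total_mib) tuples."""
    gpus = []
    for row in csv_text.strip().splitlines():
        index, free, total = (int(field) for field in row.split(","))
        gpus.append((index, free, total))
    return gpus


def main():
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
        return
    # Show the emptiest GPU first.
    for index, free, total in sorted(parse_gpu_memory(result.stdout), key=lambda g: -g[1]):
        print(f"GPU {index}: {free} MiB free of {total} MiB")


if __name__ == "__main__":
    main()
```

If every GPU on the machine is nearly full, move on to another server. If only some are busy, nanoGPT’s README shows a --device override on train.py, and a value such as --device=cuda:1 should point training at a specific GPU.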
Test the model:
python3 sample.py --out_dir=out-shakespeare-char
Your model should be spitting out Shakespeare-esque quotes now! Hit Ctrl+C to cancel the output. Please note that your output may differ from what is pictured below, and that is totally fine:
Add Your Data
If you’ve made it this far, that means your model and environment are configured! Now you just need to add your own data.
points of reference
You may want to check out the file: data/shakespeare/prepare.py
This script shows how the text is prepared for parsing into the NanoGPT model. Follow the way it is written, and you should be able to import your own data in a similar way.
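As a rough starting point, here is a character-level sketch modeled on what the repo’s Shakespeare character script produces (train.bin, val.bin, and a meta.pkl holding the vocabulary). It is not the repo’s code: the folder data/my_data, the file input.txt, and the function name prepare are assumptions for illustration, and it uses only the standard library where the repo’s scripts use numpy.

```python
import pickle
from array import array
from pathlib import Path

DATA_DIR = Path("data/my_data")  # hypothetical dataset folder, run from the repo root


def prepare(text, out_dir, train_fraction=0.9):
    """Encode text per character and write train.bin, val.bin, and meta.pkl."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Vocabulary: every distinct character gets an integer id.
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}

    # Split by position: the first part trains, the remainder validates.
    split = int(len(text) * train_fraction)
    parts = {"train": text[:split], "val": text[split:]}

    for name, part in parts.items():
        # "H" is an unsigned 16-bit integer, written as raw bytes.
        ids = array("H", (stoi[ch] for ch in part))
        with open(out_dir / f"{name}.bin", "wb") as f:
            ids.tofile(f)

    # Save the vocabulary so sampled ids can be decoded back to characters.
    with open(out_dir / "meta.pkl", "wb") as f:
        pickle.dump({"vocab_size": len(chars), "itos": itos, "stoi": stoi}, f)

    stats = {name: len(part) for name, part in parts.items()}
    stats["vocab_size"] = len(chars)
    return stats


if __name__ == "__main__":
    source = DATA_DIR / "input.txt"  # hypothetical file holding your raw text
    if source.exists():
        print(prepare(source.read_text(encoding="utf-8"), DATA_DIR))
    else:
        print(f"Put your text in {source} first.")
```

Compare the result against the repo’s own prepare scripts before trusting it, especially if you want the tokenized variant rather than per-character encoding.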
Make sure to also split your data into training and validation sets. If the split is badly proportioned (for example, too little data left for training), then your model may not train properly.
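Once your dataset folder holds the prepared training and validation files, training on it needs a config that points at that folder. nanoGPT configs are plain Python files of variable assignments, like the config/train_shakespeare_char.py used earlier. The sketch below assumes a hypothetical dataset folder data/my_data and a hypothetical file name config/train_my_data.py; the values are starting points recalled from the Shakespeare character config, so check them against the real file before relying on them.

```python
# config/train_my_data.py (hypothetical file name)

out_dir = "out-my-data"   # checkpoints are written here
dataset = "my_data"       # hypothetical folder name under data/

# How often to evaluate and log during training.
eval_interval = 250
eval_iters = 200
log_interval = 10

# A small model, sized for a quick experiment.
batch_size = 64
block_size = 256          # context length, in tokens
n_layer = 6
n_head = 6
n_embd = 384              # must be divisible by n_head
dropout = 0.2

# Optimizer schedule.
learning_rate = 1e-3
max_iters = 5000
lr_decay_iters = 5000
min_lr = 1e-4
```

Following the same pattern as the Shakespeare example, you would then train with python3 train.py config/train_my_data.py and sample with python3 sample.py --out_dir=out-my-data.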