This project aims to deliver a tutorial on deploying GPT-2 on AWS using only free-tier services. Building on Andrej Karpathy's llm.c project, it will demonstrate how language models can run efficiently and cost-effectively even with limited cloud resources. The project will offer a publicly accessible endpoint where users can interact with GPT-2 and generate text, all while staying within the AWS Free Tier. It also aims to establish Fuzzy Labs as domain experts in the field, showcasing our low-level knowledge of model internals and deployment.
The value of this project lies not only in its technical outputs but also in the educational content it will provide. A key motivation comes from Kelsey Hightower's "Kubernetes the Hard Way", an abstraction-free, hands-on learning approach that focuses on understanding how systems work underneath. In the same spirit, "How Low Can You LLM?" takes a deep dive into the low-level details of deploying a language model like GPT-2. It focuses on the foundations: implementing the network's forward pass in C and serving the model through a lightweight HTTP server. Through detailed blog posts and short-form content, readers will gain practical, hands-on knowledge of how machine learning models work under the hood and how to deploy them without abstractions.
The project will produce several tangible outputs. First, a publicly accessible API endpoint will let users interact with the GPT-2 model and generate text. Second, the project will create a range of content, covering topics from the inner workings of large language models to fine-tuning GPT-2 itself with an open-source MLOps stack. Finally, a survey of AWS free-tier services, together with the Terraform needed to deploy them to AWS, will showcase our MLOps expertise.
Because the project relies only on AWS free-tier services, anyone can clone its repository and deploy it themselves at no upfront cost. This makes it a truly accessible learning resource for those interested in machine learning and cloud computing.