llama.cpp

A plain C/C++ implementation of LLaMA that runs inference locally with minimal setup