Evaluating LLM Outputs Using Elixir

Source: dev.to

Type: Post

Samuel Pordeus shares his experience using Elixir to test and validate the outputs of Large Language Models (LLMs). The article covers the challenges posed by the non-deterministic outputs of AI models and shows how to integrate evaluation tests directly into an Elixir codebase. Key techniques include writing structured test cases inspired by OpenAI's evaluation methods and handling both structured and unstructured output formats. The article also walks through a Model-graded Eval, in which a second model validates the LLM's outputs, adding robustness to the testing process. Several code snippets are provided to help implement these testing strategies in an Elixir project.
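To make the two core ideas concrete, here is a minimal sketch in ExUnit. The `MyApp.LLM.complete/1` client and the use of `Jason` for JSON decoding are illustrative assumptions, not the article's actual API:

```elixir
defmodule MyApp.LLMEvalTest do
  use ExUnit.Case, async: true

  # Hypothetical client; substitute the LLM wrapper your app actually uses.
  # Assumed to return {:ok, output_binary} on success.
  defp complete(prompt), do: MyApp.LLM.complete(prompt)

  test "structured eval: assert on the shape of the output, not an exact string" do
    {:ok, output} =
      complete("Summarize the following as JSON with a \"summary\" key: ...")

    # Non-deterministic text can't be compared verbatim, so decode the
    # JSON and check its structure instead.
    assert {:ok, %{"summary" => summary}} = Jason.decode(output)
    assert is_binary(summary) and summary != ""
  end

  test "model-graded eval: a second model judges the first model's answer" do
    {:ok, answer} = complete("What is the capital of France?")

    grading_prompt = """
    You are grading an answer. Reply with only PASS or FAIL.
    Question: What is the capital of France?
    Answer: #{answer}
    Criterion: the answer must name Paris.
    """

    {:ok, verdict} = complete(grading_prompt)
    assert verdict |> String.trim() |> String.upcase() == "PASS"
  end
end
```

Pattern matching inside `assert` is what makes this style ergonomic in Elixir: a failed match reports the actual value that didn't match, rather than a bare boolean failure.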
