Evaluating LLM Roles in Constrained Greenhouse Control
Abstract
Large language models are increasingly proposed as general-purpose components for decision-making systems, including domains that require structured outputs and safety constraints. This paper studies a narrow but important question: in a low-entropy greenhouse control task, should an LLM be used as a direct actuator controller, or is it more useful as a high-level supervisor and goal interpreter? We build a reproducible grow-box control benchmark with classical rule-based, hysteresis, and random-forest controllers; direct LLM controllers with structured output; LLM supervisor variants; hybrid safety gates; and an ambiguous human-goal interpretation task. Across the evaluated settings, direct Mistral control did not outperform classical baselines. A rules-aware direct LLM reached 0.28 exact match on 50 sampled static decisions, while a random forest reached 0.555 on the broader static test frame and a calibrated rule controller reached 0.84 on the sampled supervisor set. Structured output eliminated JSON/schema failures in later prompts, but did not guarantee semantic control quality. Supervisor and goal-interpretation results were more promising but still did not beat calibrated or keyword baselines. The central lesson is that structured outputs are interface guarantees, not control guarantees; LLMs should be separated from low-level actuation unless their signals are validated, gated, and audited.
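To make the distinction between interface guarantees and control guarantees concrete, the following is a minimal sketch of a hybrid safety gate in Python. It is an illustration under stated assumptions, not the benchmark's implementation: the actuator names (`heater`, `fan`), all thresholds, and the helper names `schema_valid`, `semantically_safe`, `rule_fallback`, and `gate` are hypothetical and do not come from the benchmark itself.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical actuator schema (assumption): two binary actuators.
ALLOWED = {"heater": {"on", "off"}, "fan": {"on", "off"}}


@dataclass
class Reading:
    temp_c: float
    humidity_pct: float


def schema_valid(raw: str) -> Optional[dict]:
    """Interface check only: parseable JSON with the expected keys and values."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict) or set(action) != set(ALLOWED):
        return None
    for key, allowed_values in ALLOWED.items():
        value = action[key]
        if not isinstance(value, str) or value not in allowed_values:
            return None
    return action


def semantically_safe(action: dict, r: Reading) -> bool:
    """Control check with made-up limits (assumption): a schema-valid action can still fail here."""
    if action["heater"] == "on" and r.temp_c > 28.0:
        return False  # heating an already hot box
    if action["fan"] == "off" and r.temp_c > 32.0:
        return False  # no ventilation at a critical temperature
    return True


def rule_fallback(r: Reading) -> dict:
    """Illustrative rule-based baseline with assumed thresholds."""
    return {
        "heater": "on" if r.temp_c < 18.0 else "off",
        "fan": "on" if (r.temp_c > 28.0 or r.humidity_pct > 80.0) else "off",
    }


def gate(raw: str, r: Reading, audit: list) -> dict:
    """Validate, gate, and audit one LLM proposal before it reaches an actuator."""
    action = schema_valid(raw)
    if action is None:
        decision, reason = rule_fallback(r), "schema_invalid"
    elif not semantically_safe(action, r):
        decision, reason = rule_fallback(r), "unsafe_overridden"
    else:
        decision, reason = action, "accepted"
    # Every proposal is logged, whether or not it was accepted.
    audit.append({"raw": raw, "reading": r, "decision": decision, "reason": reason})
    return decision
```

In this sketch, a proposal such as `{"heater": "on", "fan": "off"}` at 31 °C passes `schema_valid` but is rejected by `semantically_safe`, so `gate` returns the rule-based fallback and records the override in the audit log. A structured-output constraint alone would have accepted the same proposal.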