
The eval platform for autonomous software
Archal is an evaluation platform for autonomous software. Developers use it to test their agents and software against clones of real services. Users write scenarios in markdown that specify the initial state of the clone, the tasks the agent is expected to handle, and the criteria for success. These scenarios are stored in the user's repository and can be reviewed through pull requests, so evaluation fits into the existing development workflow.
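As a rough illustration of the scenario structure described above, the sketch below parses a markdown scenario into its three parts. The heading names (`Initial state`, `Task`, `Success criteria`), the sample scenario, and the `parse_scenario` helper are all assumptions made for this example; they are not Archal's documented schema or API.

```python
import re

# Hypothetical scenario text. The heading names are illustrative assumptions,
# not Archal's documented format.
SCENARIO_MD = """\
# Close stale issues

## Initial state
- Repository has 3 open issues older than 90 days

## Task
Close every issue with no activity in the last 90 days.

## Success criteria
- All 3 stale issues are closed
- No recent issue is modified
"""

# Sections this sketch assumes every scenario must contain.
REQUIRED_SECTIONS = ("Initial state", "Task", "Success criteria")


def parse_scenario(text: str) -> dict[str, str]:
    """Split a markdown scenario into a title plus its level-2 sections."""
    title_match = re.search(r"^# (.+)$", text, flags=re.MULTILINE)
    sections = {"title": title_match.group(1).strip() if title_match else ""}

    # With one capturing group, re.split returns
    # [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^## (.+)$", text, flags=re.MULTILINE)
    for heading, body in zip(parts[1::2], parts[2::2]):
        sections[heading.strip()] = body.strip()

    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"scenario is missing sections: {missing}")
    return sections


scenario = parse_scenario(SCENARIO_MD)
```

Because a scenario in this form is plain markdown, a check like this could run in a pull request before any evaluation starts, rejecting a scenario that omits one of the three parts.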
The evaluation process consists of four main steps. First, user…