@codmir/test-framework

AI-native browser testing framework for desktop and web applications. Uses vision models (Gemini, Claude) to see and interact with real screens instead of relying on CSS selectors or DOM queries. Tests run through the Codmir Desktop agent server.

npm install @codmir/test-framework

Quick references

Overview, Quick Start, Agent Client, Test Builder, Vision, Vision Assertions, Recorder, Assertions, Runner, Reporter, Configuration

Overview

Traditional testing frameworks like Playwright and Cypress rely on CSS selectors, data attributes, and DOM queries to find and interact with elements. This works for web apps you control, but breaks when testing desktop applications, third-party UIs, or any interface where you don't have access to the underlying markup.

@codmir/test-framework takes a fundamentally different approach: it uses AI vision models to see what is on screen, find elements by description, read text, and detect errors visually. Tests describe what they want in natural language rather than with brittle selectors.

The framework communicates with the Codmir Desktop agent server over a local TCP socket. The agent server controls the mouse, keyboard, and screen capture at the OS level, meaning you can test any application, not just browsers.

Selectors: Natural language descriptions, not CSS / XPath

Scope: Any desktop or web application

Vision providers: Gemini (default) or Claude, with automatic fallback

Transport: TCP socket to local Codmir Desktop agent server

Recording: Record interactions, export as test scripts

Quick Start

Prerequisites: Codmir Desktop running with the agent server enabled, and at least one vision API key set (GEMINI_API_KEY or ANTHROPIC_API_KEY).

import {
  createRunner,
  suite,
  createVision,
  createVisionAssertions,
  formatSuiteResult,
} from '@codmir/test-framework';

// 1. Create a runner (connects to the agent server)
const runner = createRunner();

// 2. Set up vision for AI-powered assertions
const vision = createVision();
const va = createVisionAssertions(vision);

// 3. Define a test suite
const mySuite = suite('Homepage smoke test')
  .beforeAll(async (ctx) => {
    await ctx.agent.connect();
  })
  .afterAll(async (ctx) => {
    await ctx.agent.disconnect();
  })
  .test('Page loads correctly', (t) => {
    t.step('Open the browser', async (ctx) => {
      await ctx.agent.shell('open https://example.com');
      await ctx.agent.wait(2000);
    });

    t.step('Verify heading is visible', async (ctx) => {
      await va.assertVisible(ctx.agent, 'Example Domain heading');
    });

    t.step('Check for errors', async (ctx) => {
      await va.assertNoErrors(ctx.agent);
    });
  })
  .build();

// 4. Run and report
const result = await runner.runSuite(mySuite);
console.log(formatSuiteResult(result));
process.exit(result.failed > 0 ? 1 : 0);

Agent Client

The agent client communicates with the Codmir Desktop agent server over a local TCP socket. It reads the server port from ~/.codmir/agent-port by default.

import { createClient } from '@codmir/test-framework';

const agent = createClient({
  host: '127.0.0.1',      // default
  port: undefined,         // auto-read from port file
  portFile: '~/.codmir/agent-port', // default
  timeout: 10_000,         // connection timeout in ms
});

// Check the agent server is running
const alive = await agent.ping();         // boolean

// Get server status
const status = await agent.status();      // AgentResponse

// Session management
await agent.connect('my-session');        // connect with session name
await agent.disconnect();                 // disconnect

Screen methods

// Capture a screenshot (returns base64 string or null)
const base64 = await agent.screenshot();

// Get screen dimensions and frontmost app
const info = await agent.screenInfo();
// info: { ok: boolean, width: number, height: number, frontmost_app: string }

Input methods

// Mouse actions (x, y in screen pixels)
await agent.click(500, 300);
await agent.doubleClick(500, 300);
await agent.rightClick(500, 300);
await agent.move(500, 300);

// Scroll at position with delta
await agent.scroll(500, 300, -3);   // scroll up
await agent.scroll(500, 300, 3);    // scroll down

// Keyboard
await agent.type('Hello world');           // type text
await agent.key(36);                       // press key by keyCode
await agent.key(36, 256);                  // key with modifier flags

// Shell command
const res = await agent.shell('ls -la');   // AgentResponse

// Delay
await agent.wait(1000);                    // wait 1 second

AgentResponse

All action methods return an AgentResponse:

interface AgentResponse {
  ok: boolean;
  error?: string;
  [key: string]: unknown;   // additional fields per method
}
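
Because each response carries an ok flag, a failed action may come back as a value rather than an exception (the text above does not say whether the client ever throws). The following is a minimal sketch of a guard for that case. ensureOk is a hypothetical helper, not part of the package, and the interface is a local copy of the shape shown above.

```typescript
// Local copy of the documented AgentResponse shape.
interface AgentResponse {
  ok: boolean;
  error?: string;
  [key: string]: unknown;
}

// Hypothetical helper (an assumption, not a package export): returns the
// response unchanged when ok is true, otherwise throws with the error text.
function ensureOk(res: AgentResponse, action: string): AgentResponse {
  if (!res.ok) {
    throw new Error(`${action} failed: ${res.error ?? 'unknown error'}`);
  }
  return res;
}
```

Inside a step this could wrap a call such as ensureOk(await ctx.agent.shell('ls -la'), 'shell'), so a failed command fails the step instead of passing silently.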

Test Builder

The builder API provides a fluent interface for defining test suites, test cases, and steps. Both suite() and test() return builder objects that are finalized with .build().

suite(name)

Creates a SuiteBuilder. Chain methods to add tests and lifecycle hooks, then call .build() to produce a TestSuite.

import { suite } from '@codmir/test-framework';

const mySuite = suite('Authentication flow')
  .beforeAll(async (ctx) => {
    // Runs once before all tests
    await ctx.agent.connect();
  })
  .afterAll(async (ctx) => {
    // Runs once after all tests
    await ctx.agent.disconnect();
  })
  .beforeEach(async (ctx) => {
    // Runs before each test case
  })
  .afterEach(async (ctx) => {
    // Runs after each test case
  })
  .test('Login with valid credentials', (t) => {
    t.step('Navigate to login page', async (ctx) => {
      await ctx.agent.shell('open https://app.example.com/login');
      await ctx.agent.wait(2000);
    });

    t.step('Enter email', async (ctx) => {
      await ctx.agent.click(400, 300);
      await ctx.agent.type('user@example.com');
    });

    t.step('Submit form', async (ctx) => {
      await ctx.agent.click(400, 400);
    });
  })
  .test('Login with invalid credentials', (t) => {
    t.step('Enter wrong password', async (ctx) => {
      // ...
    });
  })
  .build();

test(name)

Creates a standalone TestBuilder. Use .step() to add sequential steps and .timeout(ms) to set a per-test timeout. Add it to a suite with .addTest().

import { test, suite } from '@codmir/test-framework';

const loginTest = test('Login flow')
  .timeout(60_000)
  .step('Open app', async (ctx) => {
    await ctx.agent.shell('open https://app.example.com');
    await ctx.agent.wait(2000);
  })
  .step('Verify loaded', async (ctx) => {
    const info = await ctx.agent.screenInfo();
    // info.frontmost_app will be the browser name
  })
  .build();

// Add a pre-built test to a suite
const mySuite = suite('Smoke tests')
  .addTest(loginTest)
  .build();

TestContext

Every step function receives a TestContext object:

interface TestContext {
  agent: AgentClient;                   // the agent client instance
  screenshots: string[];                // collected screenshot base64 strings
  metadata: Record<string, unknown>;    // custom data shared between steps
}

Vision

The vision module sends screenshots to AI models (Gemini or Claude) for analysis. It automatically falls back to the second provider if the first fails.

createVision(options?)

import { createVision } from '@codmir/test-framework';

const vision = createVision({
  geminiApiKey: process.env.GEMINI_API_KEY,        // or set env var
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,   // or set env var
  preferredProvider: 'gemini',                      // 'gemini' | 'claude'
  geminiModel: 'gemini-2.0-flash',                  // default
  claudeModel: 'claude-haiku-4-5-20251001',         // default
});

vision.describe(agent)

Takes a screenshot and returns a natural-language description of everything visible on screen: application name, UI elements, text, buttons, forms, and interface state.

const result = await vision.describe(agent);
// result: { answer: string, provider: string, model: string }

vision.findElement(agent, description)

Locates a UI element by natural-language description and returns its pixel coordinates.

const el = await vision.findElement(agent, 'the blue Submit button');
// el: { x: number, y: number, found: boolean, provider: string }

if (el.found) {
  await agent.click(el.x, el.y);
}
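
A lookup can report found: false simply because the UI has not finished rendering. One way to handle that is a small retry wrapper, sketched below. findWithRetry, the attempt count, and the delay are all hypothetical choices rather than package features; the find argument stands in for a call such as () => vision.findElement(agent, 'the blue Submit button').

```typescript
// Local copy of the result shape documented for vision.findElement.
interface ElementLocation {
  x: number;
  y: number;
  found: boolean;
  provider: string;
}

// Hypothetical helper: calls `find` up to `attempts` times, pausing
// `delayMs` between tries, and returns the first result with found === true.
// If every attempt misses, the last (not found) result is returned.
async function findWithRetry(
  find: () => Promise<ElementLocation>,
  attempts = 3,
  delayMs = 500,
): Promise<ElementLocation> {
  let last: ElementLocation = { x: 0, y: 0, found: false, provider: 'none' };
  for (let i = 0; i < attempts; i++) {
    last = await find();
    if (last.found) return last;
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return last;
}
```

Each retry sends another screenshot to the vision provider, so a low attempt count keeps API usage predictable.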

vision.isVisible(agent, description)

Checks whether a described element or text is visible on screen.

const { visible, provider } = await vision.isVisible(agent, 'Welcome back message');

vision.readText(agent, region?)

Extracts all visible text from the screen, organized by visual hierarchy. Optionally focus on a named region (e.g. "top navigation", "main content area").

const result = await vision.readText(agent, 'sidebar');
// result: { answer: string, provider: string, model: string }

vision.checkForErrors(agent)

Scans the screen for error states: error messages, red banners, 404/500 pages, broken layouts, crash screens.

const { hasErrors, errors, provider } = await vision.checkForErrors(agent);
// errors: string[] — descriptions of each error found

vision.compareScreenshots(before, after, description)

Compares two base64 screenshots and describes what changed, relative to a test description.

const before = await agent.screenshot();
// ... perform some action ...
const after = await agent.screenshot();

const diff = await vision.compareScreenshots(before!, after!, 'sidebar collapsed');
// diff: { answer: string, provider: string, model: string }

vision.ask(image, prompt) / vision.askFromScreen(agent, prompt)

Low-level methods for custom vision queries. ask accepts a base64 image directly; askFromScreen captures a fresh screenshot from the agent first.

const result = await vision.askFromScreen(
  agent,
  'How many items are in the shopping cart badge?'
);
console.log(result.answer);  // e.g. "3"

Vision Assertions

Vision assertions combine the vision module with assertion logic. They throw AssertionError on failure, which the runner catches and reports.

import { createVision, createVisionAssertions } from '@codmir/test-framework';

const vision = createVision();
const va = createVisionAssertions(vision);

va.assertVisible(agent, description)

Asserts that an element or text matching the description is visible on screen.

await va.assertVisible(ctx.agent, 'the navigation menu');

va.assertNotVisible(agent, description)

Asserts that an element or text is not visible on screen.

await va.assertNotVisible(ctx.agent, 'error dialog');

va.assertNoErrors(agent)

Uses vision to scan the screen for error states and fails if any are detected.

await va.assertNoErrors(ctx.agent);

va.assertTextOnScreen(agent, expectedText)

Reads all text from the screen and checks whether it contains the expected string (case-insensitive).

await va.assertTextOnScreen(ctx.agent, 'Welcome back, Nathan');
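
The case-insensitive containment check described above can be pictured as follows. This is only a sketch of the comparison, assuming plain lowercasing on both sides; containsIgnoringCase is a hypothetical name, and the real assertion may normalize text differently.

```typescript
// Hypothetical illustration of a case-insensitive substring check.
// `screenText` stands in for the text the vision model read from the screen.
function containsIgnoringCase(screenText: string, expected: string): boolean {
  return screenText.toLowerCase().includes(expected.toLowerCase());
}
```

Under this reading, a screen showing "WELCOME BACK, NATHAN!" would satisfy the assertion above, while a screen showing only "Welcome" would not.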

va.clickElement(agent, description)

Finds an element by description using vision and clicks it. Throws if the element is not found.

await va.clickElement(ctx.agent, 'the Sign In button');

va.doubleClickElement(agent, description)

Finds an element by description and double-clicks it.

await va.doubleClickElement(ctx.agent, 'the file name in the list');

va.typeInto(agent, description, text)

Finds a field by description, clicks it to focus, waits 200ms, then types the provided text.

await va.typeInto(ctx.agent, 'the email input field', 'user@example.com');
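
The sequence described above (find, click, wait 200ms, type) can be sketched with the lower-level calls documented earlier. MiniAgent and MiniVision are hypothetical, trimmed-down interfaces so the example stands alone, and throwing when the field is not found is an assumption borrowed from clickElement.

```typescript
// Trimmed-down stand-ins for the agent client and vision module.
interface MiniAgent {
  click(x: number, y: number): Promise<unknown>;
  wait(ms: number): Promise<unknown>;
  type(text: string): Promise<unknown>;
}

interface MiniVision {
  findElement(
    agent: MiniAgent,
    description: string,
  ): Promise<{ x: number; y: number; found: boolean }>;
}

// Sketch of the documented typeInto sequence, not the package's own code.
async function typeIntoSketch(
  vision: MiniVision,
  agent: MiniAgent,
  description: string,
  text: string,
): Promise<void> {
  const el = await vision.findElement(agent, description);
  if (!el.found) {
    // Assumption: a missing field is treated as a failure.
    throw new Error(`Element not found: ${description}`);
  }
  await agent.click(el.x, el.y); // focus the field
  await agent.wait(200);         // give the UI time to focus
  await agent.type(text);
}
```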

Recorder

The recorder captures browser actions and exports them as executable test scripts. Record a manual flow, then replay it as an automated test.

import { createRecorder } from '@codmir/test-framework';

const recorder = createRecorder();

// Start recording
recorder.start('Checkout flow');

// Record actions (typically from event hooks)
recorder.record({ type: 'click', params: { x: 400, y: 300 }, description: 'Add to cart button' });
recorder.record({ type: 'type', params: { text: 'test@example.com' }, description: 'Email field' });
recorder.record({ type: 'click', params: { x: 500, y: 600 }, description: 'Place order button' });
recorder.record({ type: 'screenshot', params: {} });

// Check recording state
recorder.isRecording();   // true

// Stop and get the session
const session = recorder.stop();
// session: RecordingSession | null

recorder.toTestScript(session, options?)

Converts a recorded session into a complete, runnable test script. Set useVision: true to generate vision-based steps that find elements by description instead of coordinates.

// Coordinate-based output (default)
const script = recorder.toTestScript(session!);

// Vision-based output (uses element descriptions)
const visionScript = recorder.toTestScript(session!, { useVision: true });

console.log(visionScript);
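
To make the difference between the two export modes concrete, here is a sketch of how a single recorded click might be rendered in each. The exact script format the recorder emits is not shown in this document, so renderClick and its output strings are hypothetical; they only reuse calls documented elsewhere on this page (ctx.agent.click and va.clickElement). Falling back to coordinates when an action has no description is also an assumption.

```typescript
// Minimal local shape for a recorded click.
interface ClickAction {
  type: 'click';
  params: { x: number; y: number };
  description?: string;
}

// Hypothetical renderer: vision mode uses the description when one exists,
// otherwise the recorded coordinates are used.
function renderClick(action: ClickAction, useVision: boolean): string {
  if (useVision && action.description) {
    return `await va.clickElement(ctx.agent, ${JSON.stringify(action.description)});`;
  }
  return `await ctx.agent.click(${action.params.x}, ${action.params.y});`;
}
```

This also suggests why descriptions matter when recording: an action recorded without one gives vision-based export nothing to search for.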

RecordedAction

interface RecordedAction {
  type: 'click' | 'double_click' | 'right_click' | 'type'
      | 'key' | 'scroll' | 'navigate' | 'wait' | 'screenshot';
  timestamp: number;
  params: Record<string, unknown>;
  description?: string;    // used by vision-based export
}

interface RecordingSession {
  name: string;
  startedAt: number;
  actions: RecordedAction[];
}

Assertions

Built-in assertion functions that work without vision models. These use the agent client directly for screen info, shell output, and basic checks.

assert(condition, message)

Basic assertion. Throws AssertionError if the condition is falsy.

import { assert } from '@codmir/test-framework';

assert(result.ok, 'Expected operation to succeed');

assertFrontmostApp(agent, expectedApp)

Verifies the frontmost application matches the expected name (case-insensitive substring match).

import { assertFrontmostApp } from '@codmir/test-framework';

await assertFrontmostApp(ctx.agent, 'Safari');
await assertFrontmostApp(ctx.agent, 'Chrome');

assertScreenSize(agent, minWidth, minHeight)

Verifies the screen meets minimum size requirements.

import { assertScreenSize } from '@codmir/test-framework';

await assertScreenSize(ctx.agent, 1280, 720);

assertShellOutput(agent, command, contains)

Runs a shell command and asserts the output contains the expected string.

import { assertShellOutput } from '@codmir/test-framework';

await assertShellOutput(ctx.agent, 'node --version', 'v20');

assertShellExitCode(agent, command, expectedCode?)

Runs a shell command and asserts its exit code. Defaults to expecting exit code 0 (success).

import { assertShellExitCode } from '@codmir/test-framework';

// Expect success
await assertShellExitCode(ctx.agent, 'echo hello');

// Expect failure
await assertShellExitCode(ctx.agent, 'false', 1);

waitFor(fn, options?)

Polls a condition function until it returns true or the timeout is reached. Useful for waiting on asynchronous UI changes.

import { waitFor } from '@codmir/test-framework';

await waitFor(
  async () => {
    const info = await ctx.agent.screenInfo();
    return info.frontmost_app.includes('Safari');
  },
  {
    timeout: 15_000,    // max wait time (default: 10_000)
    interval: 500,      // poll interval (default: 500)
    message: 'Safari did not become frontmost app',
  }
);
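
For readers who want to see what the polling amounts to, a loop along these lines would give the behavior described above. It is a sketch rather than the package source: waitForSketch is a hypothetical name, and the error text and the decision to always check the condition at least once are assumptions. The defaults mirror the documented ones.

```typescript
interface WaitOptions {
  timeout?: number;  // max wait in ms
  interval?: number; // delay between polls in ms
  message?: string;  // error text on timeout
}

// Polls `fn` until it returns true; rejects once `timeout` ms have elapsed.
async function waitForSketch(
  fn: () => Promise<boolean> | boolean,
  options: WaitOptions = {},
): Promise<void> {
  const { timeout = 10_000, interval = 500, message = 'Condition not met' } = options;
  const deadline = Date.now() + timeout;
  while (true) {
    if (await fn()) return;
    if (Date.now() >= deadline) {
      throw new Error(`${message} (timed out after ${timeout}ms)`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, interval));
  }
}
```

If the condition calls a vision method, each poll costs a model request, so a longer interval may be the better trade-off than with cheap checks such as screenInfo().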

Runner

The runner executes test suites and individual tests, handling timeouts, lifecycle hooks, and screenshot-on-failure.

createRunner(options?)

Creates a runner with an embedded agent client. Accepts all TestFrameworkOptions.

import { createRunner } from '@codmir/test-framework';

const runner = createRunner({
  host: '127.0.0.1',
  timeout: 10_000,
  screenshotOnFailure: true,    // default: true
});

// Access the underlying agent client
runner.agent;

// Session management
await runner.connect();
await runner.disconnect();

// Run a single test case
const testResult = await runner.run(myTestCase);

// Run an entire suite
const suiteResult = await runner.runSuite(mySuite);

runTest(testCase, agent, options?)

Standalone function for running a single test with an existing agent client. Each test has a default timeout of 30 seconds. Steps run sequentially; execution stops at the first failed step.

import { runTest, createClient } from '@codmir/test-framework';

const agent = createClient();
const result = await runTest(myTestCase, agent, {
  screenshotOnFailure: true,
});
// result: TestResult
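
The "sequential, stop at first failure" rule can be illustrated with a small loop. This sketch leaves out timeouts and screenshots, and it simply omits the steps after a failure; whether the real runner reports those as skipped is not stated here. All names are hypothetical.

```typescript
interface SketchStep {
  name: string;
  fn: () => Promise<void>;
}

interface SketchStepResult {
  name: string;
  status: 'passed' | 'failed';
  duration: number; // milliseconds
  error?: string;
}

// Runs steps in order and stops after the first one that throws.
async function runStepsSketch(steps: SketchStep[]): Promise<SketchStepResult[]> {
  const results: SketchStepResult[] = [];
  for (const step of steps) {
    const started = Date.now();
    try {
      await step.fn();
      results.push({ name: step.name, status: 'passed', duration: Date.now() - started });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ name: step.name, status: 'failed', duration: Date.now() - started, error });
      break; // later steps are not executed
    }
  }
  return results;
}
```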

runSuite(testSuite, agent, options?)

Standalone function for running a full suite. Executes beforeAll once, then for each test: beforeEach, the test, afterEach. Finishes with afterAll.

import { runSuite, createClient } from '@codmir/test-framework';

const agent = createClient();
const result = await runSuite(mySuite, agent);
// result: SuiteResult
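
The hook ordering described above, written out as code. This is a sketch of the ordering only, using hypothetical local types; it does not model result collection or what happens to afterEach and afterAll when a test throws, which this page does not specify.

```typescript
type Hook = () => Promise<void>;

interface SketchSuite {
  beforeAll?: Hook;
  afterAll?: Hook;
  beforeEach?: Hook;
  afterEach?: Hook;
  tests: Hook[];
}

// beforeAll once, then beforeEach / test / afterEach per test, then afterAll.
async function runSuiteOrderSketch(s: SketchSuite): Promise<void> {
  await s.beforeAll?.();
  for (const t of s.tests) {
    await s.beforeEach?.();
    await t();
    await s.afterEach?.();
  }
  await s.afterAll?.();
}
```

For a suite with two tests, the resulting order is beforeAll, beforeEach, test 1, afterEach, beforeEach, test 2, afterEach, afterAll.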

Result types

type TestStatus = 'passed' | 'failed' | 'skipped' | 'running';

interface StepResult {
  name: string;
  status: TestStatus;
  duration: number;        // milliseconds
  error?: string;
  screenshot?: string;     // base64, captured on failure
}

interface TestResult {
  name: string;
  status: TestStatus;
  duration: number;
  steps: StepResult[];
  error?: string;
}

interface SuiteResult {
  name: string;
  status: TestStatus;
  duration: number;
  tests: TestResult[];
  passed: number;
  failed: number;
  skipped: number;
}

Reporter

Formatting utilities for displaying test results in the terminal or converting them for programmatic use.

formatSuiteResult(result)

Returns a formatted multi-line string for terminal output, including per-test and per-step results with status icons and durations.

import { formatSuiteResult } from '@codmir/test-framework';

const output = formatSuiteResult(result);
console.log(output);

// Output:
// Homepage smoke test
// ===================
//   ✓ Page loads correctly (1523ms)
//       ✓ Open the browser (1205ms)
//       ✓ Verify heading is visible (318ms)
//
// Total:     2 tests | 2 passed | 0 failed | 0 skipped
// Duration:  1523ms
// Status:    PASSED

formatTestResult(result, indent?)

Formats a single test result with its steps. The optional indent parameter controls the leading whitespace (default: two spaces).

toJSON(result)

Serializes a SuiteResult or TestResult to a pretty-printed JSON string.

import { writeFileSync } from 'node:fs';
import { toJSON } from '@codmir/test-framework';

const json = toJSON(result);
writeFileSync('test-results.json', json);

toSummary(result)

Returns a minimal summary object from a suite result, suitable for dashboards or CI reporting.

import { toSummary } from '@codmir/test-framework';

const summary = toSummary(result);
// { total: number, passed: number, failed: number,
//   skipped: number, duration: number, status: string }
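
In CI, the summary object is enough to decide the process exit code and print a one-line status. The helpers below are hypothetical and operate on a local copy of the summary shape shown above; the exit-code rule matches the one used in Quick Start (non-zero when any test failed).

```typescript
// Local copy of the documented summary shape.
interface Summary {
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  duration: number;
  status: string;
}

// Hypothetical helper: 1 if any test failed, otherwise 0.
function exitCodeFor(summary: Summary): number {
  return summary.failed > 0 ? 1 : 0;
}

// Hypothetical helper: compact single-line status for CI logs.
function oneLine(summary: Summary): string {
  return `${summary.passed}/${summary.total} passed, ${summary.failed} failed, ` +
    `${summary.skipped} skipped in ${summary.duration}ms`;
}
```

A CI script could then log oneLine(toSummary(result)) and call process.exit(exitCodeFor(toSummary(result))).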

Configuration

All configuration is passed via TestFrameworkOptions when creating a client or runner.

interface TestFrameworkOptions {
  host?: string;                // Agent server host (default: '127.0.0.1')
  port?: number;                // Agent server port (auto-read from port file if omitted)
  portFile?: string;            // Path to port file (default: '~/.codmir/agent-port')
  timeout?: number;             // Connection timeout in ms (default: 10_000)
  screenshotOnFailure?: boolean; // Capture screenshot on step failure (default: true)
  provider?: string;            // Provider identifier sent to agent (default: 'codmir-test')
}

Environment variables

GEMINI_API_KEY: Google Gemini API key for vision analysis

ANTHROPIC_API_KEY: Anthropic Claude API key for vision analysis

At least one vision API key must be set for the vision and vision assertion modules. The agent client and basic assertions work without any API keys.

Provider fallback

When both API keys are configured, the vision module tries the preferred provider first. If it fails (network error, rate limit, invalid key), it automatically retries with the second provider. Set preferredProvider in createVision() to control the order. Default preference is Gemini.
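
The fallback order can be sketched as below. This is an illustration under assumptions, not the module's implementation: askWithFallback, AskFn, and the provider map are hypothetical, and a provider with no configured key is modeled as simply being absent from the map.

```typescript
type ProviderName = 'gemini' | 'claude';
type AskFn = (prompt: string) => Promise<string>;

// Tries the preferred provider first; on any error, tries the other one.
// Rethrows the last error if every configured provider fails.
async function askWithFallback(
  providers: Partial<Record<ProviderName, AskFn>>,
  preferred: ProviderName,
  prompt: string,
): Promise<{ answer: string; provider: ProviderName }> {
  const order: ProviderName[] =
    preferred === 'gemini' ? ['gemini', 'claude'] : ['claude', 'gemini'];
  let lastError: unknown = new Error('No vision provider configured');
  for (const name of order) {
    const ask = providers[name];
    if (!ask) continue; // no API key for this provider
    try {
      return { answer: await ask(prompt), provider: name };
    } catch (err) {
      lastError = err; // remember the failure and try the next provider
    }
  }
  throw lastError;
}
```

The provider field on vision results serves the purpose that the returned provider serves here: it shows which model actually answered, which helps when a fallback changes behavior between runs.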

See the SDK overview for other Codmir packages, or the guides for more examples.