askui.agent

VisionAgent Objects

class VisionAgent()

click

def click(instruction: Optional[str] = None,

          button: Literal['left', 'middle', 'right'] = 'left',

          repeat: int = 1,

          model_name: Optional[str] = None) -> None

Simulates a mouse click on the user interface element identified by the provided instruction.

Arguments:

  • instruction str | None - The identifier or description of the element to click.
  • button ‘left’ | ‘middle’ | ‘right’ - Specifies which mouse button to click. Defaults to ‘left’.
  • repeat int - The number of times to click. Must be greater than 0. Defaults to 1.
  • model_name str | None - The model name to be used for element detection. Optional.

Raises:

  • InvalidParameterError - If the ‘repeat’ parameter is less than 1.

Example:

with VisionAgent() as agent:

    agent.click()              # Left click on current position

    agent.click("Edit")        # Left click on text "Edit"

    agent.click("Edit", button="right")  # Right click on text "Edit"

    agent.click(repeat=2)      # Double left click on current position

    agent.click("Edit", button="middle", repeat=4)   # 4x middle click on text "Edit"

mouse_move

def mouse_move(instruction: str, model_name: Optional[str] = None) -> None

Moves the mouse cursor to the UI element identified by the provided instruction.

Arguments:

  • instruction str - The identifier or description of the element to move to.
  • model_name str | None - The model name to be used for element detection. Optional.

Example:

with VisionAgent() as agent:

    agent.mouse_move("Submit button")  # Moves cursor to submit button

    agent.mouse_move("Close")  # Moves cursor to close element

    agent.mouse_move("Profile picture", model_name="custom_model")  # Uses specific model

mouse_scroll

def mouse_scroll(x: int, y: int) -> None

Simulates scrolling the mouse wheel by the specified horizontal and vertical amounts.

Arguments:

  • x int - The horizontal scroll amount. Positive values typically scroll right, negative values scroll left.
  • y int - The vertical scroll amount. Positive values typically scroll down, negative values scroll up.

Notes:

The actual scroll direction depends on the operating system’s configuration. Some systems may have “natural scrolling” enabled, which reverses the traditional direction.

The meaning of scroll units varies across operating systems and applications. A scroll value of 10 might result in different distances depending on the application and system settings.

Example:

with VisionAgent() as agent:

    agent.mouse_scroll(0, 10)  # Usually scrolls down 10 units

    agent.mouse_scroll(0, -5)  # Usually scrolls up 5 units

    agent.mouse_scroll(3, 0)   # Usually scrolls right 3 units

type

def type(text: str) -> None

Types the specified text as if it were entered on a keyboard.

Arguments:

  • text str - The text to be typed.

Example:

with VisionAgent() as agent:

    agent.type("Hello, world!")  # Types "Hello, world!"

    agent.type("user@example.com")  # Types an email address

    agent.type("password123")  # Types a password

get

def get(instruction: str, model_name: Optional[str] = None) -> str

Retrieves text or information from the screen based on the provided instruction.

Arguments:

  • instruction str - The instruction describing what information to retrieve.
  • model_name str | None - The model name to be used for information extraction. Optional.

Returns:

  • str - The extracted text or information.

Example:

with VisionAgent() as agent:

    price = agent.get("What is the price displayed?")

    username = agent.get("What is the username shown in the profile?")

    error_message = agent.get("What does the error message say?")

wait

@validate_call

def wait(sec: Annotated[float, Field(gt=0)])

Pauses the execution of the program for the specified number of seconds.

Arguments:

  • sec float - The number of seconds to wait. Must be greater than 0.

Raises:

  • ValueError - If the provided sec is negative.

Example:

with VisionAgent() as agent:

    agent.wait(5)  # Pauses execution for 5 seconds

    agent.wait(0.5)  # Pauses execution for 500 milliseconds

key_up

def key_up(key: PC_AND_MODIFIER_KEY)

Simulates the release of a key.

Arguments:

  • key PC_AND_MODIFIER_KEY - The key to be released.

Example:

with VisionAgent() as agent:

    agent.key_up('a')  # Release the 'a' key

    agent.key_up('shift')  # Release the 'Shift' key

key_down

def key_down(key: PC_AND_MODIFIER_KEY)

Simulates the pressing of a key.

Arguments:

  • key PC_AND_MODIFIER_KEY - The key to be pressed.

Example:

with VisionAgent() as agent:

    agent.key_down('a')  # Press the 'a' key

    agent.key_down('shift')  # Press the 'Shift' key

act

def act(goal: str, model_name: Optional[str] = None) -> None

Instructs the agent to achieve a specified goal through autonomous actions.

The agent will analyze the screen, determine necessary steps, and perform actions to accomplish the goal. This may include clicking, typing, scrolling, and other interface interactions.

Arguments:

  • goal str - A description of what the agent should achieve.
  • model_name str | None - The specific model to use for vision analysis. If None, uses the default model.

Example:

with VisionAgent() as agent:

    agent.act("Open the settings menu")

    agent.act("Search for 'printer' in the search box")

    agent.act("Log in with username 'admin' and password '1234'")

keyboard

def keyboard(key: PC_AND_MODIFIER_KEY,

             modifier_keys: list[MODIFIER_KEY] | None = None) -> None

Simulates pressing a key or key combination on the keyboard.

Arguments:

  • key PC_AND_MODIFIER_KEY - The main key to press. This can be a letter, number, special character, or function key.
  • modifier_keys list[MODIFIER_KEY] | None - Optional list of modifier keys to press along with the main key. Common modifier keys include ‘ctrl’, ‘alt’, ‘shift’.

Example:

with VisionAgent() as agent:

    agent.keyboard('a')  # Press 'a' key

    agent.keyboard('enter')  # Press 'Enter' key

    agent.keyboard('v', ['control'])  # Press Ctrl+V (paste)

    agent.keyboard('s', ['control', 'shift'])  # Press Ctrl+Shift+S

cli

def cli(command: str) -> None

Executes a command on the command line interface.

This method allows running shell commands directly from the agent. The command is split on spaces and executed as a subprocess.

Arguments:

  • command str - The command to execute on the command line.

Example:

with VisionAgent() as agent:

    agent.cli("echo Hello World")  # Prints "Hello World"

    agent.cli("ls -la")  # Lists files in current directory with details

    agent.cli("python --version")  # Displays Python version