Selecting the right UI elements is crucial for creating reliable automation workflows. This guide provides strategies to improve accuracy and performance when interacting with UI elements.

Natural Language Element Selection

AskUI’s Vision Agent uses natural language prompts to identify and interact with UI elements on your screen. This approach offers several advantages:

  • Intuitive Descriptions: Use everyday language to describe elements like “login button” or “search field”
  • Contextual Understanding: The agent understands elements based on their visual appearance and surrounding context
  • Flexibility: No need to learn complex selector syntax or DOM structures

How Natural Language Selection Works

When you provide a natural language prompt like “login button”, the Vision Agent:

  1. Captures the current screen state
  2. Analyzes the visual elements present
  3. Identifies the element that best matches your description
  4. Performs the requested action on that element

Example: Calculator Automation

from askui import VisionAgent

with VisionAgent() as agent:
    # Open Chrome and navigate to calculator
    agent.tools.webbrowser.open_new("https://askui.github.io/askui-practice-page/")

    # Click numbers and operators
    agent.click("7")
    agent.click("+")
    agent.click("7")
    agent.click("=")

    # Extract the result
    result = agent.get("What is the calculation result?")
    print(f"The result is: {result}")

AI Elements

AI Elements allow you to capture and use specific visual elements from your screen for reliable automation.

Enable the AskUI Development Environment as described in the installation guide and activate experimental commands by running AskUI-ImportExperimentalCommands in your terminal.

AskUI-NewAIElement

This command captures elements from your screen for use in AskUI workflows.

It then can be used by setting the model config to askui-ai-element, like:

agent.click("ai-element-name", model_name="askui-ai-element")

Parameters:

  • Name (Optional): The name for the screenshot file. If defined, indicates only one element is being captured. For multiple elements, you’ll provide names individually.
  • WorkspaceId (Optional): Uses the Workspace ID from settings by default. Required if not in settings.
  • AlwaysPreview (Optional): Automatically opens preview without prompting. Cannot be used with NoPreview.
  • NoPreview (Optional): Skips the preview. Cannot be used with AlwaysPreview.
  • OneShot (Optional): Ends snipping after first successful AI Element creation.
  • Annotate (Optional): Takes a fullscreen capture for region annotation.

Example Output:

By implementing these techniques, you’ll build more reliable, flexible, and accurate automation workflows that can handle complex UI interactions with confidence.