
By Juras Juršėnas

Agentic AI is the newest AI development making waves among developers. The technology could reshape how organisations operate, scale, and optimise. However, the effect computer-using agents will have on the current SaaS market is still an open question. Do businesses have what they need to build systems that deliver on what customers are asking of them?

Artificial Intelligence (AI) has produced many strains of technology that are already changing how we do things at home and work. The changes expected in the near future are even more revolutionary. Agentic AI, technology that allows AI tools to autonomously determine the best paths to perform tasks with minimal human oversight, is the flagship of this next wave of changes.

One of the most dynamic ways agentic AI can be implemented is by creating and launching computer-using agents (CUAs). The expectations for these tools go as far as completely transforming how we interact with computers. Naturally, AI companies are racing to position themselves as the best CUA developers while software-as-a-service (SaaS) providers pay close attention.

What are computer-using agents?

Computer-using agents (CUAs) are AI-driven systems that interact with software as a user would. These agents can navigate the user interface by pressing buttons, inputting information, and analyzing responses. Thus, they can autonomously execute operations on behalf of humans through the complex interplay between AI vision, machine learning, and data processing. As such, CUAs are a subcategory of Agentic AI systems, specializing in using existing software interfaces the way a human would.

After recent releases by the big players, such as OpenAI’s Operator and Microsoft’s Azure AI Foundry, CUAs are attracting growing attention. They are expected to eventually disrupt the SaaS market. How warranted is this expectation?

CUAs – a one-size-fits-all AI system for optimization?

The words accompanying Operator’s launch make the case for CUA technology on the whole:

“CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen—just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.”

It is also important to note that CUAs can make decisions based on rules or learned behaviours and adapt to changing environments. Their range of potential applications is therefore broad, as the examples below show.

Network Security. CUAs could automate various security-related IT processes, such as updating and securely storing passwords and implementing other updates across the network. Additionally, they can manage user access and automatically handle many network security incidents while constantly logging all incident data.

Back Office Tasks. Various back office tasks involve interacting with software. AI agents can simplify engagement with CRMs and various other systems and databases based on CRUD (create, read, update, delete) operations. By automating these operations, CUAs can streamline the achievement of users’ goals while saving time and costs.

Financial Services. Use cases here range widely, from transaction processing to customer verification and fraud detection. CUAs could also help streamline compliance checks and reduce operational risks while helping ensure that regulatory requirements are consistently met.

Web Scraping. CUAs can be integrated into web scraping platforms to fill in scraping parameters, navigate websites, and find the necessary information to extract. Delegating these tasks to AI agents opens web scraping to professionals beyond specifically trained developers. Those developers, in turn, have more time to work on complex tasks and innovative solutions.

Furthermore, CUAs are built on constantly improving large language models (LLMs), allowing them to determine the best next move in a particular context. Thus, we can expect them to evolve rapidly when tested in real-life applications.

Building the next disruptor

The disruptive potential of CUAs is clear. Microsoft CEO Satya Nadella sparked a debate by suggesting that AI agents will make SaaS as we know it obsolete by taking over CRUD tasks and moving seamlessly between databases and apps.

Other experts are more restrained. They point out that CUAs are more likely to transform SaaS than to replace it completely, and that the scale of this transformation is still up in the air.

How far this transformation goes depends on the quality of the CUAs that developers produce next. We can trace how CUAs function to better understand what developers must do to unleash the technology’s disruptive power. This also provides insight into how companies will try to advance against competitors.

In a nutshell, CUAs work by analyzing screen pixels to understand what’s being displayed and using virtual mouse and keyboard controls to execute actions. More generally, the agents must be built to excel at three iterative processes.

Information intake: CUAs take screenshots of the digital interface in order to understand the environment they’re operating in. They use computer vision to examine GUI screenshots and recognize the interface elements they need to complete tasks. Additionally, CUAs can extract and interpret text from screenshots. Web browser-using agents don’t even have to take screenshots, as they can just analyze the HTML to understand how to navigate the website.
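To make the HTML route concrete, here is a minimal sketch of how a browser-based agent might list the elements it could act on. It uses only Python's standard `html.parser` module. The set of tags, the `find_interactive_elements` helper, and the sample page are assumptions made for illustration, not a description of how any particular CUA works.

```python
from html.parser import HTMLParser

# Tags an agent could plausibly act on (an illustrative choice, not a standard).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementParser(HTMLParser):
    """Collects the interactive elements found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Record only the tags an agent could click or type into.
        if tag in INTERACTIVE_TAGS:
            attributes = dict(attrs)
            self.elements.append({
                "tag": tag,
                "id": attributes.get("id"),
                "name": attributes.get("name"),
                "type": attributes.get("type"),
            })


def find_interactive_elements(html):
    """Hypothetical helper: return a list of actionable elements in the page."""
    parser = InteractiveElementParser()
    parser.feed(html)
    return parser.elements


# A made-up page used only for this example.
SAMPLE_PAGE = """
<form id="login">
  <input type="text" name="username">
  <input type="password" name="password">
  <button id="submit" type="submit">Sign in</button>
</form>
<a id="help" href="/help">Need help?</a>
"""

elements = find_interactive_elements(SAMPLE_PAGE)
for element in elements:
    print(element)
```

A real agent would likely work from the rendered page rather than raw markup, since scripts can change what is on screen, but the idea of turning a page into a list of possible targets is the same.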

To interpret complex user requests, CUAs need strong natural language processing (NLP) capabilities. The best CUAs should also be able to interpret multimodal requests that include text, imagery, audio, and other input types. Training models to have these abilities requires a lot of multimodal data. Thus, the race to build the next big disruptor is largely about the companies’ capability to efficiently extract and use such data for training.

Reasoning: Another crucial competition area for AI companies is training the best reasoning capabilities. Once the visual information has been processed, CUAs use step-by-step reasoning to determine the best course of action. They monitor progress through multiple stages and adjust their approach whenever the interface changes. Once again, the quality of training data will determine the robustness and utility of the upcoming tools.

Action: Ultimately, CUAs perform tasks using virtual mouse and keyboard inputs. They can conduct various actions, including entering text, selecting buttons, and navigating content. The most useful and flexible agents will seamlessly integrate with various APIs and external systems to perform complex tasks. Developing such agents will require innovative solutions to numerous challenges of integrating AI with legacy systems.
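The three processes above can be sketched as a single loop. The example below is a toy illustration under stated assumptions: `FakeLoginScreen` stands in for a real GUI, and `choose_action` is a rule-based placeholder for the model-driven reasoning a real CUA would use. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLoginScreen:
    """Hypothetical stand-in for a real GUI with two fields and a submit button."""
    fields: dict = field(default_factory=lambda: {"username": "", "password": ""})
    submitted: bool = False

    def observe(self):
        # Information intake: a real agent would read a screenshot or HTML here.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def act(self, action):
        # Action: a real agent would send virtual mouse and keyboard input.
        if action["kind"] == "type":
            self.fields[action["target"]] = action["text"]
        elif action["kind"] == "click" and action["target"] == "submit":
            self.submitted = True


def choose_action(observation, goal):
    """Reasoning: a rule-based placeholder for a model's step-by-step planning."""
    if observation["submitted"]:
        return None  # goal reached, nothing left to do
    for name, value in goal.items():
        if observation["fields"].get(name) != value:
            return {"kind": "type", "target": name, "text": value}
    return {"kind": "click", "target": "submit"}


def run_agent(screen, goal, max_steps=10):
    """Repeat observe -> reason -> act until done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        observation = screen.observe()
        action = choose_action(observation, goal)
        if action is None:
            break
        screen.act(action)
        history.append(action)
    return history


screen = FakeLoginScreen()
steps = run_agent(screen, {"username": "demo", "password": "example"})
for step in steps:
    print(step)
```

Because the agent re-observes the screen before every step, it reacts to the interface as it currently is rather than following a fixed script. The step budget is a simple safeguard against an agent that never reaches its goal.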

Thus, the rollout of these AI agents will once again showcase the human ingenuity and talent that various companies have at their disposal.

The takeaway

Computer-using agents are just one of the many AI innovations that are changing the face of modern business. They could reshape how businesses operate, scale, and optimise. The effect these agents will have on the current SaaS market is still an open question. The answer depends on how well companies can extract and leverage multimodal data and other resources to build the next generation of agents. This race will also play a part in determining the power dynamics between the top players in AI development.

About the Author

With over 16 years of experience in the IT field, Juras Juršėnas has established himself as an expert in SaaS product management and large-scale IT business operations. His ability to apply strategic problem-solving, critical thinking, and people management skills led him to become the COO at Oxylabs, a global web intelligence collection platform.