UI automation scripts are often hard to maintain because they depend on a maze of #ids, data-test attributes, and .selectors. Refactoring the UI can turn them into a nightmare, even though a refactor is precisely the situation in which UI automation should be most useful.
Introducing Midscene.js, an innovative SDK designed to bring joy back to automation scripts by letting you write their commands in plain language.
Midscene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.
There are three main capabilities: action (.ai, .aiAction), query (.aiQuery), and assert (.aiAssert).
.ai (or .aiAction): execute a series of actions by describing the steps.
.aiQuery: extract customized data from the UI. Describe the JSON format you want, and the AI will return an answer based on its "understanding" of the page.
.aiAssert: perform assertions on the page.
All these methods accept a natural-language prompt as their parameter. Because scripts describe intent instead of hard-coding selectors, the cost of maintaining them can drop significantly.
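The three capabilities can be sketched in TypeScript as follows. This is a minimal sketch, not Midscene's own sample: the `MidsceneAgentLike` interface, the `SearchResult` shape, the `isSearchResultList` guard, the `searchHeadphones` function, and the prompt strings are all hypothetical, written so the flow can run against a stand-in agent. Only the method names `aiAction`, `aiQuery`, and `aiAssert` come from the capabilities listed above, and the sketch assumes each of them returns a Promise.

```typescript
// Hypothetical structural interface covering the three Midscene calls used
// below (an assumption, not Midscene's own type). It lets the flow run
// against any object with these methods, including a test stand-in.
interface MidsceneAgentLike {
  aiAction(prompt: string): Promise<unknown>;
  aiQuery(demand: string): Promise<unknown>;
  aiAssert(assertion: string): Promise<unknown>;
}

// Hypothetical shape we ask the model to return from aiQuery.
interface SearchResult {
  itemTitle: string;
  price: number;
}

// Type guard: check the untyped aiQuery result before trusting it.
function isSearchResultList(value: unknown): value is SearchResult[] {
  if (!Array.isArray(value)) return false;
  return value.every((v: unknown) => {
    if (typeof v !== "object" || v === null) return false;
    const rec = v as Record<string, unknown>;
    return typeof rec["itemTitle"] === "string" && typeof rec["price"] === "number";
  });
}

// Hypothetical test flow: act, query, then assert, all in natural language.
async function searchHeadphones(agent: MidsceneAgentLike): Promise<SearchResult[]> {
  // Action: describe the steps instead of locating elements by selector.
  await agent.aiAction('type "Headphones" in the search box, then press Enter');

  // Query: describe the JSON shape you want back.
  const items = await agent.aiQuery(
    "{itemTitle: string, price: number}[], list each search result and its price",
  );
  if (!isSearchResultList(items)) {
    throw new Error("aiQuery returned data in an unexpected shape");
  }

  // Assert: state what should be true on the page.
  await agent.aiAssert("There is a category filter on the left");
  return items;
}
```

In a real script you would pass a Midscene agent instead of a stand-in. For Puppeteer, Midscene documents a `PuppeteerAgent` class (imported from `@midscene/web/puppeteer`) that wraps a page, but check the current docs for the exact import path and the model configuration it requires. The type guard is there because the content of an `aiQuery` result depends on the model's reading of the page, so validating its shape keeps a misread page from silently passing bad data further into the test.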
To try the core features of Midscene, we recommend starting with the Chrome Extension. It lets you call Action / Query / Assert in natural language on any webpage, without setting up a code project.
There are also several ways to integrate Midscene into your own code project.
Midscene generates a visual report after each run. In the report you can watch an animated replay and inspect the details of each step. The report file also includes a playground where you can adjust your prompts without re-running all your scripts.


Midscene.js is an open-source project (GitHub: Midscene) released under the MIT license, and you can run it in your own environment. All data gathered from pages is sent directly to OpenAI or to the custom model provider set in your configuration, so only you and that provider have access to it. No third-party platform accesses the data.
By default, Midscene currently uses OpenAI's GPT-4o model, but you can switch to a different multimodal model if needed.