AI in software testing: between promise and reality
The rise of artificial intelligence (AI) has raised high expectations in the world of software testing. Self-learning test suites, automatically generated test cases and predictive quality models sounded promising. But it is time to take stock: although AI is making progress in many areas, its actual impact on the testing profession is proving smaller than expected.
The initial promise of AI within software testing sounded enticing: machines that generate their own tests based on requirements or user behavior, algorithms that optimize regression tests based on risk, AI that makes maintenance of test automation largely unnecessary. The expectation: less repetitive work, faster feedback and better risk management. Unfortunately, that fundamental shift in software testing has failed to materialize. Where test automation often struggles with maintenance and stability, AI rarely offers a structural solution.
Most AI applications in software testing focus heavily on narrow domains:
- Visual regression via image comparison (as with Applitools or Percy): useful, but an evolution rather than a revolution.
- Test prioritization and flaky test detection within CI/CD: valuable in the chain, but little impact on test strategy.
- Automatic updates of locators in UI test automation ("self-healing"): seems convenient, but can mask regressions.
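The flaky test detection mentioned above can be illustrated with a simple heuristic over CI history: a test that both passes and fails on the same code revision is likely flaky rather than broken. A minimal sketch (all function and variable names are hypothetical, not from any specific tool):

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Identify tests that both passed and failed on the same commit.

    `runs` is a list of (commit_sha, test_name, passed) tuples,
    e.g. collected from CI job results.
    """
    outcomes = defaultdict(set)  # (commit, test) -> set of observed outcomes
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    # A test with both a pass and a fail on the same commit is flaky.
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})

runs = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),   # same commit, different outcome
    ("abc123", "test_checkout", True),
    ("def456", "test_checkout", True),
]
print(find_flaky_tests(runs))  # ['test_login']
```

Real CI tooling adds retries, time windows and statistical thresholds on top of this idea, but the core signal is the same: identical code, divergent outcomes.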
Why does AI's impact on software testing fall short?
A major cause is that testing is more than verification: it is context-driven quality thinking. Testing sits at the intersection of technology, business logic and risk assessment, and AI lacks the domain understanding needed to operate there. Hence the gap between promise and practice.
Currently, AI mainly provides operational support in specific sub-areas, such as analyzing large volumes of logging and monitoring data, and through smart dashboards that visualize trends in defects or performance. Natural language processing by LLMs helps with requirements analysis, test case creation, structuring and preparing test reports, and generating test automation. In this context, however, AI acts as an assistant, not a replacement for the tester: it increases efficiency, but should not take over responsibility. Humans must always remain involved to interpret and adjust the results.
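The kind of operational support described here, analyzing large volumes of monitoring data, often starts with something as simple as outlier detection on error counts. A minimal sketch using a z-score heuristic (names hypothetical; production tooling would use moving windows, seasonality correction and more robust statistics):

```python
from statistics import mean, stdev

def flag_anomalous_buckets(error_counts, threshold=3.0):
    """Flag time buckets whose error count is a statistical outlier.

    `error_counts` maps a time bucket (e.g. a minute) to its error count.
    A bucket is flagged when its z-score exceeds `threshold`.
    """
    values = list(error_counts.values())
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # perfectly flat series: nothing stands out
    return [bucket for bucket, count in error_counts.items()
            if (count - mu) / sigma > threshold]

# Eleven quiet minutes, then a spike of 100 errors at 10:11.
counts = {f"10:{m:02d}": c for m, c in
          enumerate([2, 3, 2, 4, 3, 2, 3, 4, 2, 3, 2, 100])}
print(flag_anomalous_buckets(counts))  # the spike is flagged
```

The value of such a filter is not the statistics but the triage: it reduces thousands of log lines to a handful of moments a human tester should actually look at, which is exactly the assistant role described above.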
The real impact of AI is not in automating the testing profession, but in testing AI itself. As systems more frequently incorporate AI, from LLMs to predictive models, the need for new forms of testing grows. Traditionally, software testing assumes predictable logic, while AI models learn patterns that are often neither transparent nor repeatable. This calls for new forms of testing that require collaboration with disciplines such as data science, compliance and ethics.
Some examples of new test forms that can be used in AI testing are:
- Fairness testing: validating whether a model treats different groups fairly in legal and social terms.
- Bias detection: explicit technical analysis of unintended biases in model behavior.
- Explainability assessments: checking whether how an AI model works can be explained to an end user or regulator. Consider why the model made a particular decision, which input factors determined it, and to what extent the explanation is reliable and repeatable.
- Robustness testing: seeing how a model responds to incorrect, incomplete or misleading input.
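To make the first of these concrete: one common fairness check is demographic parity, comparing the rate of positive outcomes across groups. A minimal sketch (function and data names are illustrative; demographic parity is only one fairness notion among several, and which one applies is itself a context-driven decision):

```python
def demographic_parity_gap(predictions, groups):
    """Gap in positive-outcome rates between the best- and worst-treated group.

    `predictions` is a list of 0/1 model outcomes; `groups` is a parallel
    list of group labels. A large gap suggests unequal treatment.
    """
    rates = {}
    for g in set(groups):
        outcomes = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(outcomes) / len(outcomes)
    return max(rates.values()) - min(rates.values())

# Group A is approved 3 out of 4 times, group B only 1 out of 4.
preds  = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_gap(preds, groups))  # 0.5
```

A tester would run such a check on representative evaluation data and then, together with data science and compliance colleagues, judge whether the observed gap is acceptable, which is precisely the interdisciplinary collaboration the section argues for.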
AI has not made the testing profession obsolete, but it does force reflection and innovation. The hype is certainly there; effective autonomous test agents are not. Instead of waiting for a breakthrough, it is time to approach AI pragmatically: as a tool, not a panacea. Testers remain indispensable for the interpretation, validation and ethical assessment of software quality, especially as software gets ever smarter and developers increasingly use AI. Use supporting tools, but verify their outcomes and stay in control. Testers who learn to test AI systems now will soon be indispensable in both innovation and compliance.
