If you run a real operation, the hardest part of evaluating an AI partner is not figuring out which vendor has the best demo. The hardest part is figuring out which vendor will still be useful eighteen months from now, when the demo is forgotten and the system is supposed to be quietly doing your work. This guide is what I would tell a friend who asked me how to make that decision.
I am going to write this as if you are sitting across from me at a coffee shop. I am not going to pretend the questions are easy or that the answers are clean. Most of the operators I talk to have already been pitched by four or five AI vendors before we ever speak. They are tired. They are skeptical. They have watched a colleague spend real money on something that did not pan out. They are not looking for another deck. They are looking for a way to make a defensible decision and move on with their lives.
Here is how I would do it.
Start with the job, not the tool
The first mistake most buyers make is starting with the technology. They walk into the conversation already thinking about AI assistants, voice agents, or automation platforms. The right starting point is the job. What is the specific piece of work you want this AI to handle? Not the goal. Not the metric. The actual unit of work.
For a dental practice owner, that might be inbound new-patient calls during business hours. For a real estate team, it might be lead intake from listing pages and the first seven days of follow-up. For a commercial real estate analyst, it might be the first pass on a retail underwriting that today takes a junior analyst most of a day. The job has to be concrete enough that you and the vendor can both point at it and agree on what counts as done.
If a vendor cannot or will not name the unit of work clearly, that is a red flag. It usually means the vendor is selling a platform, not solving a problem, and you are going to spend the engagement figuring out which subset of their platform applies to you. That is your work, not theirs.
Ask who has done your exact job before
The most useful question I have ever heard a buyer ask is: "Have you done this exact piece of work, for someone in my industry, in production?" Not a demo. Not a pilot. Not a similar use case in a tangential industry. The exact job.
If the answer is yes, the next question is whether you can talk to the customer who has it running. Real reference calls are the single highest-signal piece of due diligence available to you, and they are cheap. A real reference call will tell you how the vendor behaved when something broke, how the integration actually feels in daily use, and how the vendor sized the work compared to how it actually played out.
If the answer is no, you are the first customer for this specific job. That is not necessarily disqualifying. Sometimes you have to be the first customer. But you should know it, price it into your expectations, and structure the engagement so the vendor is investing as much as you are in making it work.
Watch what the vendor talks about
You can learn a lot about a vendor's depth by what they spontaneously talk about. The vendors I trust the most spend most of the conversation on edge cases. The ones I trust the least spend most of the conversation on capabilities.
A vendor who has actually shipped will, within the first half hour of a real conversation, start telling you about failure modes. The thing that goes wrong when a caller has a thick accent. The thing that happens when the integration partner has an outage. The way the system handles an angry customer. The escalation paths that they have learned the hard way to design in from day one.
A vendor who is selling vaporware will not have these stories, because they have not seen these situations. They will instead talk about everything the system can do, in the abstract, with confident hand-waving. The presence of edge-case stories is one of the cleanest indicators of whether the vendor has done the work.
Insist on integration specifics
Almost every operational AI use case lives or dies on integration. The AI gives an answer; integration makes the answer useful. If the agent can book an appointment but cannot actually write it into the scheduling system you use today, the agent is interesting but not deployable.
Ask the vendor to be specific about how the system will read from and write to the tools you already run. Not whether it can integrate. The how. Will it use a native API. Will it screen-scrape a portal. Will it run a sync on a delay or in real time. What happens if the integration partner changes their endpoint. Who owns the maintenance of that integration over time.
The vendors who have done this work will answer the question precisely. The vendors who have not will give you a marketing answer about being integration-friendly, and you will discover six weeks into the engagement that the integration is not, in fact, written, and someone has to write it now.
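One way to keep yourself honest about the integration questions above is to write the vendor's answers down in a fixed shape and see which ones are still blank. The sketch below is only an illustration: the `IntegrationAnswers` record, its field names, and the `unanswered` helper are hypothetical names made up for this example, not part of any vendor's product.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class IntegrationAnswers:
    """One vendor's answers about a single tool the system must read from or write to.

    Hypothetical structure for illustration; the fields mirror the questions in the text.
    """
    system_name: str
    access_method: Optional[str] = None         # e.g. "native API" or "screen-scrape"
    sync_mode: Optional[str] = None             # e.g. "real time" or "delayed sync"
    endpoint_change_plan: Optional[str] = None  # what happens if the partner changes their endpoint
    maintenance_owner: Optional[str] = None     # who maintains the integration over time


def unanswered(answers: IntegrationAnswers) -> List[str]:
    """Return the names of fields that are missing or contain only whitespace."""
    missing = []
    for f in fields(answers):
        value = getattr(answers, f.name)
        if value is None or not value.strip():
            missing.append(f.name)
    return missing


example = IntegrationAnswers(
    system_name="scheduling system",
    access_method="native API",
    sync_mode="real time",
)
print(unanswered(example))  # ['endpoint_change_plan', 'maintenance_owner']
```

A marketing answer like "we are integration-friendly" fills in none of these fields, which is the point: anything still listed after the call is a question to send back before you sign.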
Velzyx documents what operators get on its How It Works page. Buyers benefit from knowing the depth of work involved before the conversation starts.
Look at the team page
The org chart of an AI vendor tells you what kind of business you are about to hire. If the team page is heavy on senior engineers with serious technical backgrounds, you are talking to a team that ships software. If the team page is heavy on growth marketers, account executives, and customer success managers, you are talking to a team that ships services and configuration on top of someone else's product. Neither is wrong. Just know which one matches your job.
If you cannot find the team page at all, or it lists only first names and roles without any verifiable history, treat that as a red flag worth investigating. Real engineering teams are usually willing to show their work. Vendors hiding their team almost always have a reason.
Push hard on operations after launch
Most buyers spend ninety percent of the diligence on the build and ten percent on operations. The split should be the opposite. The build is a one-time event. Operations are forever.
Ask the vendor what happens in week one after the system goes live. Who is watching the logs. Who calls you if something breaks. What the response time looks like. Whether you have a direct line to a human or whether you are submitting tickets into a queue. What the vendor's process is for finding problems before you do.
Then ask the same questions about month six. Most AI systems work fine in week one. The interesting question is whether they will still work fine after the first integration partner has changed their API, the first caller has thrown the model a curveball it has never seen, and the first edge case has surfaced that the original design did not anticipate. The vendor's answer to "what does month six look like" tells you whether they have run a real system before, or whether they think operations are something you do when there is time.
Be suspicious of platform pitches
If a vendor pitches you a platform that does five things, ask which of those five things they have actually deployed in your industry. Usually the answer is one or two. The rest are roadmap items dressed up as features. The risk in buying the platform pitch is that you are paying for breadth you do not need, and the depth you do need is the part the vendor has not yet built.
The cleanest version of this red flag is the vendor who pitches the same platform to a dental practice, a law firm, and a logistics company. The pitch is similar because the platform is the same. The depth in each industry is, by definition, shallow. If your operation needs depth, the platform vendor is not the right shape. You need someone who has gone deep in your industry specifically.
You can read more about why we built three vertical products at Velzyx instead of one horizontal one on the Aria page.
Watch the contract
The contract is where the vendor's real intentions show up. Pay attention to a few things.
Ownership of the system. Who owns the configuration, the instruction layer, the integrations, the data flows after the engagement ends. If the answer is the vendor, you are renting, not buying, and the vendor's leverage will compound over time.
Termination. What happens if you want to leave. Can you take the system with you in a usable form. Can you take your data. Are you going to be locked into a long renewal because the off-ramp is painful.
Liability. What happens if the AI does something costly or embarrassing. Most vendors disclaim everything, which is reasonable, but watch for clauses that put unusual risk back on you for the vendor's mistakes.
Change management. What happens when you want to change the system after launch. Who pays for changes. Are minor adjustments included in operations, or is everything a new statement of work.
You do not have to win every clause. You just have to understand them. A vendor who refuses to discuss any of these is telling you something about how they expect the relationship to go.
The reference call template
If you do exactly one piece of additional diligence beyond what the vendor offers, do this. Ask the vendor for two references. Call both. Do not send them an email. Get them on the phone for twenty minutes. Ask the following five questions.
How did the engagement actually compare to what was sold?
What broke that you did not expect?
How responsive is the vendor when something is wrong?
Would you hire them again for a different job?
What would you have done differently knowing what you know now?
The answers to these questions are worth more than any vendor's case study. The signal in a real conversation with a real operator who has been through the engagement is the most honest information you will get during diligence. If a vendor cannot or will not provide references, that is itself a signal. Real engagements produce real customers who will pick up the phone.
Trust your gut on the founder
For most operational AI work, especially with smaller vendors, you are not just hiring a company. You are hiring the person across the table from you. The founder of a studio is often involved enough in delivery that their judgment, taste, and integrity matter to your outcome. Some of this is unmeasurable. You will know it when you see it, and you will know its absence faster.
Pay attention to whether they answer hard questions directly or pivot away from them. Pay attention to whether they will tell you the system cannot do something, even when it would be easier to imply that it can. Pay attention to whether the conversation feels like sales or like engineering. The engineering conversations feel different. They are slower, more specific, and more interested in your problem than in the vendor's product.
If the founder is not willing to have one of those conversations before you sign, the vendor's culture is probably set up around closing deals rather than shipping systems. That is a fine business. It is just not the kind of partner you want for a critical workflow.
One worked example
For the dental practice owners reading this, here is what a real reference looks like in our case. We deployed Aria at WizKids Dental in Costa Mesa. The team there will pick up the phone and tell you, in their own words, what the engagement was like, what worked the first week, what we adjusted in week three, and how the system behaves now. You can read the published version of that engagement in the WizKids Dental case study, and we will introduce you to the team there directly if you ask.
I mention this not as a pitch. I mention it because the only way to evaluate an operational AI vendor is to talk to the people whose operations they have actually touched. Anyone unwilling to provide that conversation is asking you to take their word, and you should not.
A final test
If you take only one thing from this essay, take this. Before you sign anything, ask the vendor to describe, in plain language, where their system fails and what happens when it does. The vendors who can do this clearly have built something real. The ones who flinch have not. The test is fast, free, and has saved every operator I know at least one bad contract.
The AI market in 2026 is full of capable people doing real work and also full of capable people selling air. The structural differences between the good vendors and the bad ones are visible if you know where to look. The list above is the one I would use if I were on your side of the table. I hope it helps.
You can read more about how we approach engagements on the engagement page if you want to compare it to what other vendors are offering.
If you are evaluating us
Run the test above on Velzyx the same way you would on any vendor. Ask the hard questions. Talk to our references. Then decide whether we fit.
Talk to Varinder