How to Actually Measure Supervision Fidelity in an ABA Clinic

Most BCBAs give feedback but never track it. Here is how to measure supervision fidelity without burning out, from a BCBA-led CEU.

Watch the full CEU recording

Supervision Articles Deep Dive

Matt Harrington · 197 min

Watch on openceu.com →

Supervision fidelity is the measurement layer most ABA clinics skip. The Colin and colleagues 2024 procedural integrity survey put hard numbers on it. Ninety percent of BCBAs (board certified behavior analysts, the clinicians who design and supervise behavior plans) said they give feedback to their RBTs (registered behavior technicians, the staff running the daily sessions) on a regular basis. Only 20% of those BCBAs track that feedback with any kind of graph or running count. That is the 90/20 gap. Feedback happens. Measurement does not. So nobody really knows if the feedback worked.

Fidelity, in this context, just means the staff member ran the plan the way it was written, every time. This page walks through how to close the 90/20 gap with a workflow that fits on your phone. Apple Notes, one note per client, two minutes a session. That is the version that finally stuck for me after every fancy system failed.

What supervision fidelity actually means (and what it is not)

Supervision fidelity is one question with a yes or no answer. Did the RBT run the program the way the plan says to run it? That is the whole thing. If the plan says deliver the SD (the spoken instruction, like "touch red"), wait three seconds, then prompt, fidelity is whether that order happened. Not how nice the session looked. Not how much the kid liked it. Just the steps, in order, with the right timing.

Two terms get used for this. Procedural integrity. Treatment integrity. In the research, they mean the same thing most of the time. Use whichever one your clinic uses. The point is the same.

Fidelity is not the same as feedback. Feedback is what you say after you watch. Fidelity is what you measure while you watch. You can give great feedback with zero fidelity data. You can also collect great fidelity data and never give feedback. Both happen all the time. The job is to do both, and to write down what you saw so you can tell if next week is better than this week.

The 90/20 problem: why most BCBAs give feedback but never measure it

The Colin and colleagues 2024 paper surveyed practicing BCBAs about how they run supervision in the field. The numbers were honest and a little painful. Here is the one that matters most:

"94% observe staff regularly, weekly or monthly. Feedback was interesting. 90% provided feedback consistently. Almost all of it was in-the-moment vocal feedback. But only 20% used graphs to track their feedback."
From the talk, Matt Harrington

Read that one more time. Almost everyone is showing up. Almost everyone is talking. Only one in five is tracking it on a graph. The other four are mostly running on memory. Memory is not a measurement system. You forget what you said last week. You forget if the error pattern is getting better or worse. You walk into the next session and start the loop over.

The 90/20 gap is not a motivation problem. BCBAs care. They observe. They speak up. The gap is a system problem. Nobody handed them a tracking workflow that takes less than two minutes. So the tracking falls off and the feedback turns into a Groundhog Day loop where you say the same correction six weeks in a row without ever knowing it.

Why insurance and the BACB do not require fidelity data (and why that matters)

Here is a part the field does not talk about much. There is no rule that says you have to collect fidelity data on your RBTs. The BACB (Behavior Analyst Certification Board) does not require it. Your funding source almost never requires it. The research world treats it as standard practice. Most clinical work treats it as optional.

"Procedural integrity in research has become extremely common. There's not really a requirement for clinicians to have any type of procedural integrity data. I've never seen it required from an insurance point of view."
From the talk, Matt Harrington

That gap matters because it tells you why the 90/20 split exists. If nobody outside your clinic is asking for the number, the number does not get collected. Not because the BCBA is lazy. Because the day already has more in it than the day will hold. The thing nobody asks for is the thing that falls off first.

There is also an upstream piece. A 2023 BST training study found that fewer than half of practicing BCBAs were taught how to give feedback using BST (behavioral skills training, the four-step model of instruct, model, rehearse, and feedback) in the first place. So even when a BCBA wants to track fidelity, there is a good chance the feedback layer was self-taught. The 90/20 gap sits on top of a roughly 50/50 training gap. Knowing that helps you stop blaming yourself for a system you were never trained on.

The takeaway is not "wait for the BACB to require it." The takeaway is "if I want to know what is actually working in my clinic, I have to build the measurement layer myself." Nobody is coming.

A two-minute fidelity check you can run today

The biggest reason fidelity tracking dies is the tool. People try to build a 30-item checklist in a Google Sheet, fill it out twice, and quit. The check that lasts is short and lives where you already are.

Here is one that fits on a sticky note. Pick three to five steps of one program the RBT is running this session. Write them in order. Sit and watch ten trials. For each trial, mark yes or no for each step. That is it. The math is yes count divided by total steps watched. If you watched ten trials with five steps, you watched 50 steps. If 42 were yes, fidelity is 84%.
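If you would rather let a script do the division, here is a minimal Python sketch of the same math. The function name `fidelity_percent` and the list-of-lists layout are assumptions made for illustration, not part of any clinic tool. Each trial is a list of yes/no marks, one per step.

```python
def fidelity_percent(trials):
    """Return the percent of steps marked yes across all observed trials.

    `trials` is a list of trials. Each trial is a list of booleans,
    one per step, where True means the step was run as written.
    """
    total_steps = sum(len(trial) for trial in trials)
    if total_steps == 0:
        raise ValueError("no steps were observed")
    yes_count = sum(sum(trial) for trial in trials)
    return 100 * yes_count / total_steps


# Ten trials of a five-step program: 50 steps watched, 42 marked yes.
observed = [[True] * 5 for _ in range(6)]  # 6 clean trials, 30 yes
observed += [[True, True, True, False, False] for _ in range(4)]  # 12 yes

print(fidelity_percent(observed))  # 84.0
```

The sticky note version is still the one to use during the session. The script only helps if you are already typing the marks somewhere.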

You do not need every program. You need the program that is moving slowest, or the program where the data look weird. Pick that one. Watch ten trials. Write the number. Move on. Two minutes if you are quick. Five if you are talking through it with the RBT after.

The trick is not to grade the whole session. The trick is to grade ten trials of one program. The whole session is too big. Ten trials of one program is small enough to actually do every visit.

Tracking fidelity in Apple Notes, a spreadsheet, or whatever works for you

The system you use does not matter as much as whether you actually use it. I have tried five fancy ones. The one that stuck was the dumbest.

"The only time I've been able to really get good integrity data over and over again, consistent basis is when I stop trying to make some other system work for me. For me, that was Apple Notes. Every client had their own little note. Pop open that note. I take my quick data and then I move on."
From the talk, Matt Harrington

One note per client. Open it on your phone. Type the date, the program you watched, the yes count over total. Done. The note stays in order without you doing anything. You can search it later by client name. It syncs to your laptop. No login. No new tab. No nag screen.

If you are on Android, the same thing works in Google Keep. If your clinic blocks personal apps, a one-tab Google Sheet with three columns (date, program, fidelity percent) works the same way. The rule is the same. One place per RBT or per client. Three fields. Always the same three fields.

What kills tracking is variety. The day you start adding fields ("notes," "next step," "mood"), the system gets heavy and you stop using it. Resist. Three fields. The narrative goes in your session note. The number goes in the tracker. That separation is what keeps the tracker alive.
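Keeping the same three fields every time also means the note can be read by a script later if you ever want that. The sketch below assumes each line is typed as `date, program, yes/total` with an ISO date. That line format, the sample dates, the scores, and the name `parse_entry` are all hypothetical choices for this example.

```python
from datetime import date


def parse_entry(line):
    """Parse 'YYYY-MM-DD, program, yes/total' into (date, program, percent).

    Assumes the program name contains no commas.
    """
    day_text, program, score = [part.strip() for part in line.split(",")]
    yes_text, total_text = score.split("/")
    yes, total = int(yes_text), int(total_text)
    if total <= 0 or not 0 <= yes <= total:
        raise ValueError(f"bad score: {score!r}")
    return date.fromisoformat(day_text), program, 100 * yes / total


# Made-up note contents for one client.
note = """\
2024-05-06, touch red, 38/50
2024-05-13, touch red, 42/50
"""

entries = [parse_entry(line) for line in note.splitlines() if line.strip()]
for day, program, percent in entries:
    print(day, program, percent)
```

A fourth free-text field would break this parser right away, which is one more reason to leave the narrative in the session note.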

How to use the data without drowning in it

Two minutes of data a week per RBT will pile up fast. The mistake is staring at every number. You do not need to. You need three things from the tracker.

First, the trend per RBT. Open the note. Skim down. Is the percent going up over four weeks, holding, or going down? That is the only question for that scan. Up is good. Holding at 90% or higher is fine. Holding below 80% means the feedback is not landing and you need to change tactics. BST again. Role play. Smaller chunks.
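That four-week scan can be written down as a small rule, sketched here in Python. The 90% and 80% cutoffs come from the paragraph above, and the "keep coaching" label for the middle band borrows from the FAQ further down. The `tolerance` of five percentage points for deciding what counts as "holding" is an assumption, since the talk does not define one, and `scan_trend` is a made-up name.

```python
def scan_trend(percents, tolerance=5.0):
    """Classify the most recent four weekly fidelity percents for one RBT.

    `tolerance` (in percentage points) is an assumed cutoff: a change
    smaller than this between the first and last score is "holding".
    """
    recent = percents[-4:]
    if len(recent) < 2:
        return "not enough data"
    change = recent[-1] - recent[0]
    if change > tolerance:
        return "going up"
    if change < -tolerance:
        return "going down"
    latest = recent[-1]
    if latest >= 90:
        return "holding, fine"
    if latest < 80:
        return "holding low, change tactics"
    return "holding, keep coaching"


print(scan_trend([75, 85, 92]))      # going up
print(scan_trend([92, 94, 91, 93]))  # holding, fine
print(scan_trend([74, 76, 72, 75]))  # holding low, change tactics
```

Comparing only the first and last score is crude on purpose. It matches what your eye does when you skim down the note.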

Second, the trend per program. If three RBTs are all below 80% on the same program, the program is the problem, not the staff. The plan is unclear. The SD is awkward. The prompt hierarchy is fuzzy. Fix the plan before you give more feedback to the people.
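The program-level check is a grouping problem: count how many different RBTs are under 80% on each program. A minimal sketch follows, assuming you have each RBT's latest score per program as `(rbt, program, percent)` rows. The function name, the RBT labels, and the scores are invented for illustration, and "three RBTs" is read here as "at least three".

```python
from collections import defaultdict


def programs_to_fix(latest_scores, cutoff=80.0, min_rbts=3):
    """Return programs where at least `min_rbts` RBTs score below `cutoff`.

    `latest_scores` is an iterable of (rbt, program, percent) rows.
    """
    low = defaultdict(set)
    for rbt, program, percent in latest_scores:
        if percent < cutoff:
            low[program].add(rbt)
    return sorted(p for p, rbts in low.items() if len(rbts) >= min_rbts)


# Made-up latest scores for three RBTs.
scores = [
    ("RBT A", "touch red", 72.0),
    ("RBT B", "touch red", 78.0),
    ("RBT C", "touch red", 64.0),
    ("RBT A", "hand washing", 92.0),
    ("RBT B", "hand washing", 76.0),
]

print(programs_to_fix(scores))  # ['touch red']
```

Here "hand washing" is not flagged because only one RBT is low on it. That one is a coaching conversation, not a plan rewrite.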

Third, the cases that are flat. A child whose graph has not moved in eight weeks is almost always a fidelity case in disguise. Pull the fidelity note for the RBTs on that case. If the numbers are low, the program is being run wrong and the data look flat because the data are honest. Fix fidelity first. Most flat graphs unstick within four weeks of the staff actually running the plan the way it was written.

If you do nothing else, do this. Once a month, open the notes for your three flattest cases and the three programs with the highest staff turnover. Scan the numbers. Pick one thing to change for the next month. One. Not ten. One.

That is the loop. Watch ten trials. Write the number. Once a month, scan the notes for trends. Pick one change. Repeat. It is not glamorous. It is the part of supervision that actually works.

Frequently asked questions

Is procedural fidelity the same as treatment integrity in ABA supervision?

For day-to-day clinic use, yes. Some research papers split hairs. Procedural integrity usually means "did the steps happen in order." Treatment integrity sometimes adds "did the dose and the schedule also match the plan." In a clinic, you can use either word. Pick the one your team already uses and stick with it. The thing that matters is that you have a number, not which word you put on it.

How often should a BCBA collect supervision fidelity data per RBT?

A minimum of once a week per RBT you supervise, on at least one program per session. That is roughly two minutes of grading per visit. If you supervise five RBTs, that is ten minutes a week of measurement total. If the RBT is brand new or the case is not moving, bump it to every visit on the program that is stuck. The point is not volume. The point is a steady drip you can actually keep up with.

What is a passing fidelity score for an RBT running a behavior plan?

There is no BACB-set number, but the working rule most clinics use goes like this: 90% or higher is good, 80% to 89% is okay but coach it, and below 80% means the RBT needs hands-on BST on that program before the next session. The number matters less than the trend. An RBT going from 75% to 85% to 92% over three weeks is a win. An RBT holding at 95% but with one specific step always wrong is also a coaching moment. Read the pattern, not just the percent.
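To read the pattern and not just the percent, it helps to break the same yes/no marks out by step. The Python sketch below pairs the working-rule bands with a per-step breakdown. Both function names and the sample visit are hypothetical, and scores between 89% and 90% are assumed to fall in the middle band.

```python
def fidelity_band(percent):
    """Map a fidelity percent to the working-rule bands (not a BACB standard)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return "good"
    if percent >= 80:
        return "okay, coach it"
    return "hands-on BST before the next session"


def step_percents(trials):
    """Return the percent of trials in which each step was run correctly."""
    if not trials:
        raise ValueError("no trials were observed")
    n_steps = len(trials[0])
    if any(len(trial) != n_steps for trial in trials):
        raise ValueError("every trial must score the same steps")
    return [
        100 * sum(trial[i] for trial in trials) / len(trials)
        for i in range(n_steps)
    ]


# Made-up visit: ten trials, five steps, step 3 missed in four trials.
trials = [[True, True, False, True, True]] * 4 + [[True] * 5] * 6
per_step = step_percents(trials)
overall = sum(per_step) / len(per_step)

print(overall, fidelity_band(overall))  # 92.0 good
print(per_step)  # [100.0, 100.0, 60.0, 100.0, 100.0]
```

The overall score lands in the "good" band, but the third step is only right 60% of the time. That is the step to coach.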

Watch the full talk

If the 90/20 gap matches what you are seeing in your clinic, the full CEU walks through the rest of the Colin and colleagues 2024 findings, the BST training gap, and the supervision habits that actually move the number. It is free and counts for one supervision credit toward your BACB recertification.

Watch the full CEU on supervision fidelity
