Trials to Criterion for Toddlers: Why 2-of-2 Beats 8-of-10 in Early ABA

Stop burning out 2-5 year olds with 8-of-10 mastery criteria. Here is how to set realistic toddler ABA goals that build skills without hoop-jumping, from a BCBA-led CEU.

Key takeaway

For a 2 to 5 year old, a 2-of-2 mastery target almost always beats an 8-of-10 one, because hitting 8-of-10 across five sessions means asking that small kid to run the same target 50 times.

Watch the full CEU recording

Child Development Deep Dive: Early Childhood (2-5 year olds)

Kelly Brzak · 1 CEU · 59 min

For a 2 to 5 year old, a 2-of-2 mastery target almost always beats an 8-of-10 one, because hitting 8-of-10 across five sessions means asking that small kid to run the same target 50 times. That is the framing Kelly Brzak, BCBA, builds on in her CEU on early childhood programming, and it is the framing Dr. Patrick McGreevy uses in the Essential for Living (EFL) curriculum: write the criterion the child can win against, not the criterion that looks tidy on a graph.

If you write goals for toddlers and preschoolers in ABA, this guide is the playbook for switching from inherited 80% mastery rules to criteria that match how 2 to 5 year olds actually learn.

Why 8-of-10 mastery criteria break down with 2-5 year olds

Most of us did not pick 8-of-10 on purpose. We inherited it. It looked clean. It read like 80%, which sounds like real mastery. So we put it on color tacts, receptive ID, imitation, requesting, almost every acquisition program for early learners.

The problem is that 8-of-10 is a criterion built for older, stronger learners with longer attention spans and more reinforcer durability. A two year old does not have that. A four year old who has already done a full preschool day does not have that. When the criterion is too high, the kid does not fail. They quit. Then we chart the quitting as "non-compliance."

Kelly is direct about what this trade-off looks like in the room.

"Anytime that I've been with little ones and I've tried to push past the two or three, that's when I start getting into task refusal and elopement and kids wanting me to leave." From the talk — Kelly Brzak

That is the real cost of a clean-looking goal sheet. You get less skill, not more.

The math problem: 8-of-10 across 5 sessions means 50 trials of the same target

Here is the math nobody runs before they write the goal. If Susan needs 8-of-10 correct tacts across 5 consecutive sessions, every session is a 10-trial block, so she has to tact the same target 10 × 5 = 50 times to call it mastered. Per target. With nothing but a small bin of markers or color cards in front of her.

Kelly puts a name to it:

"If I'm expecting Susan to tact eight out of ten colors across five consecutive sessions, I'm asking Susan to tact 50 colors." From the talk — Kelly Brzak

For a three year old that is hoop-jumping. It is not learning a color. It is learning the routine of being asked to label a color in a chair. That is also why color acquisition can feel like it stalls. It is not the skill. It is the criterion.
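The arithmetic is quick to check before a goal goes on paper. Here is a minimal Python sketch; `total_trials` is a hypothetical helper, not something from the talk, and the 2-of-2 figure assumes the simplest reading of the criterion (two trials per session):

```python
def total_trials(trials_per_session: int, sessions: int) -> int:
    """Minimum number of trials the child runs before the criterion can be met."""
    return trials_per_session * sessions


# 8-of-10 across 5 consecutive sessions: a 10-trial block every session.
old = total_trials(trials_per_session=10, sessions=5)

# 2-of-2 across 5 consecutive sessions: a 2-trial block every session (assumed reading).
new = total_trials(trials_per_session=2, sessions=5)

print(old, new)  # 50 10
```

Same five sessions, one fifth of the trial load.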

What hoop-jumping looks like in toddler sessions

You can spot hoop-jumping before you ever pull the data sheet. Watch for:

  • The kid stops orienting after trial three or four.
  • They start handing the materials back, saying "all done," walking off, or climbing under the table.
  • The technician starts physically holding the array steady or restating the SD louder.
  • The graph shows perfect early trials and a drop-off in trials 7 through 10, every session.
  • Tantrum and elopement data climbs while acquisition data flatlines.

This is the system telling you the criterion is the problem. Most of us were taught to respond by adding reinforcement or shortening the inter-trial interval. The cleaner fix is to lower the criterion to something the kid can win against, then build variability with a running record.
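If your trial-by-trial data is already in a spreadsheet, the late-trial drop-off is easy to quantify. A rough Python sketch follows; the function name, the True/False encoding, and the split after trial 6 are all assumptions for illustration, and no clinical cutoff is implied:

```python
def late_trial_dropoff(trials: list[bool], split: int = 6) -> float:
    """Return early-trial accuracy minus late-trial accuracy for one session.

    `trials` holds one True/False per trial, in the order the trials were run.
    `split` is how many trials count as "early" (6 here, so trials 7+ are "late").
    """
    early, late = trials[:split], trials[split:]
    if not early or not late:
        raise ValueError("need at least one early and one late trial")
    return sum(early) / len(early) - sum(late) / len(late)


# Hypothetical 10-trial session: correct through trial 6, then 1 of 4 correct.
session = [True] * 6 + [False, True, False, False]
dropoff = late_trial_dropoff(session)
print(f"{dropoff:.2f}")  # 0.75
```

A large positive gap that repeats across sessions is the pattern described above: the kid can do the skill and stops doing it once the block runs long.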

How to write 2-of-2 and 3-of-3 goals that still satisfy insurance

The instinct is, "insurance will not accept 2-of-2." In Kelly's experience, and in the EFL framework Dr. Patrick McGreevy teaches, you can write low-trial criteria across many targets and still pass funder review. The trick is in how you stack the criterion.

Here is the side-by-side. The old goal:

Susan will tact 8 of 10 colors across 5 consecutive sessions.

Here is the rewrite:

Susan will tact 2 of 2 colors per session, across 8 total colors, across 5 consecutive sessions.

Same skill. Same end state, knowing 8 colors. Way less burnout. The kid is asked to tact two colors at a time, not eight, and the array rotates so we still see range and durability across the full set.

You can do the same thing across receptive ID, imitation, requesting, and turn-taking. Kelly's framing in the CEU:

"If I set a minimum of three out of three items as a goal, as opposed to five out of five or four out of five or eight out of 10, I'm going to let my little people be more successful." From the talk — Kelly Brzak

The lift here is not lowering the standard. It is matching the trial structure to the developmental window.

Pairing trials-to-criterion with a running record for variability

The pushback you will get from clinical reviewers, including the ones inside your own agency, is: "If the criterion is only 2-of-2 or 3-of-3, how do you know it generalized?" Fair question. The answer is the running record.

A running record is a low-effort log that sits next to your program book. The tech captures, in plain notes:

  • Which exemplars came up that session.
  • Which the kid hit independently.
  • Which needed a prompt and what level.
  • Any natural-environment moments where the target showed up outside of programmed trials.

So Susan tacts "red" and "blue" in session, but she also points at her brother's red truck and says "red" on the rug. That goes in the running record. Now you have a 2-of-2 trial criterion plus a paper trail showing the target is generalizing across people, settings, and arrays. That is what funders actually care about, and it is what protects you from "yes she hits 8-of-10 in the chair but never uses it in real life."

The EFL workbook formalizes this. If you have not opened it, the Essential 8 booklet is around $13 and gives you a running-record template you can adapt to almost any early-learner caseload.
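If your team logs digitally, one running-record entry might look like the Python sketch below. This is not the EFL template; the class name, field names, and the sample date are hypothetical and simply mirror the four things the tech captures:

```python
from dataclasses import dataclass, field


@dataclass
class RunningRecordEntry:
    session_date: str                       # hypothetical format, e.g. "2024-01-15"
    exemplars_presented: list[str]          # which exemplars came up that session
    independent: list[str]                  # which the kid hit without a prompt
    prompted: dict[str, str] = field(default_factory=dict)        # exemplar -> prompt level
    natural_environment: list[str] = field(default_factory=list)  # unprogrammed use, plain notes

    def met_criterion(self) -> bool:
        """True when every exemplar presented was hit independently (e.g. 2-of-2)."""
        presented = set(self.exemplars_presented)
        return bool(presented) and presented <= set(self.independent)


entry = RunningRecordEntry(
    session_date="2024-01-15",
    exemplars_presented=["red", "blue"],
    independent=["red", "blue"],
    natural_environment=["Said 'red' about her brother's red truck on the rug"],
)
print(entry.met_criterion())  # True
```

The criterion check stays tiny on purpose. The generalization evidence lives in the notes, not in the trial count.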

When to scale up: signs the child is ready for higher criteria

You do not have to stay at 2-of-2 forever. Watch for these signals before scaling the criterion:

  • The kid is requesting more trials, not avoiding them.
  • They hit 2-of-2 across 8+ exemplars on the running record without prompts.
  • Generalization shows up across at least two caregivers and one non-therapy setting.
  • Pace is faster than the SD delivery, meaning they are anticipating and engaging.
  • They tolerate variation in materials, voices, and locations.

When most of those are green, step to 3-of-3 across a wider exemplar pool. From there, move to small percentage-based criteria. By the time the child is around 4 and a half to 5, with strong instructional control and reinforcer durability, 80% mastery rules start to fit. The mistake is starting there.
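The five readiness signals can be kept as a simple checklist. In the Python sketch below, the signal names are hypothetical labels for the bullets above, and reading "most" as more than half (3 of 5) is an assumption, not a rule from the talk:

```python
# Hypothetical labels for the five readiness signals listed above.
SIGNALS = (
    "requests_more_trials",
    "independent_across_8_plus_exemplars",
    "generalizes_two_caregivers_one_setting",
    "pace_faster_than_sd",
    "tolerates_variation",
)


def ready_to_scale(observed: dict[str, bool]) -> bool:
    """True when more than half of the signals are green (assumed meaning of "most")."""
    green = sum(observed.get(name, False) for name in SIGNALS)
    return green > len(SIGNALS) / 2


checklist = {
    "requests_more_trials": True,
    "independent_across_8_plus_exemplars": True,
    "pace_faster_than_sd": True,
}
print(ready_to_scale(checklist))  # True
```

Signals missing from the checklist count as not yet observed, which keeps the check conservative.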

Worked example: rewriting a color tact goal for a 3 year old

Take a real goal off a real BIP and walk it through.

Old goal: "Jordan will tact 8 of 10 primary and secondary colors across 5 consecutive sessions with two technicians."

Rewrite, EFL-flavored:

Jordan will tact 2 of 2 colors per session, across a rotating array of 8 target colors (red, blue, yellow, green, orange, purple, black, white), across 5 consecutive sessions, with at least two technicians and one caregiver. Running record will document independent tact responses in the natural environment across the same exemplar pool.

What changed:

  • Trials per session dropped from 10 to 2.
  • Total exposure across the 8-color set stays high because the array rotates.
  • Generalization is structurally baked in through technician and caregiver variety.
  • The running record carries the natural-environment evidence funders want.

That goal will pass clinical review, it will pass funder review at the agencies Kelly works with, and it gives Jordan a fighting chance to actually own the skill instead of memorizing the chair routine.
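The goal says the array rotates but does not say in what order. One simple option is a round-robin through the 8 colors, two per session. The Python sketch below assumes that order; `rotating_pairs` is a hypothetical helper:

```python
from itertools import cycle, islice

COLORS = ["red", "blue", "yellow", "green", "orange", "purple", "black", "white"]


def rotating_pairs(targets: list[str], per_session: int, sessions: int) -> list[list[str]]:
    """Walk through the target pool in order, taking `per_session` targets each session."""
    stream = cycle(targets)  # wraps back to the first target after the last one
    return [list(islice(stream, per_session)) for _ in range(sessions)]


schedule = rotating_pairs(COLORS, per_session=2, sessions=5)
for number, pair in enumerate(schedule, start=1):
    print(f"Session {number}: {pair}")
```

With 8 colors and 2 per session, the full set comes around every 4 sessions, so session 5 returns to red and blue.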

FAQ

Will insurance accept 2-of-2 or 3-of-3 mastery criteria?
In Kelly's experience, funders generally accept low-trial criteria when the goal language stacks them across multiple targets and sessions, and when a running record documents generalization. The rejection usually comes from a single isolated "2 of 2" line with no exemplar pool. Write the stack.

Does a lower criterion mean weaker skill acquisition?
No. It means trial count is matched to the developmental window. The kid still sees the target many times across the program, just not 50 times in a row on one target. Skill durability shows up in the running record.

How do I prove mastery if the criterion is only 3 trials?
Pair the criterion with a rotating exemplar array and a running record. You prove mastery across exemplars, settings, and people, not by repeating the same target 50 times.

What is a running record and how does it pair with low-trial criteria?
A running record is a short, ongoing log of exemplars, prompt levels, and natural-environment use. It pairs with low-trial criteria by carrying the generalization evidence the criterion line does not.

At what age can I start using 80% mastery criteria?
There is no fixed age, but most kids in the EFL framework are not ready for 80% mastery rules until around 4.5 to 5, with strong instructional control and reinforcer durability. Before that, low-trial criteria plus a running record are the cleaner choice.

Where to go next

If you write goals for early learners, this CEU is the one to watch end-to-end. Kelly walks through the full goal-writing logic, the EFL connection, caregiver pairing, and the assessment tools she rotates through.

The shift from 8-of-10 to 2-of-2 is small on paper and huge in the room. It is also the easiest single change you can make this week that will lower task refusal, raise session quality, and give your early learners a chance to actually win.