One of the most common questions and discussion subjects on our e-mail in the past 60 days or so has been the problem
of ratios - schedules of reinforcement, variable ratios (VR), variable schedules of reinforcement (VSR) - vs. continuous reinforcement
(CRF). This subject has also been a recurring problem, going back as long as we have been receiving and sending e-mail, and
farther back than that - even back to the "good old" days when we were training animals and educating our own trainers at
Animal Behavior Enterprises. So let's see what we can make clear: first, what we are all talking about, and second, what
our own (the Baileys') philosophy, practices, and advice about schedules of reinforcement are. By the way, it seems to me that
most of the correspondents are using VSR and VR in an identical manner, both meaning VARIABLE RATIO. This little article is
a distillation of some recent e-mail discussions.
The correspondence we are referring to has to do with SCHEDULES OF
REINFORCEMENT. Simply put, a schedule of reinforcement is ANY plan or system for presenting a reinforcer for a given response,
according to ANY time interval, such as reinforcing a response every two minutes (creating an INTERVAL schedule), or ANY position
of a response in a series - reinforcing every second response, that is, a "two-fer," expressed as FR 2:1 (FIXED RATIO of two
responses for each reinforcer); every third response, a "three-fer," or a FR 3:1 ratio, every tenth (FR 10:1), hundredth (FR
100:1) response, and so on. If you VARY the number of responses required from one reinforcer to the next, then you have created
a VARIABLE RATIO, the schedule most commonly used in training almost any behavior, and one of the most useful.
The simplest schedule, and one that trainers should all begin with
in training ANY response, is a ratio of reinforcing EVERY desired response. This is a ratio of 1 response for 1 reinforcer,
1:1, or CONTINUOUS REINFORCEMENT, abbreviated CRF (to prevent confusion with CR, which is an abbreviation for CONDITIONED
RESPONSE, or CONDITIONED REFLEX).
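For readers who find code clearer than prose, the ratio schedules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name RatioSchedule, its parameters, and the choice to draw each variable-ratio requirement uniformly between two bounds are all hypothetical, and none of it describes equipment or procedures from this article.

```python
import random


class RatioSchedule:
    """Decides, response by response, whether a reinforcer is delivered.

    Hypothetical helper for illustration only.
    low == high      -> fixed ratio (FR low:1)
    low == high == 1 -> continuous reinforcement (CRF, 1:1)
    low < high       -> variable ratio (requirement redrawn after each reinforcer)
    """

    def __init__(self, low, high=None, rng=None):
        self.low = low
        self.high = low if high is None else high
        self.rng = rng if rng is not None else random.Random()
        self.count = 0                    # correct responses since last reinforcer
        self.required = self._draw()      # responses needed for the next reinforcer

    def _draw(self):
        # Assumption: requirement is uniform over [low, high], inclusive.
        return self.rng.randint(self.low, self.high)

    def respond(self):
        """Record one correct response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


crf = RatioSchedule(1)                          # every response pays
fr3 = RatioSchedule(3)                          # a "three-fer", FR 3:1
vr = RatioSchedule(1, 5, rng=random.Random(0))  # VR, averaging about 3:1

crf_results = [crf.respond() for _ in range(6)]
fr3_results = [fr3.respond() for _ in range(6)]
vr_results = [vr.respond() for _ in range(30)]

print(crf_results)
print(fr3_results)
print(sum(vr_results), "reinforcers in 30 responses on the VR")
```

The only difference among the three schedules in this sketch is how the required count is chosen: always 1 for CRF, always the same number for FR, and a fresh random number after each reinforcer for VR.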
Any schedule other than CRF calls for what we call DIFFERENTIAL REINFORCEMENT, reinforcing
some responses and not others. DIFFERENTIAL REINFORCEMENT also is used in forming DISCRIMINATIONS, as in scent
discriminations, where the trainer reinforces the response to the scented article and not to the others; it is also a part
of SHAPING, where a trainer reinforces the responses that meet his or her criteria - that is, the response is straight enough,
fast enough, properly executed in every way - and extinguishes the responses that do not meet the criteria.
Finally,
there are other schedules, those that specifically involve time. They are used less frequently, but are useful in their
place. One is what we call a FIXED or VARIABLE DURATION schedule, where the trainer asks for a response to hold or continue
for a certain period of time - for example, asking a dog to hold a point, or a prone position ("stay") for 30 seconds.
There are also FIXED OR VARIABLE INTERVAL schedules. We will not say much about time schedules. They can be tough to implement.
The introduction of time as a variable can give the animal an opportunity to do things OTHER than what you want it
to do, yet still respond according to specifications in time. Suppose you have asked the dog, on an FI 5 min. schedule,
to jump up to a spot on the wall every 5 minutes. This FI 5 min. schedule means that you reinforce the first correct response
after the interval is up. Now, just think of all of the mayhem the dog can create in the five minutes! After the five minutes
is up, the dog must still jump up correctly to get its reinforcement, but it might have made many other responses, all of
which will gain SOME strength from the last reinforcement for the jump. Well, enough of interval schedules. They have little
place in most training programs.
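The fixed interval rule just described (reinforce the first correct response after the interval is up) can also be written as a short Python sketch. The class name FixedIntervalSchedule and the event list are hypothetical, and the sketch assumes the interval is timed from the previous reinforcer; that timing detail is an assumption, not something stated above.

```python
class FixedIntervalSchedule:
    """Hypothetical FI schedule: reinforce the first correct response
    made once interval_s seconds have passed since the last reinforcer."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.last_reinforced_at = 0.0   # session start counts as time zero

    def respond(self, t, correct=True):
        """t is seconds since the session started.
        Return True if this response is reinforced."""
        if correct and t - self.last_reinforced_at >= self.interval_s:
            self.last_reinforced_at = t
            return True
        return False


fi = FixedIntervalSchedule(300)  # FI 5 min.

# (time in seconds, was the jump correct?) -- made-up events for illustration
events = [(60, True), (200, True), (299, True), (305, False),
          (310, True), (400, True), (615, True)]
fi_results = [fi.respond(t, ok) for t, ok in events]
print(fi_results)
```

Nothing the animal does before the interval is up changes when the reinforcer becomes available, which is why so much other behavior can fill the gap.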
The list of schedules can go on and on. For instance, you can also use a schedule
of DIFFERENTIAL REINFORCEMENT OF FAST behavior, abbreviated DRF, where the trainer reinforces only the responses that are
rapidly executed. You would be right here in thinking these last schedules we described are the same as SHAPING SCHEDULES.
Scientists have invested entire careers playing with schedules, and their effects on learning and behavior.
Let us
begin by clearly stating our own philosophy about continuous vs. ratio schedules. It can be put quite simply:
IF YOU DO NOT NEED A RATIO, DO NOT USE A RATIO. Or, in other words, stick to continuous reinforcement unless there is a good
reason to go to a ratio. We think most of you will accept that we have been involved in shaping a LOT of behavior. We got, and
used, much of that behavior without resorting to ratios. We kept many animals working for a decade or more ON CONTINUOUS REINFORCEMENT.
We benefited from the time not lost establishing a ratio when it was not necessary. Thus we recommend that you consider giving
it a try.
Well, what are the relative advantages of continuous reinforcement (CRF), ratio (FR or VR), or interval (FI
or VI) schedules? Why and when would we use them?
What are the advantages of CRF? When and why should we reinforce
every response of a certain type, say, a proper SIT? First of all, the only way you can be sure that each response will be
"proper," that is, that it will meet your criteria, is to reinforce EACH AND EVERY RESPONSE that is proper, correct according
to your own criteria. If each correct response is NOT reinforced, and you start with a ratio, even a "two-fer," you are apt
to allow less than perfect responses to acquire strength from that final reinforcer after the second response. Let's say you
decide to try for two-fers. You tell the dog SIT - the first response is a bit sloppy, the second one is OK. You click and
treat. What have you reinforced? -- a sloppy response, chained to a good response. The sloppy one automatically acquires some
strength from the final reinforcer.
Hence, our rule No. 1 is IF YOU DON'T NEED A RATIO, DON'T USE A RATIO. If you decide
you NEED a ratio, then our rule No. 2 is keep a response on CRF until it is just what you want, on cue, with good fast reaction
(low latency), and you have given MANY (not just five or ten) reinforcers for the perfect sit - dozens and dozens of times. DO
NOT BE IN A HURRY TO GO TO A RATIO. You should also "proof" the behavior in many different circumstances, different locations,
different audiences, under many distracting conditions, ALL on CRF. Then you can say your SIT is as good as you want it to
be, the dog knows when to do it and how fast to do it. The behavior is now strong and reliable.
In what
follows about "two-fers" and other ratios, we do NOT wish to appear to be downgrading the advice of experienced trainers,
and we certainly do not want our comments to be taken personally. There are, as we will note below, reasons that this practice
has come so easily into the advice and the handbooks for training. We simply wish to give the readers here the benefit of
our own experience, which runs as follows: In most situations where dogs are being trained as pets, there would almost never
be a strong need for ratios. However, as far as we can tell from advice given to newbies, ratios have become de rigueur in
training. As nearly as we can tell, the "ritual" of the two-fer is widespread. It has gone inevitably into the practices and
the literature of many good trainers, because it was believed to be a necessary step for building up any resistance to extinction
and rapid performance. For example, in a recent Clicker Journal, a very respected trainer recommends getting the behavior,
and then, before moving to new locations and other fluency building exercises - starting with TWO-FERS! In videotapes, in
recent manuals, almost everywhere, TWO-FERS! The early use of ratios verges on dogma. There may be an occasional need
to give such unquestioned advice to clicker NEOPHYTES who might be prone not to reinforce behavior enough times to get it
strong enough. However, as experienced trainers, LOOK CAREFULLY AT WHAT YOU ARE DOING, and weigh the disadvantages of losing
precision, and the loss of time. As always, the choice is yours. Just be sure that you know you have a choice.
Now,
when SHOULD we use a ratio schedule? Remember our RULE NUMBER ONE - RATIOS ONLY WHEN NECESSARY. Once you have decided that
you need a ratio, then the answer is a) after the behavior is as PERFECT as you want it to be, or as perfect as you are able
to get it, within reason; and b) if you want to establish with that behavior a high resistance to extinction (for example,
if you expect to be using it in some context where you cannot reinforce); c) if you want the behavior to occur at a rapid
rate (responses per time interval) without reinforcement; or possibly, d) if you are working the dog with food reinforcers
and you do not want the dog to fill up too rapidly.
There is no question that a variable ratio is the best one to
use if you need, or want, a VERY persistent behavior without reinforcement. Just look at the number of times a really "hooked"
fisherman will cast out a bait without being reinforced. And, as one of our e-mail correspondents has noted, "according to
Skinner, compulsive gambling occurs partly because people become hooked by the variable ratio. The very next response may
pay off regardless of how long it has been since the last response paid off, so the gambler keeps responding." Quite true.
But how many times in your life with your dog do you run into the conditions of a), b), c) or d)? You probably
want a reasonable resistance to extinction, and certainly a reasonable rate of response. And, indeed, "this schedule (a variable
ratio) provides greater incentive to resume responding right after receiving a reinforcer than does the fixed-ratio schedule."
Probably
one of the best examples of when to use a VSR is the case of Ham, the chimpanzee astronaut trained by Joe Brady's group for
NASA. Ham was sent into space in the early 1960s, before the Mercury astronauts. Ham was taught to make discriminations and
complex responses to certain stimuli, such as flashing lights and special sounds. There was concern that the food dispensing
equipment might not work too well in weightless space. For that reason, and for other good reasons, it was decided to build
up Ham's responses such that he could work the entire mission ON EXTINCTION. Ham's responses were built up to THOUSANDS of
responses per reinforcement. One time Ham might be reinforced after a hundred responses, the next time it might be a thousand.
Now that, my friends, is a RATIO! If you are preparing to blast your doggie into space, and you want to make sure that it
keeps on working, VSR is definitely the way to go.
We used a VDS (Variable Duration Schedule) with our automated dancing
chicken unit. When a person dropped a quarter (a nickel in the early '50's) in a coin box, a door opened and released the
chicken into the performing area. The chicken walked over to a simulated juke box, pulled a loop, which started music playing,
and the chicken stepped onto a platform. In the center of the platform was a photocell. When the chicken broke the light
stream hitting the photocell, that started a timing mechanism (we used a dipper circuit that charged a capacitor, for the
electronically literate). Now, because the chicken was what it was, the chicken had to do something other than stand still,
so most chickens scratched, which looked like a dance. While it scratched about, it moved into and out of the light beam in
a rather unpredictable fashion. This varied the amount of time before the equipment said "enough" and fired the electric feeder.
In addition, just in case, we also placed a device in the circuitry (a variable tap on the capacitor for you electronic types)
that more or less randomly changed the criteria for firing the feeder. So, we had two methods of determining the VDS: one
method depended on the behavior of the chicken and one was independent of the chicken. The upshot of this system was a chicken
that danced from 8 to 22 seconds. As you can see, when we say VARIABLE, we mean just that.
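The two sources of variation in the dancing chicken unit can be imitated in a small Python simulation. This is a loose sketch and not a model of the actual circuit. It assumes that "charge" builds up only while the bird is breaking the beam, and that the firing threshold is redrawn at random for each performance. The function name dance_duration and every number in it (threshold range, probability of being in the beam, step size) are made up for illustration and are not tuned to reproduce the 8 to 22 seconds reported above.

```python
import random


def dance_duration(rng, threshold_range=(6.0, 10.0), p_in_beam=0.6, step_s=0.5):
    """Return the seconds of 'dancing' before the simulated feeder fires.

    All parameter values are hypothetical.
    """
    # Source of variation 1: the firing criterion itself changes
    # (a stand-in for the variable tap on the capacitor).
    threshold = rng.uniform(*threshold_range)

    charge = 0.0    # seconds of beam-breaking accumulated so far
    elapsed = 0.0   # total time on the platform
    while charge < threshold:
        elapsed += step_s
        # Source of variation 2: the bird wanders in and out of the beam.
        if rng.random() < p_in_beam:
            charge += step_s
    return elapsed


rng = random.Random(1)
durations = [dance_duration(rng) for _ in range(20)]
print(min(durations), max(durations))
```

Even with a fixed threshold the durations would vary, because the bird's own movement decides how fast the charge accumulates; the random threshold adds a second, independent spread on top of that.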
Our piano playing duck
(and the variant, the PICKIN PEKIN guitar playing duck) was based on VSR. As the duck played the keys up and down, there
were microswitches being triggered by "hot" keys. In the old days we used stepping switches and later we used solid state
decade counters to keep track of how many keys had been struck. By various means we then more or less randomly selected a
number of keys that must be struck to fire the feeder (we usually used a ring counter, or a variant thereof). The duck ended
up striking somewhere between 13 and 25 "hot" keys. What the patron heard was TWINKLE, TWINKLE, LITTLE STAR, because we also
programmed the output into a recognizable tune. Some people actually thought the duck was playing a tune. No wonder they can
sell so much ocean beach front property in Arizona.
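The counting logic of the duck's variable ratio can be sketched in Python as well. Only the 13 to 25 range of "hot" keys comes from the description above; the function name keys_until_feed, the uniform choice of the target, and the probability that any given strike lands on a hot key are hypothetical assumptions. The tune the patron hears is not modelled.

```python
import random


def keys_until_feed(rng, low=13, high=25, p_hot=0.5):
    """Simulate one performance.

    Returns (hot_keys_struck, total_keys_struck) at the moment the
    feeder fires. p_hot is a made-up chance that a strike hits a hot key.
    """
    target = rng.randint(low, high)   # requirement redrawn for each performance
    hot = 0
    total = 0
    while hot < target:
        total += 1
        if rng.random() < p_hot:      # this strike tripped a microswitch
            hot += 1
    return hot, total


rng = random.Random(7)
runs = [keys_until_feed(rng) for _ in range(10)]
for hot, total in runs:
    print(f"feeder fired after {hot} hot keys ({total} strikes in all)")
```

Because the target is redrawn each time, the duck has no way to tell which hot key will be the one that pays, which is the defining feature of a variable ratio.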
How important are schedules of reinforcement? Most of the time,
especially in pet training, not terribly. That does not mean that they are insignificant to animal training. We had a coin
operated unit (probably our most famous) called BIRD BRAIN. BIRD BRAIN played tic tac toe. The person had the opportunity
to test his or her skill against the chicken (the chicken does get a little help). When we designed the control circuitry
for BIRD BRAIN, we allowed for reinforcement at the end of the game, meaning that the chicken would usually play three, four
or five times before the feeder fired. We knew that there would ordinarily never be a chance for the chicken's first move
to be reinforced. We also knew from experience that a certain percentage of the birds (we guessed about 25 percent or one
out of four birds) would have problems starting the game because THE FIRST PECK, OR MOVE, NEVER PAID OFF. In anticipation
of this problem, we incorporated what we called a FEED FIRST CYCLE switch that reinforced the birds after the first, or starting
peck. Well, we were almost right: it was one out of three birds or 33 percent. Those afflicted birds would simply pace back
and forth in front of the cage, approach the switch panel and lights, and then back away and pace some more. A bird might repeat
this behavior several times before finally giving the proper response. When we turned on the FEED FIRST CYCLE switch, that delay
behavior (delaying reinforcement, of course) would suddenly disappear after a few pecks had been reinforced at the beginning
of the performance. Sounds strange, doesn't it? When Skinner played our little game the first time (at a scientific
conference in the late 70's) he was intrigued by the game, and very much impressed that the technology had come so far that
we could PREDICT from the reinforcement schedule how certain birds would respond. We told him it was because we had to make
a buck at it that we knew it so well. He enjoyed the joke, but he understood that it was only partially in jest.
I
have not discussed any of our free environment stuff: seagulls, dolphins, dogs, cats, ravens, vultures, etc. Most of that
work combines desensitization (the really tough part) with some rather exotic VDS and VSR schedules. Some of the seagulls
and dolphins were on excursions lasting hours. That meant that some trips might last for only a half hour and others might
go on for much longer. Some of the dolphin excursions lasted the entire day, meaning only ONE trial. As shown in PATIENT LIKE
THE CHIPMUNKS, the animals did things once they got to the target area, but, in my opinion, the getting there was always the
hard part. The animals rather quickly mastered most of the terminal maneuvers, even the tough ones. By the way, as difficult
as the terminal behaviors were, they were almost always on a CRF (continuous reinforcement) schedule, even though that reinforcement
might be many minutes or even hours away. I hope I have made our position on schedules of reinforcement clear. We use the
simplest schedule that works.
There are those who say a CRF schedule cannot yield ANY strength. Well, our answer
would be that WE found it good enough for some excellent behavior over the years. Besides, in our contacts with both experienced
and neophyte dog trainers, we found most were in such a hurry that they seldom used enough reinforcements on a CRF schedule
to both sharpen and then strengthen behavior. Some say they NEVER reinforce the same behavior more than a few times, and that's
it. Here is a direct quote from a forum post: "I have never asked for a behavior with no changes 20 times in a row, is there a
point to doing that?" (Others in this journal have talked of fluency, so we won't go into that here.)
Then, with a
partially trained behavior, they go to a ratio of whatever (and, for the sake of this discussion, it is irrelevant if it is
VR, FR, or ?). There is usually some mention of boredom, or of the dog quitting the behavior, or ? First, in our collective
experience (and this is essentially 100 years) neither of us has experienced in our training programs a bored dog, dolphin,
gull, raven, elephant, aardvark, pangolin, lion, bear, squid, fish, or ???????? Next, we have had dogs performing the exact
same behavior over 800 times in one day, and repeating that for more than a week. We did similar tests with dozens
of other kinds of animals. NONE OF THESE ANIMALS WERE ON A RATIO! Well, they were on a continuous ratio, if you want to split
hairs. We did NOT find the behavior in these animals to be frail. The behaviors did not evaporate when the animal was asked
to perform several trials with no reinforcement. Were the behaviors as persistent as they would have been under a VR schedule
training program? No, of course not. But if the animal would perform the behavior very well 10 times without reinforcement,
would that not be sufficient for most tasks? How often do you need an animal to perform a behavior 100 or 1,000 times without
reinforcement of any kind, food, social contact, or the opportunity to perform another behavior, or whatever?
How did
the myth of frail CRF behavior find its way into the fabric of dog training? There are many possibilities. Perhaps in the
last 10 to 15 years, prominent teachers of clicker training found that many trainers were working with such weak behavior
that it fell apart when there was the least amount of stress, or if trainers failed to maintain some reinforcement.
The teachers may have, quite logically, solved that problem by concentrating on STRENGTH OF BEHAVIOR EARLY. They accepted
some diminution in the power to shape that is a consequence of training on a ratio. WE ARE NOT QUARRELING WITH SUCH A COMPROMISE.
The teachers deserve the credit for introducing the man/woman on the street to the technology. We are just pointing out that
many dog trainers are accepting on blind faith that TWO-FERS ARE THE AUTOMATIC AND ONLY WAY TO GO. THIS IS NOT TRUE. We do
not want to have this myth woven into the fabric of most dog trainers' understanding of operant technology.
Look at
it this way: perhaps life is complicated enough without our making it more complicated. CRF is simpler than VSR. CRF
works. We like simple.
Bob & Marian Bailey
Hot Springs, Arkansas