Learning theories: reinforcement, learned helplessness and punishment

Learning theories


  • 1 Classical conditioning
  • 2 Motivate to continue: reinforcement
  • 3 Learned helplessness
  • 4 Punishment
  • 5 Learning by imitation
  • 6 Superstition

Classical conditioning

In the first part of this article we reviewed classical conditioning and our gift for adaptability, and we introduced operant conditioning (we repeat rewarded behaviors more often and punished ones less often).

Mode of action

The process of instrumental or operant conditioning requires the following sequence:

  1. Stimulus, which from now on we will call the Discriminative Stimulus (DS), because it signals which response, among others, will be reinforced.
  2. Response, which we will call the Operant Response (OR).
  3. Reinforcement, which is a Reinforcing Stimulus (RS): it becomes associated with the operant response and strengthens it.

We repeat what we like

The law of effect, formulated by Thorndike, states a very simple principle in two parts. The first holds that responses that produce satisfying consequences are strengthened and are therefore emitted with increasing frequency. The second holds that organisms learn responses that prevent or avoid unpleasant stimuli.

Parents use rewards to shape good table manners and praise behavior that resembles that of adults, right? This is a clear example of operant conditioning.
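The first part of the law of effect can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the response names, the starting strengths, the `step` size and the update rule itself (a simple move toward a ceiling of 1.0) are not taken from Thorndike's work. The sketch only shows the idea that a response followed by a satisfying consequence becomes more likely.

```python
def apply_law_of_effect(strengths, response, satisfying, step=0.1):
    """Return new response strengths after one trial.

    Assumed rule (illustrative only): a response followed by a satisfying
    consequence moves a fraction `step` of the way toward a ceiling of 1.0.
    Other responses, and unrewarded trials, leave the strengths unchanged.
    """
    updated = dict(strengths)
    if satisfying:
        updated[response] += step * (1.0 - updated[response])
    return updated


def response_probabilities(strengths):
    """Turn strengths into the relative probability of emitting each response."""
    total = sum(strengths.values())
    return {name: value / total for name, value in strengths.items()}


# Hypothetical example: two responses that start out equally likely.
strengths = {"press_lever": 0.2, "scratch_door": 0.2}

# Only pressing the lever is followed by a satisfying consequence.
for _ in range(10):
    strengths = apply_law_of_effect(strengths, "press_lever", satisfying=True)

print(response_probabilities(strengths))
```

After ten rewarded trials the rewarded response dominates, while the unrewarded one keeps its original strength and so becomes relatively less frequent.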

Motivate to continue: reinforcement

What is a reinforcer? It is anything that increases the frequency of a response. There are two types of reinforcers: positive and negative. A reinforcer is positive when it strengthens a response by being presented after it and is regarded by the subject as a reward (food, approval, money, expressions of affection ...). It is negative when it strengthens a response by being removed after it, as when something unpleasant stops.

Food and relief from pain are primary reinforcers, so called because they are innate. Money, success, compliments, grades, a pleasant tone of voice, etc., are secondary reinforcers: they are learned, generally through association with primary ones.

The immediacy of a reinforcer influences many behaviors. Many smokers and drug users know this well, and the use of pain relievers rests on the same principle. In general, though, we learn to respond to more delayed reinforcers: the salary at the end of the month, victory at the end of a fight, quarterly grades ... Although an immediate reinforcer is usually very effective, to function well we must learn to postpone immediate rewards in favor of long-term ones, which are usually more substantial. It has been shown that children who learn early to give up immediate rewards in favor of larger, later ones become more competent adolescents.

To program is to progress

In everyday life, well-defined rules govern each of our lives and schedule our rewards and punishments. Thus, our parents used to tell us that they would give us financial support or some desired privilege (for example, going out at night) on condition that we fulfilled our duties. We are subject to rules that we can call partial or intermittent reinforcement schedules. These make a behavior persist much longer than constant reinforcement, which ends up extinguishing the response through fatigue or routine. In the long run, partial reinforcement produces greater resistance to extinction.

  1. Fixed interval schedules are set for a predetermined time, with an equal pause after each reinforcement. During this period no reinforcers are available; the first response made once the fixed interval has ended is reinforced. The clearest examples are our salary or the gifts traditionally given at a specific time, such as Christmas.
  2. In variable interval schedules, the interval varies at random around an average value. The first response made after each variable interval is reinforced. An example is giving in to children's tantrums for the sake of peace and quiet, which in the long run can turn against us. These schedules tend to produce a slow but steady rate of response.
  3. Fixed ratio schedules deliver a reinforcer after a set, predetermined number of responses. For example, piecework: a bonus is paid for every fifty units produced. It can be considered an effective system because it usually produces high response rates, with brief pauses after each reinforcement.
  4. Variable ratio schedules deliver a reward after a number of responses that is unpredictable but varies around an average. They are the most common in gambling and are responsible for a good number of compulsive addictions, since they produce high response rates: the more responses are made, the more reinforcers are obtained.
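The four schedules above can be compared with a small simulation. This is a sketch under stated assumptions, not a model from the literature: the function names are hypothetical, the subject is assumed to respond once per second, and the variable schedules draw their requirements from uniform distributions chosen only so that the average matches the stated mean.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reinforce every `ratio`-th response (e.g. a bonus per fifty units)."""
    return [(i + 1) % ratio == 0 for i in range(n_responses)]


def variable_ratio(n_responses, mean_ratio, rng):
    """Reinforce after an unpredictable number of responses.

    Assumption: the requirement is uniform on 1 .. 2*mean_ratio - 1,
    so it averages `mean_ratio`.
    """
    outcomes, count = [], 0
    needed = rng.randint(1, 2 * mean_ratio - 1)
    for _ in range(n_responses):
        count += 1
        reinforced = count >= needed
        outcomes.append(reinforced)
        if reinforced:
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
    return outcomes


def fixed_interval(response_times, interval):
    """Reinforce the first response made once `interval` has elapsed
    since the previous reinforcer (or since time 0)."""
    outcomes, last = [], 0.0
    for t in response_times:
        reinforced = t - last >= interval
        outcomes.append(reinforced)
        if reinforced:
            last = t
    return outcomes


def variable_interval(response_times, mean_interval, rng):
    """Like fixed_interval, but the wait varies at random.

    Assumption: the wait is uniform on 0 .. 2*mean_interval.
    """
    outcomes, last = [], 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        reinforced = t - last >= wait
        outcomes.append(reinforced)
        if reinforced:
            last = t
            wait = rng.uniform(0, 2 * mean_interval)
    return outcomes


rng = random.Random(0)  # seeded so that runs are repeatable
times = [float(t) for t in range(1, 101)]  # assumed: one response per second

print("fixed ratio 50:      ", sum(fixed_ratio(100, 50)))  # 2 reinforcers
print("fixed interval 10 s: ", sum(fixed_interval(times, 10.0)))  # 10 reinforcers
print("variable ratio ~5:   ", sum(variable_ratio(100, 5, rng)))
print("variable interval ~10 s:", sum(variable_interval(times, 10.0, rng)))
```

Note the design difference the simulation makes visible: under the ratio schedules, responding faster earns reinforcers faster, whereas under the interval schedules extra responses inside the waiting period earn nothing.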

Learned helplessness

Organisms exposed to situations they consider uncontrollable learn that they lack control and come to see events as independent of their own will. People can resign themselves to environmental conditions that suggest they cannot control the results of their own actions. We come to expect helplessness, since we receive punishment (often self-inflicted through our own negative thoughts) whatever we do. We no longer escape even when the situation allows it; we even underestimate the chances of reward if we adopted an appropriate strategy, so we fall into a state of paralysis that in turn causes emotional disturbances such as hopelessness.

Punishment

Punishment tells us what we should not do; reinforcement, what we should do. The most effective approach is to combine punishment with positive reinforcement. If punishment is applied, it must meet certain conditions to be effective:

  • It must be contingent on a specific behavior.
  • Never alternate reward and punishment for the same behavior (avoid inconsistencies between father and mother or between parents and teachers).
  • Provide subjects with alternative means of obtaining the reward.
  • It should not be generalized to personal traits ("you are stupid").
  • Prolonged punishment must be avoided.

Some authors further argue that prolonged and excessive punishment can trigger aggressiveness in the person who suffers it, and cite as an example the correlation between aggressive offending and having grown up in homes that were overly aversive and punishing.

Learning by imitation

From infancy, we observe and imitate the behavior of others, which is called modeling or vicarious learning. This process is remarkably effective because it avoids the tedious tentative attempts and trial and error that accompany instrumental conditioning. Thanks to it we can learn without previous attempts, and it makes possible the wide repertoire of our social behavior.

Modeling helps to explain psychosocial behaviors that have positive, supportive effects and that are modeled for us by our parents, our social environment or the media.


Superstition

Many of our superstitions are learned when a positive or negative reinforcement happens to occur by chance. In Skinner's famous experiment with pigeons, which were reinforced with food for pecking a disk, he observed that if a pigeon happened to turn around before pecking, it behaved as if the turn served to obtain food. Although the food was given for pecking, regardless of the turn, the pigeon learned the superstitious behavior and kept turning even though it was useless. This is how we learn to attribute an incorrect cause to an effect: our mind needs to have everything in order.
