
Study Guide
Psychology of Learning (PSY211)
Exam #2

Operant and Instrumental Conditioning: Reinforcement

Key terms
Operant (emitted) behavior: behavior that becomes stronger or weaker depending on its consequences.
Reflexive (elicited) behavior: a stimulus evokes an innate, often reflexive, response.
Trial and error/success: Thorndike's Puzzle Box / Skinner's Operant Chamber / Tolman's Maze.
Law of effect: every action has a consequence, and that consequence causes the behavior to change. Responses followed by satisfying consequences become more likely; responses followed by unpleasant consequences become less likely.
Positive reinforcement: the presence of a stimulus increases the likelihood of the preceding response (e.g., food, money, praise, drugs, electrical stimulation of pleasure centers in the brain). Sometimes called reward. Adding something positive.
Negative reinforcement: the removal of a stimulus increases the likelihood of the preceding response (e.g., remove hand from a warm stove, improve grades to lift restriction, work hard not to get fired). Taking away something negative.
Primary reinforcer: naturally or innately reinforcing stimuli (e.g., food, water, sex).
Secondary (conditioned) reinforcer: reinforcers that are dependent on their association with other reinforcers (e.g., praise, recognition, money).
Generalized reinforcer: secondary reinforcers that have been paired with a wide variety of primary reinforcers (e.g., money, praise).
Superstitious conditioning: learning about rewards or negative reinforcers from coincidental occurrences. Incorrect learning after accidental rewards. For example, walking under a ladder = bad luck vs. quicker clapping = closer to goal/prize.
Successive approximation (shaping):
Chaining: performing behaviors in a sequence (e.g., ordering take-out).
Forward chaining: train first-to-last.
Backward chaining: train last-to-first.
Acquisition: the initial stage of conditioning; a gradual increase in responding when a reinforcing stimulus follows the behavior (e.g., toilet training, athletic skills, stupid pet tricks).
Extinction: removing the reinforcer in order to stop a behavior. For example, a time out is extinction.
Spontaneous recovery: the reappearance, after a pause, of an extinguished conditioned response.
Resurgence: the reappearance, during extinction, of previously reinforced behavior; an animal goes through its entire repertoire of previously learned stunts.
Primary drives: innate drives, such as hunger, thirst, and sexual desire, that arise from basic biological needs.

Secondary drives: acquired through learning; affiliation, social, achievement, aggression, power. For example, money, grades, friends, intimacy, etc.
Escape conditioning: training an organism to remove or terminate an unpleasant stimulus. The behavior causes an unpleasant event to stop, so the organism continues that behavior. It makes the correct new response to stop delivery of the undesired stimulus.
Avoidance conditioning: an increase in behavior that allows one to avoid an aversive stimulus.

Key issues/distinctions/questions

What is the one key condition for effective reinforcement?
Behavior must have a consequence.

Identify the sequence of events that leads to reinforcement or punishment.

Why is classical conditioning termed S-S and operant conditioning termed R-S?
Classical conditioning is the pairing of two stimuli (light -> air puff, bell -> food). Operant conditioning is the pairing of a response with the rewarding or punishing stimulus that follows it (roll over -> doggie treat, work late -> extra pay, studying -> good grades).

Generate examples of the following:
Positive reinforcement: a father gives candy to his daughter when she picks up her toys. If the frequency of picking up toys increases or stays the same, the candy is a positive reinforcer.
Negative reinforcement: turning off distracting music when trying to work. If the work increases when the music is turned off, turning off the music is the negative reinforcer.
Primary reinforcer: a stimulus that does not require pairing to function as a reinforcer and most likely has obtained this function through evolution and its role in species survival. Examples include sleep, food, air, water, and sex.
Secondary reinforcer: a stimulus or situation that has acquired its function as a reinforcer after pairing with a stimulus that functions as a reinforcer. For example, the sound from a clicker, as used in clicker training. The sound of the clicker has been associated with praise or treats, and subsequently the sound of the clicker may function as a reinforcer. As with primary reinforcers, an organism can experience satiation and deprivation with secondary reinforcers.
Positive punishment: a mother yells at a child for running into the street. If the child stops running into the street, the yelling is the punishment.
Negative punishment: a teenager comes home an hour after curfew and the parents take away the teen's cell phone for two days. If the frequency of coming home after curfew decreases, the removal of the phone is negative punishment.
Shaping: you want a sea lion on a ball. First you reward the sea lion for going near the ball. Then you reward the sea lion for touching the ball. Finally, you reward the sea lion for getting on the ball.
Chaining: train a rat to pull a string that releases a marble, have it pick up the marble, carry the marble to the tube, then have it drop it in the tube.
Superstitious conditioning: the organism is rewarded (or a punisher is removed) while performing a response, and even though the response and the reward aren't related, the subject associates the two. Example: you hurt your thumb and keep swearing until the pain goes away. The pain eventually goes away, and you assume it was because of your swearing, so you swear every time you're hurt to relieve pain. The swearing actually did absolutely nothing, but you 'superstitiously' associate the two.
Extinction: Pavlov stopped giving food to the dogs when he rang the bell, so the dogs stopped salivating to the bell.
Spontaneous recovery: after a long time of not salivating to the bell, on one occasion the dog starts salivating anyway.
Escape conditioning:
Avoidance conditioning:

What effect do the following have on the acquisition of operant behaviors (i.e., on the speed of conditioning or the strength of the response)?
Amount of reward: many small rewards are better than a few large ones.
Type of reward: chocolate is better than raisins.
Delay of reward: the longer you delay, the less effective the reward becomes; immediate is best.
Conditioning somatic (voluntary) behavior versus autonomic (involuntary) behavior: somatic behavior is easier to condition than autonomic behavior.
Deprivation level: learning is faster and stronger when the learner is deprived of rewards.
Competing rewards: conditioning is slow and weak if other behaviors are also being rewarded (focus on one at a time).
Awareness of reward/behavior contingency: not necessary for conditioning, but leads to faster conditioning.

What effect do the following have on the extinction of operant behaviors?
Reinforcement variability:
Stimulus variability:
Response variability:

How do the following theories of reinforcement differ? What are the basic problems with each theory?
Drive reduction: behavior is driven by a desire to lessen drives resulting from needs that disrupt homeostasis. Reinforcers: primary (food, water, sex), secondary (success, popularity).
o Drive: a motivational force; tension from unfulfilled needs or desires. Primary drives (e.g., hunger, thirst); secondary drives (e.g., success, popularity).
o Reinforcer: any stimulus that reduces drive by fulfilling the needs and desires (e.g., food, water, money).
o Difficulties with the theory: some reinforcers do not reduce drives (electrical stimulation of the brain, copulation without ejaculation). Some motivations do not create states of tension that need to be reduced (exploratory behavior).

Relative value (Premack principle):
o Reinforcers viewed as behaviors (e.g., food smell vs. chewing behavior).
o Relative value: some behaviors are more probable (more preferred) than others (e.g., partying vs. studying).
o Premack Principle: high probability (preferred) behavior reinforces low probability (non-preferred) behavior.
o Problems with the theory: How to explain strong secondary reinforcers (e.g., why is verbal praise such a powerful reward?). Sometimes low probability behavior reinforces high probability behavior if the less likely behavior has been prevented (e.g., deprivation of study time).
Response deprivation (Timberlake and Allison): the relative value of responses depends on relative deprivation. Behaviors that are not allowed to occur will reinforce other, less deprived, behaviors (e.g., Prohibition in the 1920s made drinking booze a much stronger reward).

What are the two processes in the two-process theory of avoidance? Why is there a problem with using these two processes to explain avoidance?

Operant and Instrumental Conditioning: Punishment

Key terms
Positive punishment: the presence of a stimulus (usually aversive, such as a slap, scolding, or a dirty look) decreases (suppresses) the likelihood of a preceding response.
Negative punishment: the removal of a stimulus (usually something pleasant, such as TV privileges or a desirable object) decreases (suppresses) the likelihood of a preceding response. When the stimulus that is withheld is the reinforcer that maintained the response, we call this extinction.
Displaced aggression: people who are punished at work might sabotage the work. People who get punished at school might vandalize the school.
Elicited aggression: if you put two people in an unsafe environment, they will become aggressive.
Learned helplessness: the failure to escape an aversive stimulus following exposure to an inescapable aversive stimulus.
Differential Reinforcement of Low Rate (DRL): a behavior is reinforced only if it occurs no more than a specific number of times in a given period.
Differential Reinforcement of Zero Responding (DRO): reinforcement is contingent on the complete absence of a behavior for a period of time.
Differential Reinforcement of Incompatible Behavior (DRI): a form of differential reinforcement in which a behavior that is incompatible with an unwanted behavior is systematically reinforced.
Differential Reinforcement of Alternative Behavior (DRA): a form of differential reinforcement in which a behavior that is different from an undesired behavior is systematically reinforced.

Key issues/distinctions/questions

What are the three necessary characteristics for punishment?
o Behavior has a consequence (e.g., crime leads to prison, cheating leads to dismissal).
o Behavior decreases in strength or frequency (e.g., crime declines, cheating stops).
o The reduction in behavior is a result of its consequences (e.g., criminals go straight because of prison, cheating stops because of dismissal).

How are punishment and negative reinforcement different?
Punishment suppresses the preceding response; negative reinforcement strengthens the preceding response by removing a stimulus. Punishment comes in two forms:
o Positive: the behavior (response) leads to the onset of some aversive event that suppresses future responses (e.g., shock, scolding, physical blows).
o Negative: the behavior (response) leads to the offset (removal) of some pleasant event that suppresses future responses (e.g., removal of attention, a desired toy, previous rewards).

How are negative punishment and extinction related?
They are related in the sense that both take something away to decrease behavior. Extinction takes away the reinforcer that maintained the behavior; negative punishment takes away an appetitive stimulus as a consequence of the behavior.

What effect do the following conditions have on punishment?
R-S contingency: dependency of the punishing event on behavior (the response must lead directly to the punishing event).
R-S delay: the longer the delay between response and punisher, the less effective the punishment (e.g., immediate reprimands are better than delayed reprimands).
Intensity of punisher: strong punishers work better than weak punishers.
Progressive punishment: punishment is less effective if weak punishers are followed by progressively stronger punishers.
Behaviors that are both reinforced and punished: such behaviors become resistant to punishment (e.g., children who get attention [reinforced] by being punished for misbehaving become increasingly troublesome).
Presence of alternative behaviors: punishment works best on a behavior (e.g., criminal activities) when alternative behaviors (e.g., community service) are reinforced.
Behaviors that are strongly reinforced: when the motivation to engage in a behavior is strong (because the reinforcement was strong), punishment is less effective.
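The four consequence types used in this guide (positive and negative reinforcement, positive and negative punishment) come down to two observations: whether a stimulus was added or removed, and whether the behavior then increased or decreased. A minimal Python sketch of that lookup follows; the function name and the string labels are illustrative assumptions, not terms from the course.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the operant procedure from two observations.

    stimulus_change: "added" or "removed" (hypothetical labels)
    behavior_change: "increases" or "decreases" (hypothetical labels)
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    key = (stimulus_change, behavior_change)
    if key not in table:
        raise ValueError(f"unrecognized observation: {key}")
    return table[key]


if __name__ == "__main__":
    # Phone taken away after missed curfew, lateness decreases.
    print(classify_consequence("removed", "decreases"))
    # Candy given after picking up toys, picking up increases.
    print(classify_consequence("added", "increases"))
```

The table makes the usual exam trap visible: "negative" describes removal of a stimulus, not whether the behavior goes down.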

What are the three primary theories of punishment? Which theory is the most limited?
Disruption Theory: punishment suppresses responding because it leads to a disruption of ongoing activity (e.g., jumping, freezing). Can be dismissed rather easily.
Two-Process Theory: punishment involves both classical and operant conditioning. Similar to the two-process theory of avoidance. Stimuli associated with the punisher (e.g., lever, cookie jar) become a CS for reactions to the punisher (e.g., the sight of the lever or the cookie jar is associated with fear). We avoid the CS (e.g., lever, cookie jar) and thus decrease responses to the stimulus (e.g., don't press the lever, don't take cookies).
One-Process Theory: only operant conditioning is involved in punishment. Punishment suppresses behavior just as reinforcement strengthens behavior (e.g., high preference behavior reinforces low preference behavior; low preference behavior punishes high preference behavior).

How do the one-process and two-process theories of punishment differ?
In the one-process theory, only operant conditioning is involved in punishment. The two-process theory involves both classical and operant conditioning.

What are six major problems with using punishment for behavioral control?
o Temporary effects: the effects are not long lasting.
o Escape and avoidance: we try to escape from or avoid aversive stimuli (e.g., running away from home, lying to parents, escaping from prison).
o Aggression: aversive stimuli lead to aggression. Displaced aggression (e.g., sabotage, vandalism). Elicited aggression.
o Apathy: punishment suppresses other behaviors.
o Fixation: punishment limits the range of behaviors. Animals only respond in safe ways and are unwilling to try new behaviors (e.g., learned helplessness).
o Progressive punishment can go too far (e.g., spouse abuse).
o Imitation of the punisher (e.g., successive generations of child abuse).

What are five alternatives to aversive control?
Prevention, extinction, differential reinforcement of zero responding (DRO), differential reinforcement of low rates (DRL) of behavior, and reinforcing other behaviors.
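The DRL and DRO rules named above can be read as simple checks on how many times the target behavior occurred in each observation period. The Python sketch below is one way to express them, assuming responses are logged as timestamps in seconds from the start of the session; the function names, the timestamps, and the 10-second period are hypothetical and chosen only for illustration.

```python
from typing import List


def count_per_period(response_times: List[float], period: float,
                     n_periods: int) -> List[int]:
    """Count responses falling in each consecutive period of equal length."""
    counts = [0] * n_periods
    for t in response_times:
        idx = int(t // period)
        if 0 <= idx < n_periods:
            counts[idx] += 1
    return counts


def drl_reinforced(counts: List[int], max_responses: int) -> List[bool]:
    """DRL: reinforce a period only if the behavior occurred
    no more than max_responses times."""
    return [c <= max_responses for c in counts]


def dro_reinforced(counts: List[int]) -> List[bool]:
    """DRO: reinforce a period only if the behavior did not occur at all."""
    return [c == 0 for c in counts]


if __name__ == "__main__":
    times = [1, 2, 3, 12, 25, 26]          # hypothetical response log (seconds)
    counts = count_per_period(times, period=10, n_periods=4)
    print(counts)                           # responses per 10-second period
    print(drl_reinforced(counts, max_responses=2))
    print(dro_reinforced(counts))
```

Under this reading, DRO is the limiting case of DRL with a maximum of zero responses, which is why both appear together as alternatives to punishment.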

Why has incarceration (imprisonment) been used as a form of punishment over the years? Has incarceration been successful?

Operant and Instrumental Conditioning: Schedules

Key terms
Cumulative responses: over time, you look at responses as they accumulate. If the slope is steep, then it's a fast rate of responding.
Response rate:
Continuous reinforcement: a correct response is reinforced every time it occurs.
Intermittent (partial) reinforcement: reinforcement occurs for some responses but not all.
Ratio schedule: reinforcement is based on the number of responses (the ratio of reinforced to nonreinforced responses).
Interval schedule: reinforcement is based on the time since the last reinforced response.
Fixed ratio (FR): the number of responses required for reinforcement is fixed.
Variable ratio (VR): the number of responses required for reinforcement varies.
Post-reinforcement pause: the pauses that follow reinforcement.
Run rate: the rate at which behavior occurs once it has resumed following reinforcement.
Fixed interval (FI): the amount of time the animal must wait until the next response is reinforced is fixed.
Variable interval (VI): the amount of time the animal must wait until the next response is reinforced is variable.
Fixed time (FT): a reinforcer is delivered after a period of time without regard to behavior; used to establish superstitious behavior.
Variable time (VT): reinforcers are delivered at irregular intervals, regardless of behavior; may also lead to superstitious behaviors.
Fixed duration (FD): a reinforcer is delivered if a behavior occurs continuously for a fixed period of time.
Variable duration (VD): the required period of performance varies around some average.
Differential reinforcement of low rates (DRL): a behavior is reinforced only if it occurs no more than a specified number of times in a given period. DRL is used to encourage low rates of responding. It is like an interval schedule, except that premature responses reset the time required between behaviors.
Differential reinforcement of high rates (DRH): reinforcement is delivered only after an action is performed at least a minimum number of times in a given period; produces the highest rate of behavior.
Ratio stretch:
Ratio strain: disruption of the pattern of responding due to stretching the ratio of reinforcement too abruptly or too far.
Partial reinforcement effect: the tendency of a behavior to be more resistant to extinction when partially reinforced than when continuously reinforced.
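To make the ratio and interval definitions above concrete, the Python sketch below marks which responses would be reinforced under FR, VR, FI, and VI schedules. It rests on several assumptions that are not part of the guide: the session starts at time 0, VR requirements are drawn uniformly from 1 to 2 x mean - 1 (so they average the stated ratio), VI waits are drawn uniformly from 0 to 2 x mean, and all function names are hypothetical.

```python
import random
from typing import List


def fixed_ratio(n_responses: int, ratio: int) -> List[int]:
    """FR: every ratio-th response is reinforced (1-based response numbers)."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses: int, mean_ratio: int, seed: int = 0) -> List[int]:
    """VR: the required number of responses varies around mean_ratio."""
    rng = random.Random(seed)
    reinforced, count = [], 0
    required = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        count += 1
        if count >= required:
            reinforced.append(i)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


def fixed_interval(response_times: List[float], interval: float) -> List[float]:
    """FI: the first response made at least `interval` seconds after the
    last reinforced response is reinforced."""
    reinforced, last = [], 0.0
    for t in sorted(response_times):
        if t - last >= interval:
            reinforced.append(t)
            last = t
    return reinforced


def variable_interval(response_times: List[float], mean_interval: float,
                      seed: int = 0) -> List[float]:
    """VI: like FI, but the required wait varies around mean_interval."""
    rng = random.Random(seed)
    reinforced, last = [], 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t - last >= wait:
            reinforced.append(t)
            last = t
            wait = rng.uniform(0, 2 * mean_interval)
    return reinforced


if __name__ == "__main__":
    print(fixed_ratio(10, 3))                                   # FR-3
    print(fixed_interval([2, 5, 11, 12, 19, 22, 35], 10))       # FI-10
    print(variable_ratio(30, 5, seed=1))                        # VR-5
```

On a ratio schedule the animal controls how soon the next reinforcer arrives by responding faster; on an interval schedule extra responses before the interval elapses earn nothing, which is one way to think about the lower response rates on interval schedules.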

Resistance to extinction: intermittent (partial) reinforcement schedules, compared to continuous reinforcement schedules, make animals reluctant to give up responding when the reinforcers stop.

Key issues/distinctions/questions

What behavioral pattern on a cumulative record do the following schedules produce? Give examples of each:
Fixed ratio: a rat in an operant chamber presses the bar three times, and the third press is reinforced. This produces a staircase-like figure.
Variable ratio: a rat is in an operant chamber, and the number of responses required for reinforcement varies. This produces a high, steady rate.
Fixed interval: the rat has to wait a certain amount of time; the first bar press after that time is reinforced. This produces a kind of scalloped function.
Variable interval: the rat must wait, on average, 10 seconds after the last reinforced response before the next response is reinforced, but this time can vary. This produces a low, steady-rate function.

Give examples of the following time-related schedules:
Fixed time: FT-10 means the animal gets a reinforcer after 10 seconds no matter what it happens to be doing.
Variable time: VT-10 means the reinforcer is delivered every 10 seconds on average, sometimes more, sometimes less.
Fixed duration: practice violin for 30 consecutive minutes to get an ice cream cone.
Variable duration: if a kid is hyperactive, you reinforce sitting quietly for periods that vary in length.

Give examples of the following rate-related schedules:
DRL: reinforce the animal for responding at a slow rate (e.g., press the bar no more than once every five seconds). Used to help people slow down (e.g., hyperactivity).
DRH: reinforce the animal for responding at a fast rate (e.g., press the bar at least five times during every 10-second interval). Used to help people speed up (e.g., dawdlers).
Ratio stretch: start the animal out on a low ratio schedule (e.g., FR-1), then gradually increase the ratio (FR-3, FR-5, FR-10). Stretching too fast or too far (e.g., FR-300) creates ratio strain (responding is disrupted).

Describe the four theories of the partial reinforcement effect. How are they different?
Discrimination Hypothesis: it is harder for the animal to discriminate between an intermittent schedule and extinction than between continuous reinforcement and extinction (i.e., the animal can't tell when partial reinforcement ends and extinction begins).
Frustration Hypothesis: there is greater frustration for animals who switch from continuous reinforcement to extinction than for animals who switch from partial reinforcement to extinction. Frustrated animals stop responding sooner.

Sequential Hypothesis: the sequence of reinforced and non-reinforced responses becomes a cue for future responding. An animal performs longer in the absence of reinforcement following intermittent rewards because non-reinforced trials are cues to keep on responding.
Response Unit Hypothesis: the response should not be defined as a single behavior (e.g., bar press or key peck). The response is whatever complex actions (units of behavior) lead to a reinforcement (e.g., the response unit for an FR-3 schedule is three bar presses). The response unit for extinction more closely resembles the response unit for partial reinforcement than for continuous reinforcement. During extinction, the animal may actually produce fewer response units after partial reinforcement than after continuous reinforcement.

Operant and Instrumental Conditioning: Generalization, Discrimination & Transfer

Key terms
Stimulus generalization: the tendency for a response learned to one specific stimulus (e.g., flirt with people with red hair) to also occur for other, similar, stimuli (e.g., flirt with people with auburn hair).
Response generalization: if a response of one type (e.g., punching a classmate, typing on a keyboard) is blocked, then there is a tendency to make a similar response to the same stimulus (e.g., kick the classmate, bang on the keyboard).
Stimulus discrimination (stimulus control): when a response learned to one specific stimulus does not occur to other stimuli (e.g., go at a green light, stop at a red light). The opposite of stimulus generalization.
Response discrimination: learning not to make similar responses to the same stimuli (e.g., shifting gears, discriminating between a bad golf swing and a good one). The opposite of response generalization.
Stimulus control:
Discriminative stimuli: any stimulus that signals either that a behavior will be reinforced (an S+ or SD) or that it will not be reinforced (an S- or S-delta).
Successive discrimination: the S+ and S- are presented one after the other, and the subject learns to respond to one and not the other.
Simultaneous discrimination: different stimuli are presented at the same time and the subject chooses which one to respond to.
Matching to sample: a discrimination procedure in which the task is to select from two or more comparison stimuli the one that matches a sample.
Errorless discrimination: present the S+ at full strength and the S- in a weak form, then gradually strengthen the S-.

Excitatory gradient: the gradient in which a new stimulus is related to a previous excitatory stimulus.
Inhibitory gradient: the gradient in which a new stimulus is related to a prior inhibitory stimulus.
Peak shift: the subject is more likely to respond to a new stimulus than to the S+ because, for that new stimulus, excitation exceeds inhibition by a larger margin.
Basic transfer design: what we learn in one situation carries over into another situation. In order to study transfer, you have to have at least two groups, an experimental group and a control group. The experimental group learns task 1 and then sometime later learns task 2. You want to know if task 1 helped or hurt task 2. You don't know the answer to that until you compare it to the control group. The control group rests during the time that the experimental group learns task 1. Then, at the same time that the experimental group is learning task 2, the control group learns task 2. If the experimental group does better on task 2, then task 1 helped them.
Positive transfer: the experimental group performs better on task 2 than the control group.
Negative transfer: the experimental group performs worse on task 2 than the control group.
Warm-up effects: you start off studying and it's a struggle, but as you progress it gets easier.
Learning to learn: you start learning something and it's difficult. Then you learn new strategies and get better at it.

Key issues/distinctions/questions

Provide examples of the following:
Stimulus generalization: you have the tendency to flirt with redheads, and you begin to respond the same way to people with similar hair color (i.e., strawberry blond or auburn).
Response generalization: you are prevented from punching a classmate, so you might kick the classmate instead. You are frustrated while typing on a keyboard; you then pound on the keyboard as if it will help.
Stimulus discrimination: you go at a green light and stop at a red light. We do one thing when the light is green, and we do not do that thing when the light is red.
Response discrimination: shifting gears in a car. Discriminating between a bad golf swing and a good one.

What are the essential elements of Pavlov's Physiological Theory of discrimination?
The reinforced stimulus (S+) creates an area of excitation in the brain that produces a response (R). The non-reinforced stimulus (S-) creates an area of inhibition in the brain that inhibits responding and produces non-responding (NR).

What are the essential elements of Spence's Gradient Theory of discrimination? How does it differ from the Lashley-Wade theory?
S+ creates a gradient of excitation; S- creates a gradient of inhibition. The tendency to respond to a new stimulus reflects the net difference between excitation and inhibition.

Using the basic transfer design, how does a researcher know when negative transfer has occurred? Why?


When the experimental group does worse on task 2 than the control group. For example, an experimental group learns how to play tennis. The control group rests. Then both groups learn how to play racquetball. Usually, the control group will play better because the experimental group has tennis in mind.

In a transfer experiment with SA-RB in Task 1 and SC-RD in Task 2, what do the A, B, C, D subscripts refer to?

Which of the following conditions usually lead to positive transfer and which usually lead to negative transfer?
Response generalization - Positive
Stimulus generalization - Positive
Response facilitation/mediation - Positive
Response interference - Negative

Supply some examples of the following transfer situations. Which produce positive transfer and which produce negative transfer?
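The comparison at the heart of the basic transfer design (experimental group vs. control group on task 2) can be sketched in a few lines of Python. The version below assumes that higher scores mean better task 2 performance and compares group means only, with no significance test; the function name and the scores are hypothetical and exist only to illustrate the logic.

```python
from statistics import mean
from typing import List, Tuple


def transfer_effect(experimental_task2: List[float],
                    control_task2: List[float]) -> Tuple[float, str]:
    """Compare mean task 2 scores of the two groups.

    Assumes higher scores mean better performance. Returns the difference
    (experimental minus control) and a label for the direction of transfer.
    """
    diff = mean(experimental_task2) - mean(control_task2)
    if diff > 0:
        label = "positive transfer"
    elif diff < 0:
        label = "negative transfer"
    else:
        label = "no transfer"
    return diff, label


if __name__ == "__main__":
    # Hypothetical racquetball scores: the experimental group learned tennis
    # first, the control group rested.
    experimental = [60, 65, 70]
    control = [75, 80, 85]
    print(transfer_effect(experimental, control))
```

Without the control group's scores there is nothing to subtract from, which is why the design needs at least two groups before any claim about transfer can be made.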

