Saturday, December 8, 2018

The Prisoner’s Dilemma, Reputation, and Tit For Tat

Lions hunt in packs. Pilot fish eat parasites off sharks without fear of those nasty teeth. Countries mostly cooperate with each other and don't constantly threaten total nuclear war. Why is there so much cooperation rather than unbridled selfishness?

A simple game can help us out here. Let’s play against each other. Here are the rules:
  1. Each of us selects one color - red or blue - and writes it down.
  2. Our scores will be calculated based on the combination of choices:
    • I select blue, you select blue: 3 points each
    • I select blue, you select red: 0 points for me, 5 points for you
    • I select red, you select red: 1 point each
    • I select red, you select blue: 5 points for me, 0 points for you
  3. Whichever one of us has the most points wins.
What’s your strategy? Let’s evaluate the options. If I select blue, there are two things that can happen. If you select red, I lose 0 – 5. Not good for me. If you select blue, we tie 3 – 3. Notice that I can’t win by selecting blue. OK, so then I should choose red. Then two things can happen: you select blue and I win 5 – 0, or you select red and we tie 1 – 1. So there is a possibility of winning if I select red. In fact, by selecting red I’m guaranteed to do better against either of your choices. It seems logical that we’ll both arrive at the same conclusion and select red, therefore tying 1 – 1.

I think it's pretty clear in this situation neither of us would select blue, but what if we change the game from winner-take-all to point accumulation? There is no winner or loser per se, but you’re better off the more points you have – and not only relative to your opponent but on an absolute basis (you'd rather have 1 point than 0 regardless of what your opponent scores). Does the outcome change?
Well… it doesn’t. If I select blue, there are two things that can happen. If you select red, I get 0 points. Not good for me, regardless of what you get (which is the max of 5 points, by the way). OK, but if you also select blue we both get 3 points. Pretty good! So what if I select red? Well, if you select blue I get 5 points, which is my best-case scenario. Enticing! If you also select red I only get 1 point. Notice that my payoff for selecting red is either 5 or 1, which beats the blue payoff of 3 or 0 in each case: 5 beats 3 if you pick blue, and 1 beats 0 if you pick red. Therefore, I should always select red, because no matter what you pick, red is the better payoff for me. You will follow suit, and we’ll both end up at 1 point each.
                   You: Blue        You: Red
I select Blue      me: 3, you: 3    me: 0, you: 5
I select Red       me: 5, you: 0    me: 1, you: 1
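If you'd rather check the "red always wins" logic mechanically, here is a minimal Python sketch. It assumes nothing beyond the payoff table above; the names PAYOFF and best_reply are my own, made up for illustration.

```python
# Payoffs as (my points, your points), keyed by (my choice, your choice).
# These numbers come straight from the table above.
PAYOFF = {
    ("blue", "blue"): (3, 3),
    ("blue", "red"): (0, 5),
    ("red", "blue"): (5, 0),
    ("red", "red"): (1, 1),
}


def best_reply(your_choice):
    """Return the choice that earns me the most points against your_choice."""
    return max(("blue", "red"), key=lambda mine: PAYOFF[(mine, your_choice)][0])


for yours in ("blue", "red"):
    print(f"If you pick {yours}, my best reply is {best_reply(yours)}")
```

Whatever you pick, the best reply comes out as red, which is exactly why we both land on 1 point each.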
Wait a minute, why don’t we just talk to each other and agree to both select blue and end up with 3 points each? If we think it through using the logic above, we end up with the same number of points (1) as each other anyway, so it might as well be more! OK, we agree we’ll both select blue and end up with 3 points each. Satisfied? Our ability to communicate is why we’re so good at collaborating. Buuuuuut, cheating. If we both agree to go blue, the extra value that accrues to me if I cheat and switch to red is 2 points (5 instead of 3), a 67% increase! Add that I may have a suspicion of you cheating, in which case I’ll end up with 0, and my propensity to choose red increases further, meaning we both end up back on red.

Prisoner's Dilemma


This is the most famous ‘game’ in game theory, named the prisoner’s dilemma. The dilemma part is that two rational actors will each make the choice that is best for them individually and still end up with an outcome that is worse for both. The original prisoner’s dilemma wasn’t based on choosing red or blue, but on two conspiring criminals who are captured and offered a choice: rat out your partner or keep quiet. If you rat out your partner and your partner keeps quiet, you go free and they spend 5 years in prison. If you both keep quiet, you each spend 1 year in prison. If you keep quiet and your partner rats, you go to prison for 5 years and they go free. If you both rat on each other, you each get 3 years in prison. The same reasoning applies as in the game above. We should stay quiet and go to prison for 1 year each, but we will both end up ratting and going to prison for 3 years.
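The prison version runs on the same logic, except that now lower numbers are better. Here is a small Python sketch using the sentences from the story above; YEARS and best_choice are names I invented for illustration.

```python
# Years in prison as (mine, my partner's), keyed by (my choice, partner's choice).
YEARS = {
    ("quiet", "quiet"): (1, 1),
    ("quiet", "rat"): (5, 0),
    ("rat", "quiet"): (0, 5),
    ("rat", "rat"): (3, 3),
}


def best_choice(partner_choice):
    """Return the choice that gives me the fewest years against partner_choice."""
    return min(("quiet", "rat"), key=lambda mine: YEARS[(mine, partner_choice)][0])


for partner in ("quiet", "rat"):
    print(f"If my partner picks {partner}, my best choice is {best_choice(partner)}")
```

Ratting comes out ahead either way (0 years instead of 1, or 3 years instead of 5), even though both of us staying quiet would cost only 1 year each.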

Maybe you’re thinking “interesting, but it doesn't apply in reality. That’s just not how the real world works.” But why ISN’T it how the real world works? The real world might not be as simple as this game, but cheating (or ratting or exploiting) can grant you an advantage. Consider the lion that doesn’t join the hunt - avoiding the risks - but shares in the spoils, or the shark that eats the pilot fish for a quick meal after its cleaning. Why aren’t these strategies more dominant? What we mostly see around us is cooperation. Are we simply irrational?

Reputation


There’s a key ingredient missing from the rationales above: reputation. Why do we feel the urge to do the honorable thing, to avoid being shamed, to want people to LIKE us? We feel that urge because our ancestors who had that urge - and the urge to cooperate - had a better chance at survival. In real life, we don’t play isolated prisoner’s dilemma games with one another. The person we play the game with is probably the same person we’re going to play the game with later today, tomorrow, next week, or whatever.* If we continually cheat the individuals in our lives, people will stop playing the game with us. So what is the rational strategy when playing the prisoner’s dilemma over and over?

In 1980, a fascinating demonstration was devised by University of Michigan professor of political science Robert Axelrod to find the optimal strategy for the iterated prisoner’s dilemma. Axelrod's idea was that each strategy would play against an opposing strategy 200 times in a type of round robin tournament within a computer. Importantly, the strategies would be able to take into account the outcomes of the previous interactions with their given opponent. The payout table was the same as the one I laid out in the example above, except that Axelrod called ‘blue’ cooperate, because both players selecting blue is the best combined outcome, and ‘red’ defect, because defecting takes advantage of the other player. The winner would be the strategy that accumulated the most points.

The highest score you could achieve against a single opponent was 1000 (200 rounds multiplied by the highest possible score of 5). The extreme strategies are always defect, meaning choose defect in all 200 games, and always cooperate, meaning choose cooperate in all 200 games. If these two strategies played each other, always defect would win each round 5 – 0 and accumulate 1000 points while always cooperate would accumulate 0. Each strategy was scored on how it fared against itself, a strategy that chose at random, and all other submitted strategies.
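To make the scoring concrete, here is a rough Python sketch of a single 200-round match between the two extreme strategies. This is my own toy version, not Axelrod's actual program; the function names and the way history is passed around are assumptions for illustration.

```python
# Points per round as (player A, player B), keyed by (A's move, B's move).
# "C" = cooperate (blue), "D" = defect (red).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(my_history, their_history):
    return "D"


def always_cooperate(my_history, their_history):
    return "C"


def play_match(strategy_a, strategy_b, rounds=200):
    """Play repeated rounds and return the total score of each strategy."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees its own past moves and its opponent's past moves.
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play_match(always_defect, always_cooperate))     # (1000, 0)
print(play_match(always_cooperate, always_cooperate))  # (600, 600)
```

The first line reproduces the 1000 to 0 blowout described above; the second shows what two always-cooperate players earn against each other.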

How would you program your strategy? Certainly you could always cooperate in the hopes that your opponent will return the favor and you both end up at 600 points. But if you know your opponent will always cooperate, why not throw a defect in every once in a while to try to maximize your score? And if your opponent is a real sucker, throw in a few more defects. The problem with that, of course, is if your opponent retaliates by also defecting – so they can get 1 point instead of 0 – you are both worse off. Walking this line is the key to victory. So, which strategy won?

Tit For Tat


A quite simple one, it turns out. Tit for tat was submitted by Anatol Rapoport and it was programmed like this: cooperate on the first round, and then on every round after that do whatever the opponent did the previous round. Hmm, a real copycat. Why is that effective? Well, let's match tit for tat up against the simplest strategies to get a feel for how it works (as it turns out, the simple strategies were not submitted). Against an always defect strategy, tit for tat cooperates and loses on the first round, but then defects for the remaining 199 games and doesn’t get taken advantage of, ending up with 199 points while always defect scores 204. Against an always cooperate strategy, tit for tat cooperates on the first round, gaining each player 3 points, and then cooperates from there on out, so both end up at 600 points. Against strategies that may try to defect at random in order to gain a few extra points, tit for tat responds by defecting on the next turn, either capturing a bigger payout when the opponent returns to cooperating, or at least avoiding another 0 if the opponent defects again. You can summarize this strategy as “I will cooperate, but if you try to cheat me I will immediately retaliate until you prove you will get back to cooperating.”
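Tit for tat is short enough to write out in full. The Python sketch below is my own reconstruction from the description above, not Rapoport's original code; the function names and the match loop are assumptions made for illustration.

```python
# Points per round as (player A, player B), keyed by (A's move, B's move).
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate on the first round, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    return "D"


def always_cooperate(my_history, their_history):
    return "C"


def play_match(strategy_a, strategy_b, rounds=200):
    """Play repeated rounds and return the total score of each strategy."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play_match(tit_for_tat, always_defect))     # (199, 204)
print(play_match(tit_for_tat, always_cooperate))  # (600, 600)
```

Notice that tit for tat never beats its opponent in a single match: it loses narrowly to always defect and ties always cooperate. Its edge comes from piling up points across many matches.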

It's How You Play The Game


Based on these results, we can say that cooperation exists because the players (as a single system) gain more points by cooperating and therefore would outcompete rival groups that do not cooperate. Think about a sports analogy: a balanced team (say you and me, with 3 points of skill each) has a better chance of winning against an unbalanced team (our opponents have one player with 5 points of skill and one with 0). Or, thinking of it another way, maybe you and I both get 3 points of enjoyment in a loss, while the other team gets 5 points and 0 points. We collectively had a better time even though we lost.

By working together as a team and propping up weaker players we can optimize our performance. Presumably this is why we say that how you play the game is more important than simply winning. The saying hasn't been passed down just to make somebody - typically a child - feel better about themselves for losing. It has been passed down because there is much truth to it. Winning or losing a single game isn't as important in the grand scheme of life as being the type of person others want to continue to play with. Additionally, if you're known as a team player and not a ball hog, you'll get invited to play more often because you help bring the total enjoyment of the group up.

Those who cooperate over the long run will do better than those who sell out to win one game. The lions that all cooperate on the hunt can bring down bigger prey and all eat more; the sharks that don’t eat their pilot fish miss out on quick payoffs but avoid succumbing to parasites and live long enough to find other food sources; and the countries of the world mostly** cooperate to trade goods, enriching everyone and keeping the peace instead of annihilating one another. Cooperation pays.

Read more:

There are many books that discuss these concepts, but one of my favorites and a huge contributor to this post is Prisoner's Dilemma by William Poundstone, which along with discussing game theory and its origins applies the strategies to the Cold War. 

*Or at least this is how it was throughout most of human history when we primarily lived in small groups. You can see why this cooperation dynamic breaks down when you introduce encounters with people you are unlikely to encounter again (hello, internet) or if you can hide your identity to avoid spoiling your reputation (hello again, internet).

**There are obviously many examples of defections throughout history!

