Sunday, January 6, 2019

Newcomb's Problem, Choices, and Perspective

How effective are we at making decisions and evaluating the decisions of others? Undoubtedly you’ve been in the position where you think the correct choice is obvious, or in a situation where you can’t understand the choices of others. How often do you thoroughly evaluate the rationale behind different choices? Maybe not often enough. Sometimes we may not evaluate other options because on the surface we see absolutely no merit in them. So it might be helpful to review a thought experiment where one choice may seem obvious to us, but the other option is just as reasonable. The thought experiment below was created by William Newcomb and thoroughly analyzed by Robert Nozick in his 1969 article titled Newcomb’s Problem and Two Principles of Choice.


Newcomb’s Problem


There are two boxes, red and blue. You are given two options: (1) take both the red and blue boxes; or (2) take only the blue box. The red box always contains $1,000. The blue box contains either $0 or $1,000,000. The contents of the blue box will be decided by a being called the Swami. The Swami has accurately predicted the choice of everybody who has faced this decision in the past. You are aware of this, the Swami is aware you are aware of this, and so on. You therefore trust the Swami to accurately predict the choice you will make. Now, here’s how the Swami will decide whether or not to put $1,000,000 in the blue box: If the Swami predicts you will choose option 1 (take both boxes), it puts $0 in the blue box. If the Swami predicts you will choose option 2 (take only the blue box), it puts $1,000,000 in the blue box. The order of operations is (1) the Swami makes a prediction, (2) the Swami either puts $1,000,000 in the blue box or does not, and (3) you make your choice. What do you do?

The boxes can be in one of two states, depending on what the Swami predicts you will do:

  • The Swami predicts you will take both boxes: the blue box contains $0 (the red box, as always, contains $1,000).
  • The Swami predicts you will take only the blue box: the blue box contains $1,000,000 (the red box, as always, contains $1,000).
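
To make the payoffs concrete, here is a minimal sketch (my own illustration in Python, not part of Nozick's formulation) that maps the Swami's prediction and your choice to a payout:

  # Payout in Newcomb's problem, given the Swami's prediction and your choice.
  # "one_box" means taking only the blue box; "two_box" means taking both boxes.
  RED_BOX = 1_000
  BIG_PRIZE = 1_000_000

  def payout(prediction, choice):
      blue_box = BIG_PRIZE if prediction == "one_box" else 0
      red_box = RED_BOX if choice == "two_box" else 0
      return blue_box + red_box

  for prediction in ("one_box", "two_box"):
      for choice in ("one_box", "two_box"):
          print(f"Swami predicts {prediction}, you choose {choice}: ${payout(prediction, choice):,}")

Running this enumerates all four cells: $1,000,000 or $1,001,000 when the Swami predicted one-boxing, and $0 or $1,000 when it predicted two-boxing.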

Assuming you are interested in maximizing your payout, let’s examine both options. At first you may see only one clear option and not understand why anybody would choose the alternative.



Take the Blue Box Only


If the Swami is such an accurate predictor, then the smart decision is to take only the blue box. The Swami will have predicted that you would do this, placed $1,000,000 in the blue box, and you will be $1,000,000 richer. Likewise, if you take both boxes, the Swami would have predicted this and not placed the $1,000,000 in the blue box; you will therefore only get $1,000. While it helps to ascribe supernatural talents to the Swami - say it's a genie, a god, or a superpowerful artificial intelligence - we don't need to. For instance, as long as we know many people who have faced this choice before us, and every one of them who chose both boxes ended up with $1,000 while every one who selected only the blue box ended up with $1,000,000, we should factor that track record into our decision. Why take the chance of bucking the trend?
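
One way to formalize this line of reasoning (a sketch of my own, not from Nozick's article) is to treat the Swami's track record as a probability p that it predicts your actual choice correctly, and compare expected payouts:

  # Expected payout if the Swami predicts your actual choice with probability p.
  # The accuracy values below are assumptions chosen for illustration.
  def expected_payout(choice, p):
      if choice == "one_box":
          # With probability p the Swami foresaw one-boxing and loaded the blue box.
          return p * 1_000_000
      # With probability p the Swami foresaw two-boxing and left the blue box empty.
      return p * 1_000 + (1 - p) * 1_001_000

  for p in (0.5, 0.9, 0.99):
      print(p, expected_payout("one_box", p), expected_payout("two_box", p))

Under this reading, one-boxing has the higher expected payout whenever p is above roughly 50.05%, so even a mediocre predictor is enough to justify taking only the blue box.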



Take Both Boxes


The Swami decides whether to place the $1,000,000 in the blue box before you make your choice. Therefore, when it comes time to make a selection, the $1,000,000 is already in the blue box or it is not. No matter what I do, the contents of the box do not change once the Swami has made its choice. Therefore, I maximize my payout by selecting both boxes. Why? If there is $1,000,000 in the blue box and I take both boxes, I get $1,001,000 instead of only $1,000,000 for selecting just the blue box. If there is no money in the blue box, I get $1,000 for selecting both boxes instead of zero for selecting just the blue box. Think of it this way: an external observer who can see what is in both boxes after the Swami has decided whether or not to place the $1,000,000 will always advise you to select both boxes, because the total in both boxes is always greater.
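
The dominance argument can be written down just as directly (again, my own sketch, not the author's): whatever state the blue box is already in, adding the red box is worth an extra $1,000.

  # The blue box's contents are fixed before you choose, so compare the two
  # choices against each possible, already-decided state of the blue box.
  for blue_box in (0, 1_000_000):
      one_box = blue_box
      two_box = blue_box + 1_000
      print(f"Blue box holds ${blue_box:,}: one-boxing pays ${one_box:,}, two-boxing pays ${two_box:,}")

In both rows two-boxing comes out $1,000 ahead, which is exactly what the external observer sees.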



The Right Choice


It may comfort you - or not - to know that answers to Newcomb’s paradox are typically pretty split among the general public. In 2016 The Guardian presented a poll to readers, and the results after 31,854 votes were 53.5% blue box and 46.5% both boxes. Robert Nozick himself claimed to have put the problem to many students and friends, with decisions split almost evenly. It seems that 50 years after Nozick popularized the problem, we still lack a consensus on the right answer.

While Newcomb’s problem mobilizes philosophers to ponder free will and determinism and decision theorists to squabble over the right choice, it also has applications in everyday life. Newcomb’s problem introduces a scenario where one choice seems obvious at first, but upon closer inspection there are real merits to the other side. It teaches us that it is important to keep all perspectives in mind when trying to decipher the world and the way things work.


Saturday, December 8, 2018

The Prisoner’s Dilemma, Reputation, and Tit For Tat

Lions hunt in packs. Pilot fish eat parasites off sharks without fear of those nasty teeth. Countries mostly cooperate with each other and don't constantly threaten total nuclear war. Why is there so much cooperation rather than unbridled selfishness?

A simple game can help us out here. Let’s play against each other. Here are the rules:
  1. Each of us selects one color - red or blue - and writes it down.
  2. Our scores will be calculated based on the combination of choices:
    • I select blue, you select blue: 3 points each
    • I select blue, you select red: 0 points for me, 5 points for you
    • I select red, you select red: 1 point each
    • I select red, you select blue: 5 points for me, 0 points for you
  3. Whichever one of us has the most points wins.
What’s your strategy? Let’s evaluate the options. If I select blue, there are two things that can happen: you select red and I lose 0 – 5 (not good for me), or you select blue and we tie 3 – 3. Notice that I can’t win by selecting blue. OK, so then I should choose red. Then two things can happen: you select blue and I win 5 – 0, or you select red and we tie 1 – 1. So there is a possibility of winning if I select red. In fact, by selecting red I’m guaranteed to do better against either of your strategies. It seems logical that we’ll both arrive at the same conclusion and select red, therefore tying 1 – 1.

I think it's pretty clear that in this situation neither of us would select blue, but what if we change the game from winner-take-all to point accumulation? There is no winner or loser per se, but you’re better off the more points you have – and not only relative to your opponent but on an absolute basis (you'd rather have 1 point than 0 regardless of what your opponent scores). Does the outcome change?
Well…it doesn’t. If I select blue, there are two things that can happen. You select red and I get 0 points – not good for me, regardless of what you get (which is the maximum of 5 points, by the way). OK, but if you also select blue we both get 3 points – pretty good! So what if I select red? Well, if you select blue I get 5 points, which is my best-case scenario. Enticing! If you also select red I only get 1 point. Notice that my payoff for selecting red is either 5 or 1, which is better than the blue payoff of 3 or 0. Therefore, I should always select red, because no matter what you pick, red is the better payoff for me. You will follow suit, and we’ll both end up at 1 point each.

                   You: Blue        You: Red
  Me: Blue         me 3, you 3      me 0, you 5
  Me: Red          me 5, you 0      me 1, you 1
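
To see the "red always does better" reasoning in one place, here is a minimal sketch (my own illustration) that encodes the table above and checks my payoff against each of your possible choices:

  # Payoffs (my points, your points) for each (my choice, your choice).
  PAYOFFS = {
      ("blue", "blue"): (3, 3),
      ("blue", "red"): (0, 5),
      ("red", "blue"): (5, 0),
      ("red", "red"): (1, 1),
  }

  # Whatever you pick, compare my payoff for choosing blue vs. choosing red.
  for your_choice in ("blue", "red"):
      blue_payoff = PAYOFFS[("blue", your_choice)][0]
      red_payoff = PAYOFFS[("red", your_choice)][0]
      print(f"You pick {your_choice}: blue gets me {blue_payoff}, red gets me {red_payoff}")

  # Red pays more in both cases, so red is the dominant choice for me - and by
  # symmetry it is the dominant choice for you too, landing us both on 1 point.
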
Wait a minute, why don’t we just talk to each other and agree to both select blue and end up with 3 points each? If we think it through using the logic above, we end up with the same number of points (1) as each other anyway, so it might as well be more! OK, we agree we’ll both select blue and end up with 3 points each. Satisfied? Our ability to communicate is why we’re so good at collaborating. Buuuuuut, cheating. If we both agree to go blue, the value that accrues to me if I cheat and switch to red is 2 points, a 67% increase! Add that I may have a suspicion of you cheating - in which case I’ll end up with 0 - and my propensity to choose red increases further, meaning we both end up back on red.

Prisoner's Dilemma


This is the most famous ‘game’ in game theory: the prisoner’s dilemma. The dilemma is that two rational actors will each reason their way to a choice that leaves both of them worse off. The original prisoner’s dilemma wasn’t based on choosing red or blue, but on two conspiring criminals who are captured and offered a choice: rat out your partner or keep quiet. If you rat out your partner and your partner keeps quiet, you go free and they spend 5 years in prison. If you both keep quiet, you each spend 1 year in prison. If you keep quiet and your partner rats, you go to prison for 5 years and they go free. If you both rat on each other, you each get 3 years in prison. The same reasoning as in the game above applies: we should both stay quiet and go to prison for 1 year each, but we will both end up ratting and going to prison for 3 years.

Maybe you’re thinking “interesting, but it doesn't apply in reality. That’s just not how the real world works.” But why ISN’T it how the real world works? The real world might not be as simple as this game, but cheating (or ratting or exploiting) can grant you an advantage. Consider the lion that doesn’t join the hunt - avoiding the risks - but shares in the spoils, or the shark that eats the pilot fish for a quick meal after its cleaning. Why aren’t these strategies more dominant? What we mostly see around us is cooperation. Are we simply irrational?

Reputation


There’s a key ingredient missing from the rationales above: reputation. Why do we feel the urge to do the honorable thing, to avoid being shamed, to want people to LIKE us? We feel that urge because our ancestors who had it - along with the urge to cooperate - had a better chance at survival. In real life, we don’t play isolated prisoner’s dilemma games with one another. The person we play the game with is probably the same person we’re going to play the game with later today, tomorrow, next week, or whatever.* If we continually cheat the individuals in our life, people will stop playing the game with us. So what is the rational strategy when playing the prisoner’s dilemma over and over?

In 1980, Robert Axelrod, a University of Michigan professor of political science, devised a fascinating demonstration to find the best strategy for the iterated prisoner’s dilemma. Axelrod's idea was that each submitted strategy would play every opposing strategy 200 times in a round-robin computer tournament. Importantly, the strategies would be able to take into account the outcomes of previous interactions with their given opponent. The payoff table was the same as in the example above, except instead of ‘blue’ Axelrod called the choice ‘cooperate’ – because both players selecting blue is the best combined outcome – and instead of ‘red’ he called it ‘defect’, because defecting takes advantage of the other player. The winner would be the strategy that accumulated the most points.

The highest score you could achieve against a single opponent was 1000 (200 rounds multiplied by the highest possible score of 5). The extreme strategies are always defect, meaning choose defect in all 200 games, and always cooperate, meaning choose cooperate in all 200 games. If these two strategies played each other, always defect would win each round 5 – 0 and accumulate 1000 points while always cooperate would accumulate 0. Each strategy was scored on how it fared against itself, a strategy that chose at random, and all other submitted strategies.
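
Here is a rough sketch (not Axelrod's actual code) of how a single 200-round match could be scored; it reproduces the 1000-to-0 result for always defect against always cooperate:

  # Score one 200-round iterated prisoner's dilemma match between two strategies.
  # A strategy is a function that takes the opponent's history of moves ("C" or "D")
  # and returns its next move.
  PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

  def play_match(strategy_a, strategy_b, rounds=200):
      history_a, history_b = [], []
      score_a = score_b = 0
      for _ in range(rounds):
          move_a = strategy_a(history_b)  # each side only sees what its opponent has done
          move_b = strategy_b(history_a)
          points_a, points_b = PAYOFFS[(move_a, move_b)]
          score_a += points_a
          score_b += points_b
          history_a.append(move_a)
          history_b.append(move_b)
      return score_a, score_b

  always_defect = lambda opponent_history: "D"
  always_cooperate = lambda opponent_history: "C"

  print(play_match(always_defect, always_cooperate))  # (1000, 0)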

How would you program your strategy? Certainly you could always cooperate in the hope that your opponent will return the favor and you both end up at 600 points. But if you know your opponent will always cooperate, why not throw in a defect every once in a while to try to maximize your score? And if your opponent is a real sucker, throw in a few more defects. The problem with that, of course, is that if your opponent retaliates by also defecting – so they can get 1 point instead of 0 – you are both worse off. Walking this line is the key to victory. So, which strategy won?

Tit For Tat


A quite simple one, it turns out. Tit for tat was submitted by Anatol Rapoport, and it was programmed like this: cooperate on the first round, and then for every round after that do whatever the opponent did the previous round. Hmm, a real copycat. Why is that effective? Well, let's match tit for tat up against the simplest strategies to get a feel for how it works (as it turns out, the simple strategies were not submitted). Against an always defect strategy, tit for tat cooperates and loses on the first round, but then defects for the remaining 199 games and doesn’t get taken advantage of, ending up with 199 points while always defect scores 204. Against an always cooperate strategy, tit for tat cooperates on the first round, gaining each player 3 points, and then cooperates from there on out, so both end up at 600 points. Against strategies that defect at random to gain a few extra points, tit for tat responds by defecting on the next turn, either capturing a bigger payout when the opponent returns to cooperating, or at least avoiding another 0 if the opponent defects again. You can summarize this strategy as “I will cooperate, but if you try to cheat me I will immediately retaliate until you prove you will get back to cooperating.”
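
A sketch of the rule itself (my paraphrase, plugged into the same kind of match loop as above) reproduces the scores described here:

  # Tit for tat: cooperate on the first round, then copy the opponent's last move.
  PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

  def play_match(strategy_a, strategy_b, rounds=200):
      history_a, history_b = [], []
      score_a = score_b = 0
      for _ in range(rounds):
          move_a = strategy_a(history_b)
          move_b = strategy_b(history_a)
          points_a, points_b = PAYOFFS[(move_a, move_b)]
          score_a += points_a
          score_b += points_b
          history_a.append(move_a)
          history_b.append(move_b)
      return score_a, score_b

  def tit_for_tat(opponent_history):
      return "C" if not opponent_history else opponent_history[-1]

  print(play_match(tit_for_tat, lambda history: "D"))  # (199, 204) vs. always defect
  print(play_match(tit_for_tat, lambda history: "C"))  # (600, 600) vs. always cooperate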

It's How You Play The Game


Based on these results, we can say that cooperation exists because the players (as a single system) gain more points by cooperating and therefore would outcompete rival groups that do not cooperate. Think about a sports analogy: a balanced team - say you and me with 3 points of skill each - has a better chance of winning than an unbalanced team - say our opponents have one player with 5 points of skill and one with 0. Or, thinking of it another way, maybe you and I each get 3 points of enjoyment in a loss, while the other team gets 5 points and 0 points. We collectively had a better time even though we lost.

By working together as a team and propping up weaker players we can optimize our performance. Presumably this is why we say that how you play the game is more important than simply winning. The saying hasn't been passed down just to make somebody - typically a child - feel better about themselves for losing. It has been passed down because there is much truth to it. Winning or losing a single game isn't as important in the grand scheme of life as being the type of person others want to continue to play with. Additionally, if you're known as a team player and not a ball hog, you'll get invited to play more often because you help bring the total enjoyment of the group up.

Those who cooperate over the long run will do better than those who sell out to win one game. The lions that all cooperate on the hunt can bring down bigger prey and all eat more, the sharks that don’t eat their pilot fish miss out on quick payoffs but avoid succumbing to parasites and live long enough to find other food sources, and the countries of the world mostly** cooperate to trade goods to enrich everyone and keep peace instead of annihilating one another. Cooperation pays.

Read more:

There are many books that discuss these concepts, but one of my favorites and a huge contributor to this post is Prisoner's Dilemma by William Poundstone, which along with discussing game theory and its origins applies the strategies to the Cold War. 

*Or at least this is how it was throughout most of human history when we primarily lived in small groups. You can see why this cooperation dynamic breaks down when you introduce encounters with people you are unlikely to encounter again (hello, internet) or if you can hide your identity to avoid spoiling your reputation (hello again, internet).

**There are obviously many examples of defections throughout history!

