Posts tagged data science

Analyzing GoGo 21 for Optimal Strategy

There are sites where you can play "casual" games against other people for money. One of the more interesting games, in my opinion, is inspired by blackjack.

On Worldwinner / GSN Cash Games it's called Catch 21, and on Royal Games it's GoGo 21. The gist: you're dealt through a standard deck of cards and, with a maximum of four concurrent "hands", you try to make as many 21s as possible. You can't go over, you can throw away cards, and you get points for speed but the most for accuracy.

Months ago, while working on my Scrabble Boggle bot, I'd take breaks and play this game, trying to deduce the optimal strategy. The game is easy to play - but what's the best way to play it?

I recently ran simulations to figure it out.



For humans, the best way to play is to try to make 21 (duh-huh), or to make 11 if that's not possible. Otherwise, put the card wherever it can go. It actually helps to always consider the hands from left to right, instead of in some random, frantic order.

For computers, optimize the human strategy. You have the processing power and memory to constantly track which card values are most abundant in the remaining deck. There are times when, if you can't make 21, you won't necessarily want to be trying to make 11. Example: there are two 7s left in the deck and one 10. It would be preferable to set up a 14 instead of an 11.

How to play

Still don't understand how to play? I think it's easiest to show. Here's a YouTube video of someone playing GoGo 21.

Analysis, up to optimal human strategy

To initially analyze the game, I wrote a rough Python script to simulate it.

Then I just added parts and checks as I went along.

The first strategy I tried out is the simplest. Look at the first hand - as long as it's possible to add cards, do that. If you can't, look at the second hand. And so on, until you can't put a card anywhere, at which point you start to discard.

The metrics I've judged these strategies on are blackjacks per match and discards per match. Obviously - and these have a negative correlation - we want to maximize blackjacks and minimize throwaways.

All figures from here on are from 100,000 simulated rounds.

  • 567,600 blackjacks = 5.676 / game
  • 1,889,862 throwaways = 18.89862 / game
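For the curious, here's a rough sketch of how such a simulator can be put together. This is not my original script, and it's simplified (aces always count as 11, and a hand clears the moment it hits 21):

```python
import random

def new_deck():
    # 4 suits x 13 ranks; face cards count as 10, aces (lazily) as 11
    values = [11, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
    deck = values * 4
    random.shuffle(deck)
    return deck

def play_naive(deck, max_hands=4):
    """Naive strategy: scan hands left to right and put the card on the
    first hand that won't bust; if no hand can take it, discard."""
    hands = [0] * max_hands
    blackjacks = discards = 0
    for card in deck:
        for i in range(max_hands):
            if hands[i] + card <= 21:
                hands[i] += card
                if hands[i] == 21:
                    blackjacks += 1
                    hands[i] = 0  # a completed hand clears
                break
        else:  # no hand could take the card
            discards += 1
    return blackjacks, discards

random.seed(0)
games = [play_naive(new_deck()) for _ in range(10000)]
bj = sum(g[0] for g in games) / len(games)
dc = sum(g[1] for g in games) / len(games)
print(f"{bj:.3f} blackjacks / game, {dc:.3f} discards / game")
```

Because of the simplifications, the per-game numbers won't match mine exactly, but the shape of the experiment is the same.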

Next, I was curious about a small tweak. Instead of methodically scanning the open hands from left to right, what if we randomized that part?

Things actually turned out worse.

  • 563,862 blackjacks = 5.63862 / game
  • 1,911,062 discards = 19.11062 / game

At this point, I decided to use some more realistic strategy. I added a check to look at all the hands to see if we could make 21 and then put the card there if so.

  • 902,747 blackjacks = 9.02747 / game
  • 1,138,283 discards = 11.38283 / game

Now we're getting somewhere. I added another check after the one for 21, this time to check for 11. Since the most common card value is 10, the next best thing to 21 is usually 11.

  • 959,662 blackjacks = 9.59662 / game
  • 1,056,945 discards = 10.56945 / game

So that's the best strategy for humans, I'm quite certain.

Deducing optimal strategy for computers

Computers are better built to play this game. For my next strategy, after the 21 check, I'd calculate the most common card value* left in the deck, then subtract it from 21. Whatever the result was, that's what we'd shoot for.

* Note I always counted aces as being worth 11, because I was extra lazy in this prototype script.

Most of the time, yes, the most common card value is 10, so we still shoot for 11. But once in a blue moon it's not.

  • 961,034 blackjacks = 9.61034 / game
  • 1,056,246 discards = 10.56246 / game
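The core of that tweak fits in a few lines. Here's a sketch with a hypothetical `best_target` helper (aces again counted as 11, as in my prototype):

```python
from collections import Counter

def best_target(remaining_cards):
    """Aim for 21 minus the most abundant card value still in the deck."""
    counts = Counter(remaining_cards)
    most_common_value, _ = counts.most_common(1)[0]
    return 21 - most_common_value

# Usually tens dominate, so the target stays 11...
print(best_target([10, 10, 10, 7, 7, 2]))  # 11
# ...but late in the deck it can change: two 7s left vs. one 10
print(best_target([7, 7, 10]))             # 14
```

Note that `Counter.most_common` breaks ties by first appearance, so a real implementation would want an explicit tie-breaking rule.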

At this point I decided to write out a more elaborate simulator than my initial hackjob. My ultimate goal was to expand on the check I described above, now considering the abundance of all cards left in the deck.

The better simulator is here on Github. Here are results from running what I deem optimal computer strategy.

  • 973,889 blackjacks = 9.73889 / game
  • 1,054,623 discards = 10.54623 / game

Just a little better - an improvement so small it could easily be noise in the Monte Carlo runs.

Ultimately, the key to cracking a blackjack-inspired game is really just card counting. Big surprise, right?


One way to figure out a game's optimal strategy is by coding its basics, then testing strategies over thousands of runs.

That's what I did here. The initial script took maybe 5 minutes to write.

Obviously, not all games are this simple. I don't think it would be as easy to model Texas Hold'em.

Stay tuned for a possible followup post, with a bot. The trickiest part of that would be the image recognition. We'll see. 🙂

Cracking the Data Science Interview

While I'll probably never directly work as a data scientist, skills of that nature are becoming more prevalent throughout the tech scene. There was definitely a presence at Uncubed.

Nir Kaldero gave a solid list of interview questions on Quora, kind of an answer to "how can I crack the data science interview?"

As a study tool, here are answers and additional resources. Credit to Nir for the questions obviously. Reference links for everything are at the end of attributed statements.

Data Science Disciplines
(Image credit - Wikimedia)

Entry- or Mid-Level Position Interview Questions

What is P value?

The P value (or calculated probability) is the estimated probability of rejecting the null hypothesis H0 when that hypothesis is true. Usually the null hypothesis is one of "no difference" - like "no difference in blood pressure between groups A and B." - StatsDirect

Wikipedia entry on P value.
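As a toy illustration (mine, not part of the original question set): an exact two-sided binomial test in pure Python, computing the P value for observing 60 heads in 100 coin flips under the null hypothesis of a fair coin.

```python
from math import comb

def binomial_p_value(heads, flips, p=0.5):
    """Two-sided P value: the probability, assuming H0 (a fair coin),
    of any outcome at least as improbable as the one observed."""
    observed = comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)
    total = 0.0
    for k in range(flips + 1):
        prob = comb(flips, k) * p ** k * (1 - p) ** (flips - k)
        if prob <= observed + 1e-12:
            total += prob
    return total

# 60 heads in 100 flips: P is about 0.057, so H0 survives at the 0.05 level
print(round(binomial_p_value(60, 100), 4))
```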

What is regularization? Which problem does regularization try to solve?

Regularization is tuning or selecting the preferred level of model complexity so your models are better at predicting (generalizing). If you don't do this your models may be too complex and overfit or too simple and underfit, either way giving poor predictions.

To regularize you need 2 things:

  1. A way of testing how good your models are at prediction, for example using cross-validation or a set of validation data (you can't use the fitting error for this).
  2. A tuning parameter which lets you change the complexity or smoothness of the model, or a selection of models of differing complexity/smoothness.

Basically you adjust the complexity parameter (or change the model) and find the value which gives the best model predictions. - Toby Kelsey

Wikipedia entry on regularization.

Quora page on regularization.
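A minimal sketch of the recipe above - a tuning parameter (here a ridge penalty, lambda) selected against held-out validation data. The synthetic data and the candidate grid are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a sine curve, fit with a degree-8 polynomial:
# flexible enough to overfit badly without regularization
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, x.size)
X = np.vander(x, 9)                       # polynomial feature matrix
train, val = np.arange(0, 30), np.arange(30, 40)

def ridge_fit(X, y, lam):
    # closed-form ridge regression: (X'X + lam*I)^-1 X'y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Pick the lambda that predicts the held-out points best
best = min(
    (10.0 ** k for k in range(-6, 3)),
    key=lambda lam: np.mean(
        (X[val] @ ridge_fit(X[train], y[train], lam) - y[val]) ** 2
    ),
)
print("chosen lambda:", best)
```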

How can you fit a non-linear relationship between X and Y into a linear model?

As an example - X could be age, Y could be income.
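One standard answer: a linear model only has to be linear in its coefficients, so you can transform X (polynomial terms, logs, splines) and still fit by ordinary least squares. A sketch with made-up age/income data following a quadratic shape:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: income rises with age, peaks, then declines
age = rng.uniform(20, 70, 200)
income = 20 + 3.0 * age - 0.03 * age ** 2 + rng.normal(0, 2, age.size)

# Still a *linear* model - linear in the coefficients b0, b1, b2
X = np.column_stack([np.ones_like(age), age, age ** 2])
b, *_ = np.linalg.lstsq(X, income, rcond=None)
print(b)  # roughly [20, 3, -0.03]
```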

What is the probability of getting a sum of 2 from 2 equally weighted dice?

To arrive at a sum of 2, both dice must show a 1 - that's 1 of the 36 equally likely outcomes, so the probability is 1/36.

What about a sum of 4? Three outcomes work (1+3, 2+2, 3+1), so the probability is 3/36 = 1/12.

For a sum of 7, six outcomes work (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), so the probability is 6/36 = 1/6 - the most likely sum of all.
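These are easy to verify by brute force - just enumerate all 36 outcomes:

```python
from itertools import product
from fractions import Fraction

def p_sum(target, dice=2, sides=6):
    """Exact probability that `dice` fair dice sum to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=dice))
    hits = sum(1 for roll in outcomes if sum(roll) == target)
    return Fraction(hits, len(outcomes))

print(p_sum(2), p_sum(4), p_sum(7))  # 1/36 1/12 1/6
```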

What is gradient descent method?

"Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point." - Wikipedia

Read Eren Golge's answer

Wikipedia entry on gradient descent.

Quora page on gradient descent.
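In code, it really is just "step opposite the gradient." A minimal sketch (mine), minimizing f(x) = (x - 3)^2:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step proportional to the negative gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3)
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # 3.0
```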

Which clustering methods are you familiar with?

Cluster Analysis: Basic Concepts & Algorithms (pdf) - University of Minnesota

Statistical Clustering (pdf slides) - Texas A&M University

Wikipedia entry on cluster analysis.

Which libraries for analytics / data science are you familiar with in Python?

Probably the most prevalent: "pandas" Python data analysis library

9 Python Analytics Libraries - Data Science Central

Python for Data Analysis via O'Reilly

What is an eigenvalue?

To really understand eigenvalues, you need a grasp on eigenvectors.

"In linear algebra, an eigenvector or characteristic vector of a square matrix is a vector that does not change its direction under the associated linear transformation." - Wikipedia

Image credit - Wikipedia
“The eigenvalue… tells whether the special vector x is stretched or shrunk or reversed or left unchanged…" - MIT

Eigenvalues & Eigenvectors (pdf) from MIT

Khan Academy: Introduction to eigenvalues & eigenvectors

Wikipedia entry on eigenvalues and eigenvectors.
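A quick NumPy check of that definition - eigenvectors are only scaled by the transformation, never rotated:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` is merely scaled by its eigenvalue
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))        # True
print(sorted(eigenvalues.round(6).tolist()))  # [1.0, 3.0]
```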

A question on Bayes' Law

"Bayes' Law relates current probability to prior probability." - Wikipedia

(Image credit - Air & Space Power Journal)
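A worked example (the numbers are made up): given a diagnostic test's sensitivity and false-positive rate, Bayes' Law turns the prior probability of disease into the posterior probability given a positive result.

```python
def bayes(prior, sensitivity, false_positive_rate):
    """P(disease | positive) via Bayes' Law:
    P(pos | disease) * P(disease) / P(pos)."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false positives.
# Even a positive result from this "99% accurate" test implies only
# about a 17% chance of actually having the disease.
print(round(bayes(0.01, 0.99, 0.05), 3))  # 0.167
```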

A question on time series

If you have a data set with 100 observations for each Xi, and 3 lag-effect variables of X1, how many predictions will you have if you run any simple linear regression?

Advanced-Level Position Interview Questions

What is the difference in the outcome (coefficients) between the L1 and L2 norms?

YunFang Juan's answer to "What is the difference between L1 and L2 regularization?" on Quora

Wikipedia entry on regularization.

Quora page on regularization.

What is Box-Cox transformation?

"… Many real data sets are in fact not approximately normal. However, an appropriate transformation of a data set can often yield a data set that does follow approximately a normal distribution. … The Box-Cox transformation is a particularly useful family of transformations." - Engineering Statistics Handbook
Image credit - Online Stat Book

Engineering Statistics Handbook page on Box-Cox transformation.

Online Stat Book page on Box-Cox transformation.

Wikipedia entry on Box-Cox distribution.
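The transformation itself is a one-liner per point, with lambda = 0 defined as the log transform (its limit as lambda approaches 0). A quick sketch showing how it tames right-skewed (here, geometric) data:

```python
import math

def box_cox(x, lam):
    """Box-Cox transform for x > 0: (x^lam - 1) / lam,
    with the lam = 0 case defined as log(x)."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

# A geometric (right-skewed) sequence becomes evenly spaced under lam = 0
skewed = [1, 2, 4, 8, 16]
print([round(box_cox(v, 0), 2) for v in skewed])
# [0.0, 0.69, 1.39, 2.08, 2.77]
```

In practice lambda is estimated from the data (e.g. by maximum likelihood) rather than picked by hand.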

What is multicollinearity?

Multicollinearity (also collinearity) is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy. - Wikipedia

It can cause strange results when attempting to study how well individual independent variables contribute to an understanding of the dependent variable. - Investopedia

Examples of contexts in which multicollinearity arises - survival analysis, interest rates for different terms to maturity. - Wikipedia

The simplest way to resolve multicollinearity problems is to reduce the number of collinear variables until there is only one remaining out of the set. - Investopedia

Quora page on multicollinearity.
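A quick demonstration with synthetic data: when two predictors are nearly copies of each other, their individual coefficients are poorly determined, even though their sum (and the model's predictions) stays stable.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is nearly a copy of x1
y = x1 + x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)        # individual coefficients: unstable across reruns
print(coef.sum())  # their sum: stable, close to the true value of 2
```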

Will gradient descent methods always converge on the same point?

Wikipedia entry on gradient descent.

Quora page on gradient descent.

Is it necessary that gradient descent methods will always find the global minima?

The answers on the Quora page below guide through the intuition and explain some of the math.

Quora page on gradient descent.

Questions about natural language processing (NLP)

Wikipedia entry on natural language processing.

Quora page on natural language processing.

A question in combinatorics

Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures. Aspects of combinatorics include:

  • Counting the structures of a given kind and size, deciding when certain criteria can be met, and constructing and analyzing objects meeting the criteria.
  • Finding "largest", "smallest", or "optimal" objects.
  • Studying combinatorial structures arising in an algebraic context, or applying algebraic techniques to combinatorial problems. - Wikipedia

Quora page on combinatorics.

(Image credit - University of Miami)

Bonus: Top 3 Algorithms for Data Scientists

Credit to William Chen - these are expanded from his Quora answer here.

Logistic Regression / Linear Regression

"For binary classification and regression."

Quora page on logistic regression.

Quora page on linear regression.

Random Forests

"For classification."

Quora page on random forests.


TF-IDF

"For textual analysis."

Quora page on TF-IDF.

If you have no idea what Quora is, check it out and follow me. It's like a more intelligent Yahoo Answers.