Link: https://leetcode.com/problems/stone-game-ii

Solution:

Topics: DP, game theory, Stone game

Intuition
This problem builds on Stone game, but I think the difficulty is greatly increased. These “optimal play” patterns are not the most intuitive for me but I’ll do my best. The problem statement is also not very clear in my opinion.

Lets start by establishing what “optimal play” means in this context. Optimal play simply means, not choosing a move that guarantees a loss (even if that loss occurs sometime in the future)…unless all remaining options will lose. How do we know which move does not guarantee a loss? We don’t, so we must try all the moves available to us (the subproblems are overlapping, so there is room for optimization).

With that established, lets get into the algorithm. If both players must play optimally, then by definition, the decision making should be the same for both. In the context of a recursive DP function, this means that the turns are swapped. This was very easy to do implicitly in Stone game, but this problem is quite a bit different so for simplicity we will pass player turn into our recursive function with a toggle variable.

The other thing we must handle is score. Since keeping both player scores is computationally intensive (and not very optimizable), we will keep track of score as a single quantity. Player 1 adds to the quantity, and player 2 subtracts. At the end we convert this quantity into a score with the operation (sum(piles) + quantity) // 2. This works because the quantity is essentially how many stones more or less you have collected compared to your opponent, and since at the end of the algorithm all stones have been collected by both players, we can derive the formula as follows:

1. total = your_score + opponent_score
2. opponent_score = total - your_score

3. quantity = your_score - opponent_score
4. your_score = opponent_score + quantity


Now plug formula 2 into formula 4:

1. your_score = total - your_score + quantity
2. 2*(your_score) = total + quantity
3. your_score = (total + quantity) / 2

Formula 3 now just becomes sum(piles) + quantity // 2

The last step to handle m. Simply compute x, to take from 1 to x piles, and each time pass max(m, piles_you_took) into the recursive function.

Implementation

def stone_game2(piles):
	@cache
	def dfs(i, m, player):
		if i == len(piles):
			return 0
			
		max_piles = m*2
		_max = float('-inf') if player == 0 else float('inf')
		take = 0
		for p in range(i, min(i + max_piles, len(piles))):
			take += piles[p]
			taken = p - i + 1
			if player == 0:
				_max = max(_max ,take + dfs(p+1, max(m, taken), player+1))
			else:
				_max = min(_max ,-take + dfs(p+1, max(m, taken), player-1))
		return _max
		
	return (dfs(0, 1, 0) + sum(piles)) // 2
				
#time: o(n*m*m) note that im not counting player variable here.
#memory: o(n*m) 

Review 1
I figured out the right approach almost instantly and quickly coded up a solution…and a simpler one than the above. Unfortunately, I initialized max_piles in the recursion to 0…and since I used a balance to keep track of who has what, this precludes the possibility of max_piles dipping into negatives…which is fatal if you are using a balance (it would be negative whenever player1 ends up with less than player 2). So I spent like an hour debugging the recursion, somehow not noticing this fatal flaw, however I can’t be too unhappy since apart from this stupid oversight, I coded this in 5 minutes and far simpler than the above.

Also, I looked at some of the solutions and everyone seems to be fixated on prefix sums. There is literally no point of doing this since you have to iterate from 1-(m*2) anyway. All they have done is increased the memory complexity.

Implementation

def stone_game2(piles):
	@cache
	def dfs(i, m):
		if i == len(piles):
			return 0
		balance = float('-inf')
		curr_stones = 0
		for j in range(i, min(i+(m*2), len(piles))):
			curr_stones += piles[j]
			balance = max(balance, curr_stones - dfs(j+1, max(m, (j-i+1))))
		return balance
	return (sum(piles) + dfs(0, 1)) // 2
		

review