Any individual player, conditional on that player making a guess, is correct exactly 50% of the time, since their own hat color is independent of the hats they can see. So, intuitively, how is it that the team as a whole can do better than 50%?
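A quick way to convince yourself of the 50% claim is to enumerate all 16 hat outcomes for some arbitrary strategy. The sketch below uses a hypothetical encoding that is not part of the puzzle statement: hats are 0/1, and a strategy is a dict from (player, view of the other three hats) to a guess, or None for passing.

```python
import itertools
import random

N = 4  # number of players; hat colors encoded as 0 and 1 (assumption)

def random_strategy(rng):
    """Hypothetical strategy: for each player and each view of the other
    hats, guess 0, guess 1, or pass (None)."""
    return {
        (p, seen): rng.choice([0, 1, None])
        for p in range(N)
        for seen in itertools.product([0, 1], repeat=N - 1)
    }

def per_player_tally(strategy):
    """Return {player: (correct, wrong)} summed over all 2**N hat outcomes."""
    tally = {p: [0, 0] for p in range(N)}
    for hats in itertools.product([0, 1], repeat=N):
        for p in range(N):
            seen = hats[:p] + hats[p + 1:]  # everything except player p's hat
            guess = strategy[(p, seen)]
            if guess is None:
                continue
            tally[p][0 if guess == hats[p] else 1] += 1
    return {p: tuple(v) for p, v in tally.items()}

rng = random.Random(0)
for _ in range(5):
    for correct, wrong in per_player_tally(random_strategy(rng)).values():
        # Each view occurs in exactly two outcomes that differ only in the
        # player's own hat, so a guess is right in one and wrong in the other.
        assert correct == wrong
print("every player is right exactly as often as wrong")
```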
Simple: imagine we've already decided on player 1's strategy and are now choosing player 2's. If we can arrange it so that player 2's wrong guesses coincide with player 1's wrong guesses, while player 2's correct guesses happen in outcomes where player 1 wouldn't have guessed at all, we increase the number of outcomes with at least one correct guess without adding a single outcome that contains an incorrect guess. As long as we can do this efficiently enough with each subsequent player, it's not hard to see why, in theory, we might be able to beat 50%.
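To make the "coinciding wrong guesses" idea concrete, here is the well-known strategy for the 3-person version of the puzzle, used purely as an illustration (hats again encoded as 0/1, an assumption): a player who sees two matching hats guesses the opposite color, and otherwise passes. Enumerating the 8 outcomes shows the wrong guesses all piling into the same 2 outcomes while the correct guesses are spread out one per outcome.

```python
import itertools

def classic_three_player_guess(seen):
    """If both visible hats match, guess the opposite color; otherwise pass."""
    a, b = seen
    return 1 - a if a == b else None

wins = 0
for hats in itertools.product([0, 1], repeat=3):
    correct = wrong = 0
    for p in range(3):
        guess = classic_three_player_guess(hats[:p] + hats[p + 1:])
        if guess is None:
            continue
        if guess == hats[p]:
            correct += 1
        else:
            wrong += 1
    won = correct > 0 and wrong == 0  # at least one right, nobody wrong
    if won:
        wins += 1
    print(hats, "correct:", correct, "wrong:", wrong, "WIN" if won else "lose")
print(f"wins {wins}/8")
```

It should print `wins 6/8`: the two all-same outcomes each contain 3 wrong guesses, and every other outcome contains exactly 1 correct guess.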
There are several other classic puzzles built on the same concept as this one. In all of them, the key is to choose strategies for the players that make wrong guesses as densely concentrated as possible and correct guesses as disjoint as possible. The "bridge" strategy is very effective at this: in every outcome where anyone guesses wrongly, 3 people do so, whereas in every outcome where anyone guesses correctly, only 1 person does. Since, summed over all outcomes, correct guesses and wrong guesses occur equally often, this gives 3 winning hat combinations for every losing one, and since this strategy also always has someone guess, the winning percentage is exactly 3/(3+1) = 75%.
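The counting can be written out explicitly. Let W be the number of winning hat combinations and L the number of losing ones out of the 16. The only extra fact used is that, summed over all outcomes, correct guesses and wrong guesses are equally frequent (each guess is right in exactly one of the two outcomes that share its view), and that someone always guesses, so every outcome is either a win or a loss with wrong guesses:

```latex
\begin{aligned}
\#\text{wrong guesses} &= 3L, \qquad \#\text{correct guesses} = W,\\
3L &= W \quad\text{and}\quad W + L = 16 \;\Longrightarrow\; L = 4,\ W = 12,\\
\Pr[\text{win}] &= \frac{W}{W + L} = \frac{12}{16} = 75\%.
\end{aligned}
```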
A little bit of thought along these lines reveals that, since at most 4 players can guess wrongly in any one outcome while a winning outcome needs only 1 correct guess, an immediate upper bound on the winning percentage of any strategy is 4 wins per loss, i.e. 4/(4+1) = 80%.
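The same bookkeeping gives the bound. Here L counts outcomes with at least one wrong guess and W counts winning outcomes (outcomes where nobody guesses are losses counted in neither). Every winning outcome contains at least one correct guess, and every wrong guess lands in one of the L losing outcomes, at most 4 per outcome:

```latex
\begin{aligned}
W &\le \#\text{correct guesses} = \#\text{wrong guesses} \le 4L,\\
W + L &\le 16 \;\Longrightarrow\; W \le 4(16 - W) \;\Longrightarrow\; W \le \tfrac{64}{5} = 12.8,\\
\Pr[\text{win}] &= \frac{W}{16} \le \frac{12.8}{16} = 80\%.
\end{aligned}
```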
For this specific problem, there is also an interesting geometric visualization. Take all of the hat combinations to be vertices of a graph, with an edge between two vertices if they differ in exactly one hat. In the 3-person case, this graph is simply a cube. In the 4-person case, it's a 4-dimensional hypercube with 16 vertices, each of which has 4 neighbors.
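A small sketch of the graph, assuming hat combinations are encoded as integers whose bit i is player i's hat, so that two vertices are adjacent exactly when they differ in one bit. The counts it prints (8 vertices, 12 edges, degree 3 for the cube; 16 vertices, 32 edges, degree 4 for the hypercube) match the cube/hypercube description.

```python
def hypercube(n):
    """Hat combinations as n-bit integers (bit i = player i's hat, an assumed
    encoding); an edge joins two combinations differing in exactly one hat."""
    vertices = list(range(2 ** n))
    # List each edge once, from the endpoint whose bit i is 0.
    edges = [(v, v | (1 << i)) for v in vertices for i in range(n) if not v & (1 << i)]
    return vertices, edges

for n in (3, 4):
    vertices, edges = hypercube(n)
    degree = {v: 0 for v in vertices}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    print(f"{n} players: {len(vertices)} vertices, {len(edges)} edges, degrees {set(degree.values())}")
```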
Saying that someone will guess a particular color for their own hat when they see a certain combination of other hats is equivalent to choosing a single edge in the graph and marking one of its endpoints as "wrong" and the other as "correct". The choice of edge corresponds to the choice of person and the combination they see, while the choice of which endpoint is marked wrong or correct corresponds to the hat color we want that person to guess in that case. The goal is to maximize the number of vertices marked "correct" at least once and "wrong" zero times.
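The edge-marking view can be checked against a direct simulation. In this sketch (same hypothetical bit encoding; `random_strategy`, `wins_by_marking` and `wins_by_simulation` are illustrative names), a guess by player p who sees `seen` (the vertex with bit p cleared) marks the endpoint matching the guess as correct and the other endpoint as wrong. The number of vertices marked correct but never wrong is then compared with the number of outcomes actually won.

```python
import random

N = 4  # players; vertex v encodes player p's hat as bit p (assumed encoding)

def random_strategy(rng):
    """Hypothetical strategy: player p, seeing the other hats (a vertex with
    bit p cleared), guesses 0, 1, or passes (None)."""
    return {
        (p, seen): rng.choice([0, 1, None])
        for p in range(N)
        for seen in range(2 ** N) if not seen & (1 << p)
    }

def wins_by_marking(strategy):
    """Each guess picks one edge: one endpoint marked correct, the other wrong."""
    correct, wrong = set(), set()
    for (p, seen), guess in strategy.items():
        if guess is None:
            continue
        correct.add(seen | (guess << p))
        wrong.add(seen | ((1 - guess) << p))
    return len(correct - wrong)

def wins_by_simulation(strategy):
    """Play every hat outcome directly: win if someone guesses and all are right."""
    wins = 0
    for hats in range(2 ** N):
        guesses = [(p, strategy[(p, hats & ~(1 << p))]) for p in range(N)]
        results = [g == (hats >> p) & 1 for p, g in guesses if g is not None]
        if results and all(results):
            wins += 1
    return wins

rng = random.Random(1)
for _ in range(200):
    s = random_strategy(rng)
    assert wins_by_marking(s) == wins_by_simulation(s)
print("edge-marking count matches direct simulation")
```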
Visualizing it like this, it's now easy to see that 75% is optimal. In general, we consign some set of vertices to be marked "wrong", and then, by choosing every edge that leads from one of them to a vertex outside the set, we can cause all of their neighbors to be marked "correct". For any fixed choice of "wrong" vertices, this is the best we can do, since a vertex can only be marked "correct" via an edge whose other endpoint is marked "wrong". Now, if we choose 4 vertices to be marked "wrong" in such a way that every other vertex is adjacent to at least one of them, we get one of the 75% solutions, because we are correct on all but 4 of the 16 vertices. Choosing 4 or more "wrong" vertices leaves at most 12 vertices that can be correct, so we can only potentially do better by choosing fewer; but if we choose 3 or fewer, then since each vertex has only 4 neighbors, we can make at most 3 × 4 = 12 vertices "correct", which is again only 75%.
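Here is a sketch of the construction, using the 4-vertex set {0000, 0001, 1110, 1111} in the same assumed bit encoding; the assert confirms that every other vertex is adjacent to it. For each edge from a chosen "wrong" vertex to a vertex outside the set, the player who tells those two apart guesses the outside vertex's color, and edges between two chosen vertices are simply left unused. The helper names are hypothetical.

```python
N = 4          # players; vertex v encodes player p's hat as bit p (assumed)
FULL = 2 ** N  # number of hat combinations

def neighbors(v):
    """Vertices differing from v in exactly one hat."""
    return [v ^ (1 << p) for p in range(N)]

def strategy_from_wrong_set(wrong_set):
    """Hypothetical construction: on every edge from w in wrong_set to v
    outside it, player p (who distinguishes them) guesses v's color when
    seeing their shared view. Edges inside wrong_set are left unused."""
    strategy = {}
    for w in wrong_set:
        for p, v in enumerate(neighbors(w)):
            if v in wrong_set:
                continue
            seen = w & ~(1 << p)                # other players' hats, bit p cleared
            strategy[(p, seen)] = (v >> p) & 1  # the guess that makes v correct
    return strategy

def count_wins(strategy):
    """Vertices marked correct at least once and never marked wrong."""
    correct, wrong = set(), set()
    for (p, seen), guess in strategy.items():
        correct.add(seen | (guess << p))
        wrong.add(seen | ((1 - guess) << p))
    return len(correct - wrong)

WRONG = {0b0000, 0b0001, 0b1110, 0b1111}
# Every vertex is in WRONG or adjacent to it.
assert all(v in WRONG or any(u in WRONG for u in neighbors(v)) for v in range(FULL))
print(count_wins(strategy_from_wrong_set(WRONG)), "of", FULL)
```

It prints `12 of 16`. In this particular construction each chosen vertex has exactly 3 neighbors outside the set and each outside vertex has exactly one chosen neighbor, so every losing outcome has 3 wrong guesses and every winning outcome exactly 1 correct guess, the same 3:1 pattern described earlier.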
Therefore, 75% is optimal.
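The argument can also be confirmed by brute force. For any strategy, let S be the set of vertices marked "wrong"; every vertex marked "correct" is joined by its edge to a vertex of S, so all winning vertices lie among S's neighbors outside S, and that many wins is achievable. Checking all 2^16 possible sets S therefore bounds every strategy at once. This is a sketch under the same assumed bitmask encoding, with hypothetical names.

```python
N = 4          # players; vertex v encodes player p's hat as bit p (assumed)
FULL = 2 ** N  # 16 vertices

# NEIGHBOR_MASK[v] has bit u set exactly when u is adjacent to v.
NEIGHBOR_MASK = [sum(1 << (v ^ (1 << p)) for p in range(N)) for v in range(FULL)]

def max_correct(wrong_mask):
    """Most vertices that can win when the vertices in wrong_mask (a 16-bit
    set) are the ones marked wrong: its neighbors outside the set."""
    nbrs = 0
    for v in range(FULL):
        if (wrong_mask >> v) & 1:
            nbrs |= NEIGHBOR_MASK[v]
    return bin(nbrs & ~wrong_mask).count("1")

best = max(max_correct(mask) for mask in range(2 ** FULL))
print(best, "of", FULL)
```

It prints `12 of 16`, so no strategy wins more than 75% of the time.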