Usually you discard extreme values to reduce noise, and indeed that is the reason they give:
> We then take the fifth percentile as our final measurement, because we assume that most noise sources are one-sided (for example, cache misses, pre-emptions and so on). During training we process the measurements across ten machines for computational efficiency.
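The fifth-percentile aggregation the quote describes can be sketched in a few lines. This is a minimal illustration, not their actual pipeline; the function name and the nearest-rank indexing are my own choices, and the noise model (a baseline cost plus one-sided exponential spikes) is a hypothetical stand-in for cache misses and pre-emptions:

```python
import random


def fifth_percentile(samples):
    """Nearest-rank 5th percentile of a list of timing samples.

    Sort the samples and index into the lower tail, discarding
    the slower 95% on the assumption that noise is one-sided
    (i.e. noise can only make a run look slower, never faster).
    """
    ordered = sorted(samples)
    idx = max(int(0.05 * len(ordered)) - 1, 0)
    return ordered[idx]


# Hypothetical timings in microseconds: a true cost of ~10us plus
# occasional one-sided spikes (cache misses, pre-emptions, ...).
random.seed(0)
timings = [10.0 + random.expovariate(1.0) for _ in range(1000)]
print(fifth_percentile(timings))  # close to the 10us "true" cost
```

Under a purely one-sided noise model like this, a low percentile sits much nearer the true cost than the mean does.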
> I thought we should always choose the fastest time when measuring performance.
Depends. For games you usually do something similar to what they did: exclude a small percentage of the worst results to reduce the influence of noise, then optimize the worst remaining case so the game runs consistently and smoothly.
One possibility that seems not so well known is that clocks with per-core state might not be perfectly synchronized. If your initial measurement is taken on core 0 and the thread then migrates to core 1, the end measurement could even appear to be 'before' the initial one.
Then there are manufacturing differences between cores that affect e.g. their leakage current and thus the (turbo) frequency at which they can run.
So measurement noise is indeed not one-sided, that is to say, measurements are not always overestimates. A mean trimmed on both sides is therefore a good idea, and so is pinning the measuring thread to a single core.
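Both mitigations can be sketched together. This is an illustrative sketch, not a definitive benchmark harness: the function names are my own, the trim fraction is an arbitrary choice, and the pinning call is Linux-specific (`os.sched_setaffinity` is not available on every platform, hence the guard):

```python
import os
import statistics
import time


def trimmed_mean(samples, trim=0.05):
    """Mean after dropping a fraction `trim` from BOTH tails,
    since noise is not strictly one-sided: per-core clock skew
    can make a measurement look too fast as well as too slow."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


def pin_to_core(core=0):
    """Pin the current process to one core so the start and end
    timestamps come from the same per-core clock state (Linux
    only; the core index here is an arbitrary choice)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})


pin_to_core(0)
samples = []
for _ in range(200):
    t0 = time.perf_counter_ns()
    sum(range(1000))  # stand-in for the workload under test
    samples.append(time.perf_counter_ns() - t0)
print(trimmed_mean(samples))
```

Trimming both tails discards the too-fast outliers that migration-induced clock skew can produce, which a pure minimum or fifth percentile would happily keep.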