It's about having a consistent measurement baseline. Say you run your benchmark once, then thermal throttling kicks in, and when you run it again it takes twice as long. Is your code actually slower now? Should you wait until the fan turns off before running it again? That data is noisy and useless. Take your measurements on a server or desktop with sane thermals and a full-size fan.
If you speed things up by 10% on your server, they'll get 10% faster on your laptop as well.
Yes, you have to be very careful with measurements, I agree.
> If you speed things up by 10% on your server, they'll get 10% faster on your laptop as well.
Depends on the speedup and the techniques used to achieve it. For example, speeding things up via more parallelism can lead to wall-clock improvements on servers but not on laptops, precisely because the latter just end up doing more thermal throttling.
Ideally, you want to measure on both ideal hardware and actual user hardware; often a speedup on one will not be visible on the other.
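For what it's worth, here is a rough sketch of how you could tell a noisy run from a real change, and run the same harness on both the server and the laptop. It's only an illustration: `measure`, `workload`, and the 20% spread threshold are made-up names and numbers, not anything from a particular benchmarking tool.

```python
import statistics
import time


def measure(fn, repeats=10, warmup=2):
    """Time fn() several times; return (median_seconds, relative_spread)."""
    # Warm-up runs are executed but not recorded.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    median = statistics.median(samples)
    # Relative spread: (slowest - fastest) / median. A large value suggests
    # something other than the code changed between runs, e.g. throttling.
    spread = (max(samples) - min(samples)) / median if median > 0 else float("inf")
    return median, spread


def workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    median, spread = measure(workload)
    print(f"median {median * 1e3:.2f} ms, spread {spread:.0%}")
    if spread > 0.20:  # arbitrary threshold, chosen only for illustration
        print("noisy run: don't trust a before/after comparison from this machine")
```

If the spread is small on the server but large on the laptop, that's a hint the laptop numbers are dominated by thermals rather than by your change.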