Again, I think they are going about benchmarks the wrong way. I created a thread on that topic, but nobody replied.
That may be because I am completely wrong, or, surprisingly, it may just be common that people do not think it through and have no idea what they are doing. In my opinion, problems like that are very foreseeable.
SWE took the right approach in his last video (if I am not the one who is wrong, which I don’t know, since nobody seems to like discussing it), but it is hard to compare it with previous results.