The problem with "accurate" is that there isn't really one score for anything; a URL doesn't have a single LCP, CLS, or INP score. It varies with so many things, some of them not directly in your control, like the latest flagship device on WiFi vs. a budget Android on spotty 3G.
So what you need is representative.
CrUX is a good starting point, especially if you have decent traffic and therefore good URL-level coverage.
But ideally you'd want to collect your own real user metrics. I'll add in things like and as services that are simple to set up. If you already have something like New Relic or Sentry, you may be able to gather it through them too.
With your own data you can get more fine-grained, e.g. spotting that LCP is poor, but only for users in certain countries.
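To make the per-country idea concrete, here's a rough sketch of slicing your own samples by country and taking the 75th percentile, which is the percentile Core Web Vitals are assessed at. The `RumSample` shape, the field names, and the sample values are all made up for illustration; your RUM tool's export will look different.

```typescript
// Hypothetical shape of one real-user sample; field names are assumptions.
interface RumSample {
  country: string;
  lcpMs: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Published LCP thresholds: good at or under 2.5 s, poor above 4 s.
function rateLcp(ms: number): "good" | "needs improvement" | "poor" {
  if (ms <= 2500) return "good";
  if (ms <= 4000) return "needs improvement";
  return "poor";
}

// Group samples by country, then compute p75 LCP for each group.
function p75LcpByCountry(samples: RumSample[]): Map<string, number> {
  const byCountry = new Map<string, number[]>();
  for (const s of samples) {
    const bucket = byCountry.get(s.country) ?? [];
    bucket.push(s.lcpMs);
    byCountry.set(s.country, bucket);
  }
  const result = new Map<string, number>();
  byCountry.forEach((values, country) => {
    result.set(country, percentile(values, 75));
  });
  return result;
}

// Illustrative data only, not real measurements.
const samples: RumSample[] = [
  { country: "DE", lcpMs: 1200 },
  { country: "DE", lcpMs: 1500 },
  { country: "DE", lcpMs: 1800 },
  { country: "DE", lcpMs: 2100 },
  { country: "IN", lcpMs: 2500 },
  { country: "IN", lcpMs: 3900 },
  { country: "IN", lcpMs: 4200 },
  { country: "IN", lcpMs: 5200 },
];

const p75 = p75LcpByCountry(samples);
p75.forEach((ms, country) => {
  console.log(`${country}: p75 LCP ${ms} ms (${rateLcp(ms)})`);
});
```

A site-wide p75 over those same samples would hide the fact that one country is fine and the other isn't, which is the whole reason to segment.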
If you have dev resources, point them to , so you can get that sweet attribution data @Shawn Huber mentioned (it's not in the CrUX data, unfortunately).
Lab data, like GTmetrix, is still super useful because it's a much more controlled environment, but it's best used comparatively, i.e. did the change you pushed increase or decrease that metric? Keep in mind you then need to look at the real user metrics to validate whether it actually helped in the real world.
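For the comparative use of lab data, something like this is usually enough: run the same test several times before and after the change and compare medians rather than single runs. The function names and the numbers below are hypothetical, just a sketch of the idea.

```typescript
// Median of a list of lab measurements (e.g. LCP in ms from repeated runs).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("no runs");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface LabComparison {
  beforeMedian: number;
  afterMedian: number;
  deltaMs: number;  // negative means the metric went down (faster)
  deltaPct: number; // change relative to the "before" median
}

// Compare repeated lab runs taken before and after a change.
function compareRuns(before: number[], after: number[]): LabComparison {
  const beforeMedian = median(before);
  const afterMedian = median(after);
  const deltaMs = afterMedian - beforeMedian;
  return {
    beforeMedian,
    afterMedian,
    deltaMs,
    deltaPct: (deltaMs / beforeMedian) * 100,
  };
}

// Illustrative numbers only: five lab runs before and after a deploy.
const beforeRuns = [3100, 2900, 3300, 3000, 3200];
const afterRuns = [2600, 2700, 2500, 2800, 2650];

const comparison = compareRuns(beforeRuns, afterRuns);
console.log(
  `LCP median ${comparison.beforeMedian} -> ${comparison.afterMedian} ms ` +
    `(${comparison.deltaPct.toFixed(1)}%)`
);
```

A drop in the lab median only tells you the change helped under those lab conditions; the real user p75 over the following weeks is what confirms it.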