I’m going to share what I’m learning as I build out a GA4 data pipeline. Unfortunately I can’t share the actual code, so the snippets below are rough, hypothetical sketches rather than the real pipeline.
Working with the endpoint
• Once I figured out JWT authentication, working with the API hasn’t been too difficult.
◦ JWT authentication lets a service account impersonate any user inside your domain via something called domain-wide delegation. JWT auth is a great fit for orgs / agencies that have a bunch of overlapping users with access to different properties inside GA4, GBP, or GSC. A rough sketch of the flow is below.
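Here’s a minimal sketch of that flow in Python, assuming the google-auth and google-analytics-data libraries; the key file path, scope, and impersonated email are placeholders:

```python
from google.oauth2 import service_account
from google.analytics.data_v1beta import BetaAnalyticsDataClient

# Placeholder key path and user email -- swap in your own.
KEY_PATH = "service-account.json"
IMPERSONATED_USER = "analyst@example.com"  # any user in the Workspace domain

# Load the service account key, then impersonate a domain user through
# domain-wide delegation by setting `subject`.
credentials = service_account.Credentials.from_service_account_file(
    KEY_PATH,
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
    subject=IMPERSONATED_USER,
)

# The client now acts as that user, so it sees whatever GA4 properties
# that user can see.
client = BetaAnalyticsDataClient(credentials=credentials)
```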
• The API is very fast. Getting YTD event data for a site with 8.3k visits / month took 1 API call and responded in under 3 seconds.
• That returned 3,600 rows of event data corresponding to this client’s key events.
◦ Those rows take up about 600 KB in BigQuery (load sketch below).
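Here’s a rough sketch of how report rows could land in BigQuery, assuming the google-cloud-bigquery client; the table ID and row shape are illustrative, not the pipeline’s real schema:

```python
from google.cloud import bigquery

# Hypothetical dataset / table name.
TABLE_ID = "my-project.analytics.ga4_key_events"

def load_report_rows(response) -> None:
    """Flatten a RunReport response into JSON rows and stream them to BigQuery."""
    dim_names = [d.name for d in response.dimension_headers]
    met_names = [m.name for m in response.metric_headers]

    rows = []
    for row in response.rows:
        record = {n: dv.value for n, dv in zip(dim_names, row.dimension_values)}
        record.update({n: mv.value for n, mv in zip(met_names, row.metric_values)})
        rows.append(record)

    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```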
• I was pretty concerned about GA4 API token usage, and it feels like I won’t need to be. That response used up *only 5 tokens* out of the 200k-token daily quota.
◦ I came up with a neat solution to the token problem: heavily filter which eventNames the response returns so they match only the client’s key events (sketch below).
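The filter is just a dimension filter on eventName in the report request. A minimal sketch, assuming the same google-analytics-data client as above; the property ID, dates, and event names are placeholders:

```python
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Filter,
    FilterExpression,
    Metric,
    RunReportRequest,
)

# Placeholders -- in practice these come from the client's config.
PROPERTY_ID = "123456789"
KEY_EVENTS = ["phone_call", "form_submit", "chat_started"]

request = RunReportRequest(
    property=f"properties/{PROPERTY_ID}",
    date_ranges=[DateRange(start_date="2025-01-01", end_date="today")],
    dimensions=[Dimension(name="eventName"), Dimension(name="date")],
    metrics=[Metric(name="eventCount")],
    # Only return rows whose eventName is one of our key events,
    # which keeps the response (and the token cost) small.
    dimension_filter=FilterExpression(
        filter=Filter(
            field_name="eventName",
            in_list_filter=Filter.InListFilter(values=KEY_EVENTS),
        )
    ),
)

response = client.run_report(request)
```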
• I was also pretty concerned about how to query the data given the limit of 9 dimensions per request.
◦ I came up with kind of a cool solution, which is to dynamically build the API call with just the dimensions needed to properly bucket a user and to track the different lead types (calls, form fills, chats). See the sketch after the dimension list below.
• This means each client has a config stored in Google Firestore with a list of event names to filter by and custom dimensions that align with calls, form fills, and chats (fetch sketch below).
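A rough sketch of fetching that per-client config, assuming the google-cloud-firestore client; the collection name and document shape are made up for illustration:

```python
from google.cloud import firestore

def get_client_config(client_id: str) -> dict:
    """Fetch a client's GA4 pipeline config from Firestore."""
    db = firestore.Client()
    # Hypothetical collection / document layout.
    doc = db.collection("ga4_client_configs").document(client_id).get()
    if not doc.exists:
        raise KeyError(f"No config for client {client_id}")
    return doc.to_dict()

# An illustrative document might look like:
# {
#   "property_id": "123456789",
#   "key_events": ["phone_call", "form_submit", "chat_started"],
#   "custom_dimensions": {
#       "calls": "customEvent:call_source",
#       "form_fills": "customEvent:form_id",
#       "chats": "customEvent:chat_provider",
#   },
# }
```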
• I’m gathering a list of default dimensions:
◦ firstUserSourceMedium
◦ eventName
◦ date
◦ firstUserCampaignName (this is sometimes helpful for identifying GBP traffic)
◦ pageLocation (so we can join GA + GSC data on landing page)
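Putting the config and the defaults together, here’s a sketch of dynamically building the dimension list per client while staying under the 9-dimension cap. The config shape matches the hypothetical get_client_config helper above:

```python
from google.analytics.data_v1beta.types import Dimension

# Defaults shared by every client's report (the list above).
DEFAULT_DIMENSIONS = [
    "firstUserSourceMedium",
    "eventName",
    "date",
    "firstUserCampaignName",
    "pageLocation",
]

MAX_DIMENSIONS = 9  # GA4 Data API limit per request

def build_dimensions(config: dict) -> list[Dimension]:
    """Merge default dimensions with a client's custom ones, capped at 9."""
    names = list(DEFAULT_DIMENSIONS)
    for api_name in config.get("custom_dimensions", {}).values():
        if api_name not in names:
            names.append(api_name)
    if len(names) > MAX_DIMENSIONS:
        raise ValueError(
            f"{len(names)} dimensions requested; GA4 allows {MAX_DIMENSIONS}"
        )
    return [Dimension(name=n) for n in names]
```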
• I built the pipeline thus far using Cursor / Claude Sonnet.
◦ The normal list of annoyances while building something with AI.