It's been another big month for us here at Grai, with a couple of great changes we are excited to tell you about.
So without further ado, let's get this rolling.
Automated BigQuery Lineage Inference
Data warehouses, unlike traditional relational databases, usually don't enforce database constraints like foreign keys, which can make it difficult to automatically extract data lineage from the warehouse. To build out a complete column-level data lineage graph, those relationships have to be sourced from other places like dbt, Fivetran, Airflow, or manual curation.
However, BigQuery provides a set of APIs that expose logs of the various queries being performed in the warehouse. Grai can now take advantage of those logs to infer lineage from BigQuery data warehouses. To enable log parsing in your connections, just check the "Log Parsing" box in the connection settings (and optionally provide a log parsing window, for example 20 days).
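To give a feel for the general idea of inferring lineage from query logs, here is a minimal sketch in Python. It is not Grai's implementation: the log entries, the table names, and the `infer_table_lineage` helper are all hypothetical, and it uses simple regular expressions where a real system would use a proper SQL parser. It also only recovers table-level edges, not column-level ones.

```python
import re
from collections import defaultdict

# Hypothetical query log entries. In practice these would be pulled from
# BigQuery's query logs; the table names here are made up for illustration.
QUERY_LOG = [
    "CREATE OR REPLACE TABLE analytics.orders_clean AS SELECT * FROM raw.orders",
    "INSERT INTO analytics.daily_revenue "
    "SELECT o.day, SUM(p.amount) FROM analytics.orders_clean o "
    "JOIN raw.payments p ON o.id = p.order_id GROUP BY o.day",
    "SELECT COUNT(*) FROM analytics.daily_revenue",
]

# Statements that write to a table: INSERT INTO <t> or CREATE [OR REPLACE] TABLE <t>.
TARGET_RE = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE)\s+([\w.]+)",
    re.IGNORECASE,
)
# Tables that are read: anything following FROM or JOIN.
# Assumption: simple queries only (no subqueries, CTEs, or EXTRACT(... FROM ...)).
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def infer_table_lineage(queries):
    """Return {destination_table: [source_tables]} inferred from SQL text."""
    lineage = defaultdict(set)
    for sql in queries:
        target = TARGET_RE.search(sql)
        if target is None:
            continue  # read-only query, so it creates no lineage edge
        destination = target.group(1)
        for source in SOURCE_RE.findall(sql):
            if source != destination:
                lineage[destination].add(source)
    return {table: sorted(sources) for table, sources in lineage.items()}


if __name__ == "__main__":
    for destination, sources in infer_table_lineage(QUERY_LOG).items():
        for source in sources:
            print(f"{source} -> {destination}")
```

Each write statement in the log contributes edges from the tables it reads to the table it writes, which is how relationships can be recovered even though the warehouse itself declares no foreign keys.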
We'll have a blog post coming out in the next few weeks describing the technical aspects behind this new feature and how you can take advantage of these same capabilities in your company, so stay on the lookout for further updates.
🗺️ Getting the Big Picture
Unlike Grai, most data lineage tools provide a somewhat stripped-down view of your data lineage. Rather than a single view which can be zoomed in and out covering the entire space of your lineage, you're often restricted to the lineage of a single table or column at a time. This poses a real challenge if you don't already know what you're looking for. Doubly so for junior engineers and analysts who need help to find and understand the way data is used in the broader stack.
Alright, the obvious question would be: if this is so valuable, why is it so uncommon? The short answer: performance. Fetching and rendering large and complicated graphs with tens or hundreds of thousands of nodes is hard.
If you've been using Grai with a large graph, you may have even experienced some of the slowdown that comes with viewing larger lineage graphs. If so, we've got exciting news: this month saw a massive overhaul of the graph view, which should keep you scrolling and zooming no matter how large your organization's data stack is.
If you're using the cloud, everything has already been updated. If not, you should update your deployed Docker images to :latest to take advantage of the improvements.
In Other News
- 🎉 Grai is an official dbt partner company
- Check out our guide to using Fivetran's internal metadata APIs
If you haven't already, we'd love your feedback on Product Hunt, so take a look!
Just For Fun 😊
About Grai.io - We're an open-source data observability tool designed to help catch issues in testing, not in production. You can find us on Twitter @Grai_io or shoot the founders an email at firstname.lastname@example.org.