When I joined Postman’s data team a little over a year ago, our data was largely a mystery to me. Every day, I’d post questions on Slack like “Where can I find our MAU (monthly active users)?” Someone would tell me where to get it, but as I dug further, I’d find MAU data in other locations. And sometimes the different locations contradicted each other.
Over time, I learned how to navigate Postman’s wealth of data: which tables had different versions of the same data, or different filters, or sync issues. But this didn’t stop with me. As the data team scaled nearly fivefold in a single year, this issue came up again and again with each new team member.
At the time, Postman’s data system was fairly simple. We had a set of data tables, and information about those tables lived in the heads of our early data team members. This worked when the company and its data were small, but it couldn’t keep up as we started to grow exponentially.
Postman currently has hundreds of team members distributed across four continents, and more than 17 million users from 500,000 companies using our API platform.
From the start, Postman Co-founder and CTO Ankit Sobti wanted to make sure that data was democratized. He used to say that it’s difficult for a data team to sit and churn out insights day in and day out. Instead, he staunchly believes, everyone in the company should be able to access our data and gain insights from it. This became especially important in 2020, when Postman continued to scale while going fully remote during the COVID-19 pandemic.
To address this challenge, the data team and I decided to take on Postman’s data system as a project last year. Our goal was to make Postman’s data easier to access and understand, both for new hires within the data team and for people across the company.
Modernizing and democratizing a large-scale data system is a big challenge, and we’re definitely not the only company trying to crack it. So, in the hope that our experience can help others dealing with the same challenges, I want to share how we went about this project, what worked and what didn’t, and what we’ve learned so far.
Where we started: the challenges of Postman’s data stack
At Postman, we’ve implemented a modern data stack. Data engineers bring data into Redshift, Amazon’s cloud data warehouse. Then our analysts transform the data with dbt, our SQL-based transformation tool, and create dashboards and Explores on Looker.
Today, we have about 170 active users per week on Looker. That’s a lot for a company of around 400 people, but it still falls short of our goal of everyone being able to use our data.
One of the main issues we were facing was the lack of consistency in providing context around data, which made context the missing layer in our data stack. As Postman grew, it became difficult for everyone to understand and, more importantly, trust our data.
We had been creating dashboards and visualizations based on requests from across the company: whatever people needed, we designed. However, the metrics on these dashboards often overlapped, so we had inadvertently created different versions of the same metrics. When it wasn’t clear how each metric was different, we lost people’s trust. (And as the saying goes: building trust is hard, but losing it is easy. It just takes one mistake.)
The data team’s Slack channel was filling up with questions from other teams asking us things like “Where is this data?” and “What data should I use?”
Our experienced data analysts spent hours every week fielding these questions from new hires. Of course, this was frustrating for all involved. But we also realized there was a larger problem: it would be a disaster if any of our analysts left the company, since so much knowledge was stored in their heads.
In our data team’s sprint retrospectives, we realized that Postman’s data system needed help, so we embarked on a project to democratize our data and fix discoverability. Our goal was to create more time for our team and more trust within the company.
Solution #1: Documenting our data with Confluence
Instead of embarking on a massive overhaul of our data system, we decided to start smaller, implement a solution quickly, and see what we could learn from it. Postman was already using Atlassian products, so we started by creating a Confluence doc.
Before, all of our data questions and answers were stored in Slack. Slack can be hard to navigate and search, so people were asking the same questions over and over. It’s easy enough to answer one or two questions on Slack, but 20 or 100? It’s just not scalable.
Going forward, our goal was to make our new Confluence doc a single, searchable source of truth.
Whenever something came up multiple times on Slack, we put it on Confluence. For example, when someone asked, “How do you calculate MAU?” we added the table and calculations to the doc. When multiple people asked us for the same metrics, we also added those stats and charts.
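To illustrate why writing the calculation down matters, here is a minimal Python sketch of an MAU count. The event records, the `is_internal` flag, and the function name are hypothetical and are not Postman’s actual schema or metric definition. It shows how the same events can yield two different MAU figures depending on a single filter, which is the kind of detail a shared doc needs to spell out.

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity events: (user_id, activity_date, is_internal).
EVENTS = [
    ("u1", date(2021, 3, 2), False),
    ("u1", date(2021, 3, 9), False),   # same user, same month: counted once
    ("u2", date(2021, 3, 15), False),
    ("u3", date(2021, 3, 20), True),   # internal (employee) account
    ("u2", date(2021, 4, 1), False),
]


def monthly_active_users(events, exclude_internal=False):
    """Count distinct active users per (year, month)."""
    users_by_month = defaultdict(set)
    for user_id, day, is_internal in events:
        if exclude_internal and is_internal:
            continue
        users_by_month[(day.year, day.month)].add(user_id)
    return {month: len(users) for month, users in users_by_month.items()}


all_users = monthly_active_users(EVENTS)
external_only = monthly_active_users(EVENTS, exclude_internal=True)

# Same month, same events, two "MAU" numbers.
print(all_users[(2021, 3)], external_only[(2021, 3)])  # 3 2
```

Documenting which of the two definitions a dashboard uses is what lets people reconcile numbers that otherwise appear to contradict each other.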
Solution #2: Creating a data dictionary with Google Sheets
Our Confluence doc was a good start, but like Slack, a single doc just couldn’t scale as quickly as we were. Our next idea was to create a data dictionary in Google Sheets.
This seemed fairly simple. We first got all our table, schema, and column names in one place. Then, for a few sprints, we assigned everyone in our data team to document five tables each. Each person set aside a couple of hours to write down everything they knew about their data tables in the Google Sheet.
We also included reviews in this process. After each person documented their tables, someone else in the data team would read through their work. If it seemed clear, they’d say it was good to go.
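The first step, getting every schema, table, and column name in one place with room for documentation and review, can be sketched as follows. This is only an illustration: it assumes the column metadata has already been exported from the warehouse as (schema, table, column) rows, and the sample rows, field names, and helper functions are hypothetical rather than the sheet we actually used.

```python
import csv
import io

# Hypothetical metadata rows exported from the warehouse: (schema, table, column).
COLUMNS = [
    ("analytics", "users", "user_id"),
    ("analytics", "users", "signup_date"),
    ("analytics", "events", "event_id"),
]

# Sheet layout: identifying columns first, then fields people fill in by hand.
FIELDS = ["schema", "table", "column", "description", "documented_by", "reviewed_by"]


def build_dictionary_skeleton(columns):
    """Return one row per column, sorted, with empty documentation fields."""
    rows = []
    for schema, table, column in sorted(columns):
        rows.append({
            "schema": schema,
            "table": table,
            "column": column,
            "description": "",
            "documented_by": "",
            "reviewed_by": "",
        })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that can be imported into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


skeleton = build_dictionary_skeleton(COLUMNS)
print(to_csv(skeleton))
```

Keeping `documented_by` and `reviewed_by` as separate columns mirrors the write-then-review process described above and makes it easy to see which tables still need a second pair of eyes.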
It was a good idea, but we ran into challenges executing it:
- Low-quality documentation: At the time, our data team had nearly 20 people in it, but only three or four of them had been with Postman for more than a year. These veteran team members couldn’t document all of our data, so everyone chipped in. However, some of the people who were documenting our data didn’t actually know much about it. They hadn’t set up the data, and they weren’t the owners of the data tables. Our newer team members would add whatever they understood, but it didn’t always give a complete picture of the data table.
- The new data dictionary also had trouble with scale: We had nearly 20 data team members trying to work on the doc. With that many people writing, editing, and commenting at the same time, it quickly became too much to handle in a Google Sheet. And that was just the data team. We wanted to eventually open the data dictionary to the entire company, but we couldn’t figure out how to keep our documentation secure and tamper-proof with hundreds of users.
Solution #3: Implementing a pre-built data workspace with Atlan
After trying to build our own solution twice, we started to look for a pre-existing product that we could adopt. That’s when we found Atlan, a modern data workspace, which seemed like a clear solution for our data-discovery problems.
On Atlan, we’ve been able to catalog and document all of our data, and its catalog acts as a single source of truth for our data. The catalog includes multiple levels of permissions for different types of users inside and outside of the data team, so everyone can search for and access data without having to message the data team.
The result? Everyone is able to find the right data for their use case, and the data is consistent across the board for everyone accessing it. The clearest outcome is that everyone is finally talking about the same numbers, which helps us rebuild trust in our data. If someone says that our growth is 5%, it’s 5%.
Our new data workspace has been a success for us thanks to its clean interface and powerful functionality: documentation, ownership and usage information, and data discovery.
Moving beyond data discovery
At Postman, we needed to address issues around data discovery and context because they became major problems as we scaled. But as we’ve solved these problems, we realized that we’ve inadvertently set ourselves up to tackle bigger data challenges.
For example, now that we’ve set up a system to track our data, we can use it to understand our data lineage, including where each data asset comes from and how they’re all connected to each other.
Data lineage can be really useful for a couple of reasons. First, understanding how our data is connected helps us solve our daily bugs and issues faster.
Second, we’ve found that having a single source of truth for our data helps our data team as we grow. Every time we change something or add something new, it’s important to check how it will affect everything else in our data system. Instead of posting a question on Slack, we can check data lineage and find everything we need to change or update.
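The impact check described above amounts to walking a lineage graph downstream from the asset being changed. The following is a minimal sketch under stated assumptions: the asset names and the `DOWNSTREAM` mapping are hypothetical, and this is a plain breadth-first traversal for illustration, not how Atlan or any other catalog computes lineage internally.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
DOWNSTREAM = {
    "raw.events": ["staging.events"],
    "staging.events": ["marts.mau", "marts.sessions"],
    "marts.mau": ["dashboard.growth"],
    "marts.sessions": [],
    "dashboard.growth": [],
}


def affected_assets(lineage, changed):
    """Return every asset downstream of `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(lineage.get(changed, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already reached through another path
        seen.add(asset)
        order.append(asset)
        queue.extend(lineage.get(asset, []))
    return order


print(affected_assets(DOWNSTREAM, "staging.events"))
# ['marts.mau', 'marts.sessions', 'dashboard.growth']
```

The returned list is the set of tables and dashboards to review before shipping a change, which is the question that previously had to be asked on Slack.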
Lineage is just one avenue that we’ve seen open up as we’ve improved the way we document and catalog our data. We’re also taking steps toward maintaining more consistent data descriptions across tools, improving our data quality, and more.
In the long run, we see work on improving data discovery as a foundation for democratizing data across the company. Having a reliable data foundation, where people can find and understand all our data, opens the possibility of having everyone participate in analyzing data. This can enable our entire company to become more data-aware and data-driven, which is the goal for any leading company today.
What we learned from the last year
As I think back on our efforts to improve our data stack, there are a few learnings that stand out, both things that we managed to get right the first time around and things that I wish we had done differently:
- Start small and build to bigger solutions: When you’re taking on a big data challenge, it’s easy to jump to thinking about equally big solutions that take a lot of resources or money. However, starting with smaller, quicker solutions helped us understand more about what works and doesn’t work for us. Then, when we ventured into a more comprehensive, paid solution, we knew exactly what features we were looking for.
- Pay attention to scale: Our first two solutions were solid ideas, but they didn’t keep up with our team and data as we scaled. That’s why it’s important to think about how a data product will scale as your company and data scale. Will it keep up if your data grows by 100x in a year? Can it handle a lot of users, all with different needs and access levels?
- One improvement unlocks others: Fixing one part of your data stack lays the foundation for bigger improvements. Data cataloging and documentation aren’t particularly “fun,” but they’re essential for anything else you may want to do with your data. After all, you can’t take on ML, AI, and the other latest data buzzwords if you don’t understand your data.
This article was originally published on the Postman Blog. The author, Prudhvi Vasa, is the Analytics Leader at Postman.