Sunday, June 14, 2026
Data Lineage: Understanding Data Lineage at Scale with Julien Le Dem

Big Data has exploded over the past decade as cloud computing and more efficient hardware made scaling essentially limitless. Products like Uber revolve entirely around analyzing data to provide rides. According to an EMC/IDC study, there was roughly 5.2TB of data for every person in 2020. That estimate was made before the transition to remote work, which likely makes the real figure much higher.

The term “data lineage” refers to the collection, origin, storage, transfer, and use of data over time. Given the size of the Big Data industry and related industries, maintaining a thorough data lineage, even within small companies, can be very difficult. It becomes especially challenging at scale. What modern tools make understanding all this information possible? Can data really continue growing at this rate?

In this episode we talk with Julien Le Dem, CTO and Co-Founder at Datakin. We discuss the challenges, available tools, and future for big data and data lineage.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com to get 15% off the first three months of audio editing and transcription services with code: SED. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors

Triplebyte is a network of 200,000+ Top Engineers. Triplebyte works with more than 400 tech companies including Coinbase, Zoox, Snap, Gusto, and Facebook. Triplebyte puts engineers in control of their job search, and helps engineers find the role that’s right for them. Triplebyte gives feedback on what companies do (or don’t do) with your application. This lets you learn from every application and improve over time. Go to Triplebyte.com/sedaily to sign up for free.

ClickUp is no-code project management software that brings all of your engineering work into one place, and they guarantee to save you one day every week by consolidating your tools. Engineers use ClickUp to collaborate on code, docs, sprints, bug tracking, roadmaps, and chat. So code smarter, not harder with ClickUp. Try ClickUp for Free today at ClickUp.com/sedaily and use code SED to get 30% off Unlimited and 15% off Enterprise plans.

Pachyderm is an easy-to-use MLOps platform that empowers anyone to build scalable end-to-end machine learning workflows, regardless of what language or framework they’re built on. Pachyderm provides Git-like data versioning and lineage to automatically track every data change and final output result. Head over to pachyderm.com/sedaily to get over $400 in free credits. But hurry because this offer only lasts for a limited time.

TeamCity Cloud is a new continuous integration service that’s completely hosted and managed by JetBrains. It’s based on the original on-premises version of TeamCity, and shares most of its functionality. Multiplatform development, integration with popular build and test frameworks, real-time feedback, test history, and test analysis are just a few of the many powerful features that can take your team to a new level of productivity. You can try TeamCity Cloud free of charge for 14 days. The trial period gives you 12,000 build credits (equivalent of 20 build hours on the Linux Small build agent), unlimited parallel builds, 120 GB of storage, and up to 3 self-hosted build agents. Get started with cloud CI/CD today!

Dynatrace’s software intelligence platform delivers automatic and intelligent observability to simplify cloud complexity and accelerate digital transformation. Dynatrace Cloud Automation helps improve collaboration between dev and production teams around a single source of truth. It accelerates delivery pipelines with the automatic orchestration of CI/CD and remediation workflows. And it also ensures risk-free releases by evaluating quality gates and service level objectives earlier in the lifecycle. See why Dynatrace is radically different and try it free for 15 days at Dynatrace.com/SE-Daily
