This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
…
continue reading
Little Fluffy PolyClouds: The Data Engineering Playbook is your essential guide to building cloud-agnostic data infrastructure. We provide practical, step-by-step strategies for designing and deploying resilient data systems across all major platforms, including AWS, Azure, and GCP.
…
continue reading
Welcome to The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI— the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week, as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io ...
…
continue reading
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting. SEASON 1 DATA BROS Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading hig ...
…
continue reading
Discussions around Data Engineering
…
continue reading
Databases and data engineering episodes of Software Engineering Daily
…
continue reading
Unlocking the Power of Data: A Guide for Leaders and Executives" As a leader or executive, you know the importance of data in driving business decisions and staying ahead of the competition. But, with the increasing amount of data generated daily, it can be overwhelming to know where to start and how to utilize this valuable asset effectively. This blog, with multiple topics, addresses the technical terminology in data engineering and analytics on the cloud.
…
continue reading

1
Episode 1: Mastering the AWS Data Assembly Line
18:05
18:05
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
18:05This is the essential guide to Domain 1: Data Ingestion and Transformation—the biggest section (34%) of the AWS Certified Data Engineer - Associate (DEA-C01) exam! We break down the core components of a successful data pipeline. Learn to compare Batch vs. Streaming with services like Kinesis and DMS, master ETL/ELT using AWS Glue and EMR, and orche…
…
continue reading

1
Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo
29:36
29:36
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
29:36The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines. In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.…
…
continue reading

1
The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies
1:04:16
1:04:16
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
1:04:16Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and cr…
…
continue reading

1
Block Bad Data Before the Write with Nike’s Ashok Singamaneni
20:20
20:20
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
20:20โดย The Firebolt Data Bros
…
continue reading

1
Transforming Data Pipelines at XENA Intelligence with Naseem Shah
28:32
28:32
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
28:32The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities. In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed A…
…
continue reading

1
Context Engineering as a Discipline: Building Governed AI Analytics
51:58
51:58
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
51:58Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and a…
…
continue reading

1
Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith
24:03
24:03
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
24:03Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed. In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Soluti…
…
continue reading

1
The Data Model That Captures Your Business: Metric Trees Explained
1:01:05
1:01:05
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
1:01:05Summary In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has…
…
continue reading

1
Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya
19:04
19:04
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
19:04PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible. In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team m…
…
continue reading

1
From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
56:31
56:31
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
56:31Summary In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting…
…
continue reading

1
Building a Unified Data Platform at Pattern with William Graham
24:09
24:09
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
24:09The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines. In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open…
…
continue reading

1
How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty
25:34
25:34
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
25:34The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations. In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey …
…
continue reading

1
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
52:58
52:58
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
52:58Summary In this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Ma…
…
continue reading

1
Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal
21:38
21:38
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
21:38In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how they revolutionized their search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performanc…
…
continue reading

1
Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov
19:26
19:26
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
19:26The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics. In this episode, Evgenii Prusov, Senior Data Platform Engineer of Daiichi Sankyo Europe GmbH, joins us to discuss building a…
…
continue reading

1
Duck Lake: Simplifying the Lakehouse Ecosystem
1:10:41
1:10:41
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
1:10:41Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake, is focused on simplicity, flexibility, and offers a unified catalog and table format compared to other lakehouse formats like I…
…
continue reading

1
Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah
23:05
23:05
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
23:05StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base. In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the…
…
continue reading

1
Aligning Business and Data: The Essential Role of Data Modeling
1:06:51
1:06:51
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
1:06:51Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that dat…
…
continue reading

1
Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So
21:07
21:07
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
21:07Explore the future of AI-powered business intelligence with Lei Tang, CTO and Co-founder of Fabi.ai, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-gener…
…
continue reading

1
Building the Future of Airflow Execution at Astronomer with Ian Buss and Piotr Chomiak
22:25
22:25
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
22:25The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines. In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak…
…
continue reading

1
From Academia to Industry: Bridging Data Engineering Challenges
50:54
50:54
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
50:54Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage, as well as the challenges of data integration. He explores t…
…
continue reading

1
Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille
24:17
24:17
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
24:17Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable. In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access…
…
continue reading

1
High Performance And Low Overhead Graphs With KuzuDB
1:01:29
1:01:29
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
1:01:29Summary In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. He discusses the usability and scalability of KuzuDB, emphasizing its…
…
continue reading

1
How Moniepoint Group Uses Airflow for Exposure Monitoring with Adeolu Adegboye
21:32
21:32
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
21:32Managing financial data at scale requires precise orchestration and proactive monitoring to maintain operational efficiency. In this episode, we are joined by Adeolu Adegboye, Data Engineer at Moniepoint Group, who shares how his team uses data pipelines and workflow automation to manage high volumes of transactions, ensure timely alerts and suppor…
…
continue reading

1
Bridging Data and Decision-Making: AI's Role in Modern Analytics
1:10:44
1:10:44
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
1:10:44Summary In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-making. Lucas and Drew share their backgrounds in data analytics and how their experiences have shaped their approa…
…
continue reading

1
Inside Bosch’s Airflow 3 Revolution: Remote Execution with Jens Scheffler
28:02
28:02
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
28:02The evolution of Airflow has reached a milestone with the introduction of remote execution in Airflow 3, enabling flexible orchestration across distributed environments. In this episode, Jens Scheffler, Test Execution Cluster Technical Architect at Bosch, shares insights on how his team’s need for large-scale, cross-environment testing influenced t…
…
continue reading

1
From Bits to Tables: The Evolution of S3 Storage
50:08
50:08
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
50:08Summary In this episode of the Data Engineering Podcast Andy Warfield talks about the innovative functionalities of S3 Tables and Vectors and their integration into modern data stacks. Andy shares his journey through the tech industry and his role at Amazon, where he collaborates to enhance storage capabilities, discussing the evolution of S3 from …
…
continue reading

1
Inside Modern Data Infrastructure at Massdriver with Cory O’Daniel and Jake Ferriero
31:24
31:24
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
31:24Managing modern data platforms means navigating a web of complex infrastructure, competing team needs and evolving security standards. For data teams to truly thrive, infrastructure must become both accessible and compliant without sacrificing velocity or reliability. In this episode, we’re joined by Cory O’Daniel, CEO and Co-Founder at Massdriver,…
…
continue reading

1
Revolutionizing Python Notebooks with Marimo
51:56
51:56
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
51:56Summary In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as…
…
continue reading

1
Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber
25:31
25:31
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
25:31Journey inside Uber's innovative AI assistant "Genie" with Paarth Chotani, Staff Engineer at Uber, as he shares how they're revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants…
…
continue reading

1
Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics
55:07
55:07
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
55:07Summary In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving datasets and the importance of incremental data processing for optimized resource use and reduced latency. He expla…
…
continue reading

1
The Future of Airflow Telemetry with Bolke de Bruin
21:55
21:55
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
21:55Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust. In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right.…
…
continue reading

1
Streamlining Data Pipelines with MCP Servers and Vector Engines
52:04
52:04
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
52:04Summary In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured d…
…
continue reading

1
Transforming the Airflow UI for Cloudera’s Users with Shubham Raj
22:28
22:28
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
22:28Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale. Shubham Raj, Software Engineer II at Cloudera, shares how his team built a drag-and-drop DAG editor for non-…
…
continue reading

1
Streamlining Thousands of Data Pipelines at Lyft with Yunhao Qing
19:34
19:34
เล่นในภายหลัง
เล่นในภายหลัง
ลิสต์
ถูกใจ
ที่ถูกใจแล้ว
19:34Managing data pipelines at scale is not just a technical challenge. It is also an organizational one. At Lyft, success means empowering dozens of teams to build with autonomy while enforcing governance and best practices across thousands of workflows. In this episode, we speak with Yunhao Qing, Software Engineer at Lyft, about building a governed d…
…
continue reading