
Content provided by Jeremy Daly and Rebecca Marshburn. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Jeremy Daly and Rebecca Marshburn or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://th.player.fm/legal

Episode #92: Streaming Data at Scale Using Serverless with Anahit Pogosova (PART 2)

42:10
 

About Anahit Pogosova

Anahit is an AWS Community Builder and a Lead Cloud Software Engineer at Solita, one of Finland’s largest digital transformation services companies. She has been working on full-stack and data solutions for more than a decade. Since getting into the world of serverless she has been generously sharing her expertise with the community through public speaking and blogging.

Watch this episode on YouTube: https://youtu.be/7pmJJcm0sAU

This episode is sponsored by New Relic and Stackery.

Transcript

Jeremy: So you mentioned poll-based versus stream and things like that. So when you connect Kinesis to Lambda, this is the other thing too, I think that confuses people sometimes. You're not actually connecting it to Lambda directly for pretty much all of these triggers in these integrations. There's another service that is in between there. So what's the difference between the Lambda service and the Lambda function itself?

Anahit: That's a great one because I think it's, again, one of those very confusing topics, which are not explained too well in the documentation. And the thing is that when you're just starting to dip your toes in the Lambda world, you just think that, "Okay, I write my code, and I upload it and deploy it, and everything just works. And this is my Lambda," right? But you don't really know how much of the extra magic is happening behind the scenes, and how many components are actually involved in making it a seamless service. And there are a lot of components that come into ... so you can think of a Lambda function as the function that we actually write and deploy and invoke. But then the Lambda service is what does all the triggering, invoking and batching and error handling.

And it really depends on the way Lambda works, or the way the Lambda service works. It really depends on the invocation model, whether you refer to poll-based or not poll-based. So again, one thing that is not too clearly explained, in my opinion, is that there are actually three different ways you can work with Lambda or communicate with Lambda. So you can invoke a Lambda synchronously. So the traditional request-response way, and the best example, I think, is API Gateway, which does that, so it requests something from Lambda and it waits for the response. Then there is the async way, which is one of the most common. So you just send something to Lambda and you don't care about what happens next.

Jeremy: Which uses an SQS queue behind the scenes to queue ...

Anahit: Exactly. Yes. That's also one of those fun facts that you learn along the way. But the point is that services like SNS, for example, or S3 notifications, they all use the async model, because they don't care about what happens with the notification. They just invoke Lambda and that's it. But then there is this third, gray area or a third totally different way of invoking the Lambda function, and it's called poll-based. And that's exactly how Kinesis operates with Lambda. And it's meant for streaming event sources, so it's both Kinesis Data Streams and DynamoDB Streams. Also, Kafka currently uses the poll-based model. And it also works with queue event sources like SQS.

Jeremy: Right. SQS, yeah.

Anahit: And Amazon MQ, I think, also uses the poll-based method. And the component that is most essential in the poll-based model is called the event source mapping. It's one of the misunderstood components, or one of the hidden heroes, I would say, that we find in Lambda, because it's an essential part of the Lambda service. And the event source mapping actually takes care of all the extra things that the Kinesis plus Lambda combination is capable of. So it's responsible for batching, it's responsible for keeping track of the position in the stream and in a shard, where it's ...

Jeremy: A shard iterator, because anybody wants to know the ...

Anahit: Yes, exactly, shard iterator.

Jeremy: ... technical term for it.

Anahit: Yes, thank you. And, yeah, the most important for me, it handles the errors and retries behind the scenes.

Jeremy: Right.

Anahit: And basically, if you don't have an event source mapping, you can't have batching. So it takes care of accumulating, or in the case of a standard consumer, it polls your Kinesis stream on your behalf, it accumulates batches of records, and then it invokes your Lambda function with those batches of records that it accumulated. Again, in the case of enhanced fan-out, of course, it doesn't poll, it gets the records from the Kinesis stream directly. But then from the perspective of your Lambda function it doesn't matter, it just gets triggered by the event source mapping, because as you've said yourself, it's not the Lambda that you connect to the Kinesis stream, it's the event source mapping that you connect to the stream, and then you point your Lambda to that event source mapping, so.
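
Editor's note: the accumulate-then-invoke behavior described above can be pictured with a small local simulation. This is only a toy sketch, not how the Lambda service is implemented: the names `batches` and `run_mapping` are hypothetical, and the real event source mapping also flushes batches on a time window and payload size, handles retries, and tracks its position per shard, none of which is modeled here.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group an incoming record sequence into lists of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left over as a final, smaller batch.
        yield batch


def run_mapping(
    records: Iterable[Record],
    handler: Callable[[Dict[str, Any], Any], Any],
    batch_size: int = 100,
) -> int:
    """Toy stand-in for an event source mapping: invoke handler once per batch.

    Returns the number of invocations made.
    """
    invocations = 0
    for batch in batches(records, batch_size):
        # The function only ever sees a batch, never the stream itself.
        handler({"Records": batch}, None)
        invocations += 1
    return invocations
```

With five records and a batch size of two, the handler in this sketch is invoked three times, with batches of two, two, and one record.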

Jeremy: Right. So you can connect a Lambda function or the Lambda service directly to the Kinesis stream itself, or you can use enhanced fan-out and push it to the Lambda function. Although, for all intents and purposes, it's pretty much the same thing.

Anahit: Yeah. And for your Lambda function, it doesn't really matter how that data ended, or how those records ended up there, you just get a batch of records, and then you deal with it. And I mean, all the rest is pretty much the same from the perspective of a Lambda function, because it's nicely abstracted behind the event source mapping, which hides all that magic that happens behind the scenes.
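
Editor's note: as an illustration of "you just get a batch of records, and then you deal with it", here is a minimal handler sketch. It assumes the Kinesis event shape documented for Lambda, where each record carries its payload base64-encoded under `kinesis.data`. It also assumes, purely for illustration, that producers wrote JSON. The `make_event` helper is hypothetical and exists only to build a local test event.

```python
import base64
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Decode every record in the batch handed over by the event source mapping."""
    payloads: List[Any] = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the producer wrote JSON documents.
        payloads.append(json.loads(raw))
    # Returned only so the sketch can be inspected when run locally.
    return {"processed": len(payloads), "payloads": payloads}


def make_event(payloads: List[Any]) -> Dict[str, Any]:
    """Hypothetical helper: wrap payloads in a Kinesis-style event for local runs."""
    return {
        "Records": [
            {
                "kinesis": {
                    "partitionKey": "example-key",
                    "data": base64.b64encode(json.dumps(p).encode()).decode(),
                }
            }
            for p in payloads
        ]
    }
```

The handler has no idea whether the batch came from a polling consumer or from enhanced fan-out; it sees the same event shape either way.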

Jeremy: Right. So you mentioned some aggregation stuff in there, and about, like, windows and time windows and things like that. So tumbling windows, that's something you can do in Kinesis, as well. Can you explain that?

Anahit: Yeah, it's a feature that actually came out very, very recently. At the end of re:Invent, I would even say, and I think it was like one day before I was going to publish the second part of my blog post, which was finally ready to submit, and then in the evening I get this and I was like, "Okay, I have to write a whole new chapter now." But it is a very interesting aspect, you can use it with both Kinesis and DynamoDB streams, actually, so it's available for both. And it's a totally different way of using streams, which wasn't there before. So with a Lambda function you know that you can't retain state between your function executions unless you are using some external data source or database.

And here, what you're allowed to do with this tumbling window is that you can persist the state of yo...
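
Editor's note: the transcript is cut off here, so the following is not a reconstruction of what was said. It is a hedged sketch of a tumbling-window handler based on the event shape in the AWS Lambda documentation as the editor understands it: the incoming event carries a `state` field and an `isFinalInvokeForWindow` flag, and the function returns the updated state under a `state` key. The numeric `value` field in the payload and the `kinesis_record` helper are hypothetical.

```python
import base64
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Keep a running total and count across invocations in one window."""
    # State is empty on the first invocation of a window.
    state = event.get("state") or {}
    total = state.get("total", 0)
    count = state.get("count", 0)

    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Assumption: each payload has a numeric "value" field.
        total += payload["value"]
        count += 1

    if event.get("isFinalInvokeForWindow"):
        # The window is closing; this is where an aggregate could be stored.
        print(f"window total={total} count={count}")

    # Returned state is passed into the next invocation in the same window.
    return {"state": {"total": total, "count": count}}


def kinesis_record(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build one Kinesis-style record for local runs."""
    data = base64.b64encode(json.dumps(payload).encode()).decode()
    return {"kinesis": {"data": data}}
```

Run locally, two consecutive calls can be chained by feeding the `state` returned by the first call into the event of the second, which mimics what the service does within a window.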
