A Kinesis data stream is made up of one or more shards, and the number of provisioned shards determines the cost. Producers create records of data and stream them to Kinesis. A record is a data blob: it is serialized as bytes, can be up to 1 MiB in size, and can represent any kind of data. Each record also carries a partition key, which determines the shard it is routed to. Once a record has been ingested, Kinesis assigns it a unique identifier called a sequence number. There is no fixed upper limit on the number of shards in a stream, although AWS accounts have a default shard quota that can be raised on request. Within each shard, records are kept in the order in which they were ingested.
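To make the routing step concrete, here is a minimal sketch of how a key can be mapped to a shard. Kinesis hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The helper `even_shard_ranges` is hypothetical: it assumes the hash space is split evenly, whereas a real stream reports each shard's actual range.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def hash_key(partition_key: str) -> int:
    """Return the MD5 hash of a key as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split the hash space into equal, inclusive ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose hash range contains the key."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash ranges do not cover the key")


ranges = even_shard_ranges(4)
# The same key always lands on the same shard, which is what preserves
# per-key ordering.
print(shard_for_key("user-42", ranges))
```

Because the mapping is deterministic, a key with a disproportionate share of traffic will concentrate load on a single shard, so keys are usually chosen to spread evenly.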
Producers:
Each shard ingests up to 1 MiB/second and 1,000 records/second; writes that exceed these limits are rejected with a ProvisionedThroughputExceededException.
Consumers:
Each shard supports a maximum total read rate of 2 MiB/second.
Each shard supports up to 5 read API calls (GetRecords) per second.
Data retention:
24 hours by default, extendable to 7 days (and, with long-term retention, up to 365 days).
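The per-shard limits above can be turned into a rough sizing calculation. The following is a back-of-the-envelope sketch, not an official AWS formula: the function name `estimate_shards` and its parameters are illustrative, and it only considers the write and read throughput limits listed here.

```python
import math

# Per-shard limits taken from the list above.
WRITE_MIB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MIB_PER_SHARD = 2.0


def estimate_shards(write_mib_per_sec, records_per_sec, read_mib_per_sec):
    """Estimate the minimum shard count that satisfies all three limits."""
    needed = max(
        math.ceil(write_mib_per_sec / WRITE_MIB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mib_per_sec / READ_MIB_PER_SHARD),
    )
    return max(1, needed)  # a stream always has at least one shard


print(estimate_shards(3.5, 2000, 7.0))  # -> 4
```

In this example both the write volume (3.5 MiB/second) and the read volume (7 MiB/second) require four shards, while the record rate alone would need only two.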
There are several ways to send data to a stream. AWS provides SDKs for multiple languages, each exposing the Kinesis Data Streams API. AWS has also created utilities for sending data to streams, such as the Amazon Kinesis Agent and the Amazon Kinesis Producer Library (KPL).
The KPL allows for higher write throughput to Kinesis streams. It’s a configurable library that’s installed on a host that sends data to streams. It collects records and writes them to multiple shards per request. It also has a configurable retry mechanism. In order to improve throughput, it can aggregate records to increase payload size. To monitor the performance of a producer, it can send metrics to Amazon CloudWatch.
In Kinesis Data Streams, a record is a data structure containing a data blob. The API has two operations for adding records to a stream: PutRecords, for sending multiple records in a single HTTP request, and PutRecord, for single records.
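As an illustration of PutRecords, here is a minimal sketch in Python. It assumes `client` is a boto3 Kinesis client (created with `boto3.client("kinesis")`); the function name `put_events`, the `device_id` key field, and the backoff values are hypothetical. PutRecords can succeed for some records and fail for others in the same call, so the sketch resends only the entries whose result carries an `ErrorCode`.

```python
import json
import time

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def put_events(client, stream_name, events, key_field="device_id",
               max_attempts=3, sleep=time.sleep):
    """Send events with PutRecords, retrying only the entries that failed.

    Returns the entries that still failed after all attempts.
    """
    entries = [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
        for e in events
    ]
    failed = []
    for start in range(0, len(entries), MAX_RECORDS_PER_CALL):
        batch = entries[start:start + MAX_RECORDS_PER_CALL]
        for attempt in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=batch)
            if response["FailedRecordCount"] == 0:
                batch = []
                break
            # Results come back in request order; failed ones carry an ErrorCode.
            batch = [
                entry for entry, result in zip(batch, response["Records"])
                if "ErrorCode" in result
            ]
            if attempt < max_attempts - 1:
                sleep(0.1 * 2 ** attempt)  # simple exponential backoff
        failed.extend(batch)
    return failed


# Usage (assumes boto3 is installed and a stream named "my-stream" exists):
#   import boto3
#   leftover = put_events(boto3.client("kinesis"), "my-stream",
#                         [{"device_id": "a1", "temp": 21.5}])
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, and checking `FailedRecordCount` matters because a successful HTTP response does not mean every record was accepted.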
Because the APIs are exposed inside all AWS SDKs, available in many different programming languages, this method is the most flexible one. If one is unable to use the KPL or Kinesis Agent, for instance when sending data from a mobile application, using the API is the best choice. Using the API can also lower end-to-end latency.
A consumer is an application that reads and processes the data streamed to Kinesis. There are multiple ways of building a consumer, using Kinesis Analytics, Lambda, the Kinesis API, or the Kinesis Client Library (KCL).
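Of these options, the plain Kinesis API is the most hands-on. The following is a minimal sketch of reading a single shard with GetShardIterator and GetRecords, again assuming `client` is a boto3 Kinesis client. The function name `read_shard`, the `handle` callback, and the polling interval are illustrative; a production consumer would also need checkpointing and shard discovery, which this sketch leaves out.

```python
import time


def read_shard(client, stream_name, shard_id, handle,
               max_polls=10, sleep=time.sleep):
    """Poll one shard and pass each record to `handle`.

    Returns the number of records processed.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained record
    )["ShardIterator"]
    processed = 0
    for _ in range(max_polls):
        if iterator is None:  # the shard was closed, for example by resharding
            break
        response = client.get_records(ShardIterator=iterator, Limit=1000)
        for record in response["Records"]:
            handle(record["Data"], record["PartitionKey"], record["SequenceNumber"])
            processed += 1
        iterator = response.get("NextShardIterator")
        sleep(0.25)  # stay under the 5 calls/second per-shard read limit
    return processed


# Usage (assumes boto3 is installed and the stream and shard exist):
#   import boto3
#   read_shard(boto3.client("kinesis"), "my-stream", "shardId-000000000000",
#              lambda data, key, seq: print(key, seq, data))
```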
The KCL can be used to create consumer applications for Kinesis Streams. It’s a Java library; support for other languages is provided via a multi-language interface. The KCL takes care of load balancing across multiple instances, handling instance failures, and reacting to resharding, which frees the developer to focus on processing data from the stream.
Another way to process data streamed to Kinesis is AWS Lambda, which runs code without the need to provision or manage servers. Lambda functions can subscribe to, read, and process batches of records that are sent to Kinesis Streams. Lambda polls the stream once per second, and when it finds new records it invokes the Lambda function and passes the records as a parameter.
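A minimal sketch of such a function in Python is shown below. Each entry in the event's `Records` list carries the record under a `kinesis` key, with the data blob base64-encoded. The sketch assumes the producers send JSON payloads, which is an assumption about the data rather than something Kinesis requires, and the returned summary is purely illustrative.

```python
import base64
import json


def handler(event, context):
    """Entry point that Lambda invokes with a batch of Kinesis records."""
    processed = 0
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Kinesis data arrives base64-encoded inside the Lambda event.
        payload = base64.b64decode(kinesis["data"])
        message = json.loads(payload)  # assumption: producers send JSON
        print(kinesis["partitionKey"], kinesis["sequenceNumber"], message)
        processed += 1
    return {"processed": processed}
```

The handler can be tried locally by calling it with a hand-built event dictionary of the same shape and `None` as the context.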
The downside of using Lambda is its statelessness: you can't easily make use of previous records when processing new ones. With the KCL you can deploy the application to an EC2 instance, where you have more control over things such as data persistence and state management.
Kinesis offers little room for fine-tuning, because you don’t have access to the underlying OS. That is not the case with a self-managed system such as Kafka, so a well-tuned Kafka deployment may be able to achieve higher performance than Kinesis. However, turning a Kafka solution into a production-ready environment is not easy for beginners.
Kinesis is a fully managed service. It is scalable and can handle large amounts of streaming data from hundreds of thousands of sources with low latency. A significant benefit of using Kinesis is that AWS takes care of the management of the service, including provisioning, cluster management, and failover. This reduces infrastructure issues and operational costs, in both human and machine terms.