Creating custom Kafka producers and consumers is often a tedious process that requires manual coding. In this tutorial we'll see how to use StreamSets Data Collector to create data ingest pipelines that publish to Kafka via a Kafka Producer and read from Kafka via a Kafka Consumer, with no handwritten code.
The goal of this tutorial is to read Avro files from a file system directory and write them to a Kafka topic using the StreamSets Kafka Producer. We'll then use a second pipeline configured with a Kafka Consumer to drain that topic, perform a set of transformations, and send the data to two different destinations.
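For a sense of the boilerplate that a tool-based pipeline saves you from writing, here is a minimal hand-coded producer sketch against the Kafka 0.9 client API. The broker address, topic name, and payload are illustrative assumptions, not values from this tutorial.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ManualProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; adjust for your Kafka instance.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "transactions" is a hypothetical topic name used for illustration.
            producer.send(new ProducerRecord<>("transactions",
                    "{\"card_number\":\"0000-0000-0000-0000\"}"));
        }
    }
}
```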
- A working instance of StreamSets Data Collector
- A working Kafka 0.9 (or older) instance
- A copy of this tutorial's directory containing the sample data, pipeline 1, and pipeline 2
The tutorial's sample data directory contains a set of compressed Avro files holding simulated credit card transactions, each with the following structure:
```json
{
  "transaction_date": "dd/mm/YYYY",
  "card_number": "0000-0000-0000-0000",
  "card_expiry_date": "mm/YYYY",
  "card_security_code": "0000",
  "purchase_amount": "$00.00",
  "description": "transaction description of the purchase"
}
```

We will read the Avro files from our source directory, convert them into the 'SDC Record' format within the Data Collector, and finally write them back out in Avro format to S3.
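For comparison, this is a minimal sketch of the hand-rolled Kafka 0.9 poll loop that the Kafka Consumer origin saves you from writing; the broker address, consumer group id, and topic name are illustrative assumptions.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address and group id; adjust for your environment.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "tutorial-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // "transactions" is the same hypothetical topic name used above.
            consumer.subscribe(Collections.singletonList("transactions"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n",
                            record.offset(), record.value());
                }
            }
        }
    }
}
```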
- Part 1 - Publishing to a Kafka Producer
- Part 2 - Reading from a Kafka Consumer

