Kafka Connect and Schemas
Here’s a fun one that Kafka Connect can sometimes throw out:
java.lang.ClassCastException:
java.util.HashMap cannot be cast to org.apache.kafka.connect.data.Struct
HashMap? Struct? HUH?
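The usual culprit is a mismatch between the data and the converter: the connector (or a Single Message Transform) expects a Struct with a schema, but the JSON converter has deserialised a schemaless payload into a plain HashMap. A hedged sketch of one common fix, assuming your JSON messages actually carry Connect's schema/payload envelope:

# A sketch, not the only fix: if the JSON messages include the schema/payload
# envelope, tell the converter so, and Connect will build a Struct rather than
# a plain HashMap.
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
# If the data genuinely has no schema, the converse applies: either use a
# connector/SMT that doesn't require a Struct, or re-serialise the data with
# a schema (e.g. with Avro and the Schema Registry).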
This article is just for Googlers and my future self encountering this error. Recently I was building a Docker image from the ksqlDB code base, and whilst it built successfully, the ksqlDB server process in the Docker container failed when instantiated with an UnsupportedClassVersionError:
<x> has been compiled by a more recent version of the Java Runtime
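The quickest way I know to confirm the mismatch is to compare the class file version against the JVM in the container. This is a general-purpose check rather than anything ksqlDB-specific, and the jar and class names below are placeholders:

# Which Java is the container actually running?
java -version

# Which Java was the class compiled for? javap prints the class file's major
# version: 52 = Java 8, 55 = Java 11 (jar/class names are placeholders).
javap -verbose -cp my-app.jar com.example.Main | grep 'major version'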
A short & sweet blog post to help people Googling for this error, and me next time I encounter it.
The scenario: trying to create a connector in Kafka Connect (running in distributed mode, with one worker) failed with the following curl response:
HTTP/1.1 500 Internal Server Error
Date: Fri, 29 Nov 2019 14:33:53 GMT
Content-Type: application/json
Content-Length: 48
Server: Jetty(9.4.18.v20190429)
{"error_code":500,"message":"Request timed out"}
I was doing some troubleshooting between two services recently and wanted to poke around to see what was happening in the REST calls between them. Normally I’d reach for tcpdump to do this, but imagine my horror when I saw:
root@ksqldb-server:/# tcpdump
bash: tcpdump: command not found
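Assuming the container is based on a Debian/Ubuntu image (which it is in my case, but check yours), it's straightforward to install on the fly:

# Assumes a Debian/Ubuntu-based container image:
apt-get update && apt-get install -y tcpdump

# Then capture the REST traffic in ASCII, e.g. on port 8088 (ksqlDB's default):
tcpdump -A -i any port 8088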
I talk and write about Kafka and Confluent Platform a lot, and more and more of the demos that I’m building are around Confluent Cloud. This means that I don’t have to run or manage my own Kafka brokers, Zookeeper, Schema Registry, KSQL servers, etc., which makes things a ton easier. Whilst there are managed connectors on Confluent Cloud (S3 etc.), I need to run my own Kafka Connect worker for those connectors not yet provided. An example is the MQTT source connector that I use in this demo. Up until now I’d either run this worker locally, or manually build a cloud VM. Locally is fine, as it’s all Docker, easily spun up with a single docker-compose up -d command. I wanted something that would keep running whilst my laptop was off, but that was as close to my local build as possible: enter GCP and its functionality to run a container on a VM automagically.
You can see the full script here. The rest of this article just walks through the how and why.
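The heart of it is GCP's ability to create a VM that boots straight into a given container image. A hedged sketch of the kind of command involved (instance name, zone, image tag, and broker address are all placeholders, and a real worker needs rather more environment variables than this):

gcloud compute instances create-with-container kafka-connect-worker \
    --zone=us-east1-b \
    --container-image=confluentinc/cp-kafka-connect:5.3.1 \
    --container-env=CONNECT_BOOTSTRAP_SERVERS=my-cluster.confluent.cloud:9092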
I started hitting problems when trying Debezium against MySQL v8, beginning when creating the connector.
This is based on using Confluent Cloud to provide your managed Kafka and Schema Registry. All that you run yourself is the Kafka Connect worker.
Optionally, you can use this Docker Compose to run the worker and a sample MySQL database.
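For context, here's a minimal sketch of the kind of Debezium MySQL source connector being created; the hostnames, credentials, and names are all placeholders rather than the exact config from my setup:

curl -s -X POST http://localhost:8083/connectors \
     -H "Content-Type: application/json" \
     -d '{
           "name": "source-debezium-mysql",
           "config": {
             "connector.class": "io.debezium.connector.mysql.MySqlConnector",
             "database.hostname": "mysql",
             "database.port": "3306",
             "database.user": "debezium",
             "database.password": "dbz",
             "database.server.id": "42",
             "database.server.name": "asgard",
             "database.history.kafka.bootstrap.servers": "kafka:9092",
             "database.history.kafka.topic": "dbhistory.asgard"
           }
         }'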
The Kafka Connect framework provides generic error handling and dead-letter queue capabilities which are available for problems with [de]serialisation and Single Message Transforms. When it comes to errors that a connector may encounter doing the actual pull or put of data from the source/target system, it’s down to the connector itself to implement logic around that. For example, the Elasticsearch sink connector provides configuration (behavior.on.malformed.documents) that can be set so that a single bad record won’t halt the pipeline. Others, such as the JDBC sink connector, don’t provide this yet. That means that if you hit this problem, you need to unblock it yourself. One way is to manually move the offset of the consumer past the bad message.
TL;DR: You can use kafka-consumer-groups --reset-offsets --to-offset <x> to manually move the connector past a bad message.
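Spelled out in full, that looks something like the following. It's a sketch, assuming the sink's consumer group follows the default connect-<connector name> naming, and that offset 42 on partition 0 of my_topic is the first message past the bad one. Note that the reset only works while the consumer group is inactive, so stop the connector first:

kafka-consumer-groups \
    --bootstrap-server localhost:9092 \
    --group connect-my-sink \
    --topic my_topic:0 \
    --reset-offsets --to-offset 42 \
    --execute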
I use the Elastic stack for a lot of my talks and demos because it complements Kafka brilliantly. A few things have changed in recent releases and this blog is a quick note on some of the errors that you might hit and how to resolve them. It was inspired by a lot of the comments and discussion here and here.
I’ve written before about kafkacat and what a great tool it is for doing lots of useful things as a developer with Kafka. I used it too in a recent demo that I built, in which data needed manipulating in a way that I couldn’t easily do elsewhere. Today I want to share a very simple but powerful use for kafkacat as both a consumer and producer: copying data from one Kafka cluster to another. In this instance it’s getting data from Confluent Cloud down to a local cluster.
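In its simplest form, that's one kafkacat consuming piped into another producing. A minimal sketch, with broker addresses and topic name as placeholders (a Confluent Cloud source additionally needs its SASL/SSL settings passed via -X properties):

# Consume everything from the source topic (-C consume, -o beginning start at
# the beginning, -e exit once the end of the topic is reached) and pipe it
# into the target cluster (-P produce).
kafkacat -b source-broker:9092 -t my_topic -C -o beginning -e | \
  kafkacat -b target-broker:9092 -t my_topic -P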