Data Engineering: Resources
As I’ve been reading and exploring the current world of data engineering I’ve been adding links to my Raindrop.io collection, so check that out. In addition, below are some specific resources that I’d recommend.
In this article I look at where we store our analytical data, how we organise it, and how we enable access to it. I’m considering here potentially large volumes of data for access throughout an organisation. I’m not looking at data stores that are used for specific purposes (caches, low-latency analytics, graph etc).
The article is part of a series in which I explore the world of data engineering in 2022 and how it has changed from when I started my career in data warehousing 20+ years ago. Read the introduction for more context and background.
For the past 5.5 years I’ve been head-down in the exciting area of stream processing and events, and I realised recently that the world of data and analytics that I worked in up to 2017, which was changing significantly back then (Big Data, y’all!), has evolved and, dare I say it, matured somewhat - and I’ve not necessarily kept up with it. In this series of posts you can follow along as I start to reacquaint myself with where it’s got to these days.
Airtable is a rather wonderful tool. It powers the program creation backend process for Kafka Summit and Current. It does, however, have a few frustrating limitations - often where it feels like a feature was built on a Friday afternoon and they didn’t get the chance to finish it before knocking off to head to the pub.
If you’ve ever been to a conference, particularly as a speaker who’s submitted a paper that may or may not have been accepted, you might wonder quite how conferences choose the talks that get accepted.
I had the privilege of chairing the program committee for Current and Kafka Summit this year and curating the final program for both. Here’s a glimpse behind the curtains of how we built the program for Current 2022. It was originally posted as a thread on Twitter.
Lightning talks are generally 5-10 minutes. As the name implies - they are quick!
A good lightning talk is not just your breakout talk condensed into a shorter time frame. You can’t simply deliver the same material faster, or the same material at a higher level, or the same material with a few bits left out.
Building the program for any conference is not an easy task. There will always be a speaker disappointed that their talk didn’t get in—or perhaps an audience who are disappointed that a particular talk did get in. As the chair of the program committee for Current 2022, one of the things that I’ve found really useful in building out the program this time round is the comments that the program committee left against submissions as they reviewed them.
There were some common patterns I saw, and I thought it would be useful to share these here. Perhaps you’re an aspiring conference speaker looking to understand what mistakes to avoid. Maybe you’re an existing speaker whose abstracts don’t get accepted as often as you’d like. Or perhaps you’re just curious as to what goes on behind the curtains :)
I’m convinced that a developer advocate can be effective remotely. As a profession, we’ve all spent two years figuring out how to do just that. Some of it worked out great. Some of it, less so.
I made the decision during COVID to stop travelling as part of my role as a developer advocate. In this article, I talk about my experience with different areas of advocacy done remotely.
I recently started writing an abstract for a conference later this year and realised that I’m not even sure if I want to do it. Not the conference—it’s a great one—but just the whole up on stage doing a talk thing. I can’t work out if this is just nerves from the amount of time off the stage, or something more fundamental to deal with.
This blog is written in Asciidoc, built using Hugo, and hosted on GitHub Pages. I recently wanted to share the draft of a post I was writing with someone and ended up exporting a local preview to a PDF - not a great workflow! This blog post shows you how to create an automagic hosted preview of any draft content on Hugo using GitHub Actions.
This is useful for previewing and sharing one’s own content, but also for making good use of GitHub as a collaborative platform - if someone reviews and amends your PR the post gets updated in the preview too.
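To give a flavour of what the workflow ends up doing, here is a minimal sketch of the Hugo build step it might run. The preview URL and output directory are placeholders for illustration; the actual GitHub Actions wiring is what the post itself walks through.

```bash
# Build the site including draft and future-dated posts, pointing links at
# wherever the preview will be hosted (URL below is a made-up placeholder).
hugo --buildDrafts \
     --buildFuture \
     --baseURL "https://preview.example.com/my-draft-pr/" \
     --destination public-preview
```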
Kafka Summit London IS BACK! After COVID spoiled everyone’s fun and fundamentally screwed everything up for the past two years, I cannot wait to be back at an in-person conference. At the last Kafka Summit in the beforetimes (San Francisco, 2019) some of us got together for a run (or walk) across the Golden Gate Bridge. I can’t promise quite the same views, but I thought it would be fun to do something similar when we meet in London later this month.
This is the software counterpart to my previous article in which I looked at my workstation’s hardware setup. Some of these are unique or best-of-breed, others may have been sherlocked but I stick with them anyway :)
There’s a bunch of improvements in the works for how ksqlDB handles code deployments and migrations. For now though, for deploying queries there’s the option of using headless mode (which is limited to one query file and disables subsequent interactive work on the server from a CLI), manually running commands (yuck), or using the REST endpoint to deploy queries automagically. Here’s an example of doing that.
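As a taste of the REST-based approach, here is a sketch of deploying a query by POSTing it to the server’s /ksql endpoint. The stream names and server address are made up for illustration.

```bash
# Deploy a persistent query to a ksqlDB server via its REST API.
# The server address and the ORDERS/ORDERS_ENRICHED streams are placeholders.
curl --silent --show-error \
     --request POST http://localhost:8088/ksql \
     --header "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     --data '{
       "ksql": "CREATE STREAM ORDERS_ENRICHED AS SELECT * FROM ORDERS EMIT CHANGES;",
       "streamsProperties": {
         "ksql.streams.auto.offset.reset": "earliest"
       }
     }'
```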
The FilePulse connector from Florian Hussonnois is a really useful connector for Kafka Connect which enables you to ingest flat files including CSV, JSON, XML, etc into Kafka. You can read more about it in its overview here. Other connectors for ingesting CSV data include kafka-connect-spooldir (which I wrote about previously), and kafka-connect-fs.
Here I’ll show how to use it to stream CSV data into a topic in Confluent Cloud. You can apply the same config pattern to any other secured Kafka cluster.
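As an illustration of the secured-cluster side of this, here is a sketch of the standard Kafka client security properties a self-managed Connect worker needs in order to talk to Confluent Cloud. The endpoint and credentials are placeholders; the FilePulse-specific connector properties themselves are what the post covers.

```bash
# Append the Confluent Cloud connection settings to the Connect worker config.
# Bootstrap server, API key, and API secret below are placeholders.
cat >> connect-worker.properties <<'EOF'
bootstrap.servers=pkc-xxxxx.europe-west2.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="CCLOUD_API_KEY" password="CCLOUD_API_SECRET";
# The embedded producer (used by source connectors to write to the topic)
# needs the same settings
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="CCLOUD_API_KEY" password="CCLOUD_API_SECRET";
EOF
```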
Using ksqlDB in Confluent Cloud makes things a whole bunch easier because now you just get to build apps and streaming pipelines, instead of having to run and manage a bunch of infrastructure yourself.
Once you’ve got ksqlDB provisioned on Confluent Cloud you can use the web-based editor to build and run queries. You can also connect to it using the REST API and the ksqlDB CLI tool. Here’s how.
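For illustration, here is roughly what the REST and CLI routes look like. The endpoint and API key/secret are placeholders that you would take from your own Confluent Cloud ksqlDB cluster.

```bash
# Placeholders: substitute your own ksqlDB endpoint and credentials.
KSQLDB_ENDPOINT="https://pksqlc-xxxxx.europe-west2.gcp.confluent.cloud:443"
KSQLDB_API_KEY="XXXX"
KSQLDB_API_SECRET="YYYY"

# REST API: list the streams on the cluster
curl --silent -u "${KSQLDB_API_KEY}:${KSQLDB_API_SECRET}" \
     --request POST "${KSQLDB_ENDPOINT}/ksql" \
     --header "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     --data '{"ksql": "SHOW STREAMS;"}'

# ksqlDB CLI: connect interactively with the same credentials
ksql -u "${KSQLDB_API_KEY}" -p "${KSQLDB_API_SECRET}" "${KSQLDB_ENDPOINT}"
```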
The ActiveMQ source connector creates a Struct holding the value of the message from ActiveMQ (as well as its key). This is as would be expected. However, you can encounter challenges in working with the data if the ActiveMQ data of interest within the payload is complex. Things like converters and schemas can get really funky, really quick.
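Before fighting with converters, it is worth simply eyeballing what has actually landed on the topic. Here is one rough way to do that with kcat and jq, assuming the value is JSON-serialised; the broker and topic names are illustrative.

```bash
# Pretty-print one message from the topic to see where in the Struct the
# payload of interest actually lives. Broker and topic names are placeholders.
kcat -b localhost:9092 -t activemq-messages -C -e -o beginning -c 1 | jq '.'
```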
The Kafka Connect JDBC Sink can be used to stream data from a Kafka topic to a database such as Oracle, Postgres, MySQL, DB2, etc.
It supports many permutations of configuration around how primary keys are handled. The documentation details these, and this article aims to illustrate and expand on them.
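To give one concrete (and purely illustrative) permutation, here is a sketch of a connector config that takes the primary key from a field in the message value and upserts on it. The connector name, topic, field name, and connection details are placeholders.

```bash
# Create/update a JDBC sink that upserts rows keyed on the order_id field
# from the message value. All names and credentials below are placeholders.
curl --silent --request PUT \
     http://localhost:8083/connectors/sink-jdbc-orders/config \
     --header "Content-Type: application/json" \
     --data '{
       "connector.class"    : "io.confluent.connect.jdbc.JdbcSinkConnector",
       "topics"             : "orders",
       "connection.url"     : "jdbc:mysql://mysql:3306/demo",
       "connection.user"    : "user",
       "connection.password": "password",
       "insert.mode"        : "upsert",
       "pk.mode"            : "record_value",
       "pk.fields"          : "order_id",
       "auto.create"        : "true"
     }'
```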
I got the error SQLSyntaxErrorException: BLOB/TEXT column 'MESSAGE_KEY' used in key specification without a key length with the Kafka Connect JDBC Sink connector (v10.0.2) and MySQL (8.0.23).
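One possible workaround, not necessarily the fix the post itself settles on: MySQL cannot index a TEXT column without a prefix length, so pre-creating the target table with a bounded VARCHAR key avoids the auto-created TEXT column being used in the key specification. The table and column definitions below are illustrative.

```bash
# Pre-create the target table with a bounded VARCHAR primary key so the sink
# doesn't need to auto-create one. Host, database, and table are placeholders.
mysql -h localhost -u root demo <<'SQL'
CREATE TABLE TEST_TABLE (
  MESSAGE_KEY   VARCHAR(64) NOT NULL PRIMARY KEY,
  MESSAGE_VALUE TEXT
);
SQL
```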
ksqlDB is a fantastically powerful tool for processing and analysing streams of data in Apache Kafka. But sometimes, you just want a quick way to profile the data in a topic in Kafka. I wrote about this previously with a convoluted (but effective) set of bash commands pipelined together to perform a GROUP BY on the data. Then someone introduced me to visidata, which makes it all a lot quicker!
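For comparison, here is a rough sketch of both approaches, assuming kcat and visidata are installed and the topic holds JSON messages. Broker, topic, and field names are made up for illustration.

```bash
# The bash-only way: a crude GROUP BY, counting occurrences of one field's values
kcat -b localhost:9092 -t orders -C -e -o beginning | \
  jq -r '.status' | sort | uniq -c | sort -rn

# The visidata way: pipe the whole topic in and explore/aggregate interactively
kcat -b localhost:9092 -t orders -C -e -o beginning | vd -f jsonl
```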