A collection of ELI5s

Jul 31, 2024

I recently went to IIT Kharagpur to hire interns for Rippling. In one of my interviews, an inquisitive student asked me what I did at work. At that time, I was building a CDC pipeline between two databases.

I used the following metaphor to explain CDC to this particularly twinkle-eyed student:

Suppose you and I both have a sheet of paper. My job is to copy everything you write on your sheet of paper to mine. You can write on any part of your paper, and you may also delete or replace content, but you will never hide your paper from me. There are typically two solutions to this problem.

The first approach:

Every few minutes, I take a mental image of your paper, erase my paper, and then copy the contents from my mind to my paper. This is called a snapshot sync. The main advantage of this approach is that I am occupied only when I decide to take the snapshot (mental image). One disadvantage is that, irrespective of the actual changes, I will always rewrite all the contents of the page. The other disadvantage is that my page will lag behind your page most of the time, with the difference between them peaking right before I take the next snapshot of your page.
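For readers who prefer code to paper, here is a minimal sketch of the same idea. It assumes each sheet is just a Python dict, and the name snapshot_sync is made up for illustration; a real pipeline would read from and write to actual databases.

```python
def snapshot_sync(source: dict, replica: dict) -> int:
    """Erase the replica and recopy everything from the source.

    Returns the number of entries written, which is always the full
    size of the source, no matter how little actually changed.
    """
    replica.clear()          # erase my paper
    replica.update(source)   # copy the whole mental image onto it
    return len(replica)


# Hypothetical sheets of paper, keyed by line.
source = {"line1": "hello", "line2": "world"}
replica = {}

snapshot_sync(source, replica)

# You edit one line; my copy is stale until the next snapshot.
source["line2"] = "there"
stale = replica != source

# The next snapshot rewrites both lines even though only one changed.
writes = snapshot_sync(source, replica)
```

Between the two calls the replica is out of date, and the second call still writes every line, which are the two disadvantages described above.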

The second approach:

I keep a watchful eye over your sheet and copy anything you do to my sheet immediately. This is called change data capture or CDC. The main advantage of this approach is that it minimizes the time my page is behind your page, increasing consistency between the two. The main disadvantage is that I cannot rest for a single moment. The other disadvantage is that changes must be copied in strict order. If I fail to write any one change, all later changes are kept on hold until I succeed.
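A rough sketch of this approach in code, under some loud assumptions: every edit you make is recorded as a numbered entry in a change log, my sheet is a dict, and the names Change, apply_changes and fail_on are invented for this example (fail_on only simulates a write that fails). Real CDC tools read the database's own log instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    seq: int               # position in the log; defines the strict order
    key: str               # which line of the paper changed
    value: Optional[str]   # new content, or None if the line was erased


def apply_changes(replica: dict, log: list, last_applied: int,
                  fail_on: Optional[int] = None) -> int:
    """Apply unseen changes in order; return the last seq applied."""
    for change in sorted(log, key=lambda c: c.seq):
        if change.seq <= last_applied:
            continue  # already copied earlier
        if change.seq == fail_on:
            break     # simulated failure: every later change must wait
        if change.value is None:
            replica.pop(change.key, None)
        else:
            replica[change.key] = change.value
        last_applied = change.seq
    return last_applied


log = [
    Change(1, "line1", "hello"),
    Change(2, "line2", "world"),
    Change(3, "line1", None),
]
replica = {}

# Change 2 fails, so change 3 is held back as well.
stalled_at = apply_changes(replica, log, last_applied=0, fail_on=2)

# Retrying from where we stopped applies changes 2 and 3 in order.
position = apply_changes(replica, log, last_applied=stalled_at)
```

Only the changes themselves are written, but a single failed write stalls everything queued behind it.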

Now imagine that instead of a sheet of paper, you and I have a whole book. What about all the books in your college library? At this point, the complexity of the task clicked for him and he left intrigued.

Satisfied with my endeavour, I realised that I have a knack for creating such metaphors. This blog is a dump of them.

Vertical vs horizontal scaling

Prior to starting Cay Network, I worked as a Technical Consultant at Onnivation. My job was to help their fantastic sales team evaluate new products and discover gaps in the industry. As part of my responsibilities, I gave a talk covering a 1,000-foot view of a startup's cloud journey, intermingled with some of our best-selling products. A part of this presentation talked about scaling servers. Here is how I explained the differences between vertical and horizontal scaling when my presentation proved inadequate.

Suppose you are hosting a pizza party for all your friends. Your dough is ready, your marinara sauce is cooked, and your vegetables are chopped. The bell rings and you welcome three friends bearing gifts and laughs. You start preparing the base of the pizza when another friend arrives. To accommodate them, you take a bit more dough and increase the size of the pizza. Right before you spread the marinara sauce over the base, two colleagues join the party. Having introduced your friends and colleagues, you head to the kitchen and add some more dough to make the pizza larger. This process of making the pizza larger to serve more and more guests is called vertical scaling.

You realise you are more popular than you thought (or do people just love free pizza?) and more people drive over to your party. You keep increasing the size of your pizza to accommodate the new guests until you hit a glaring problem: your pizza cannot fit in your oven. You panic for a moment, but then have an epiphany. Instead of making your pizza larger, you could add another oven and make two pizzas at once! You could even ask every other friend to bring their oven to keep up with the demand. This strategy of adding more ovens to serve more and more guests is called horizontal scaling.
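The pizza party can also be put into a few lines of code. This is only a toy sketch: the oven limit of 8 slices and the function names are made up, and one slice stands in for one guest.

```python
import math

OVEN_MAX_SLICES = 8  # hypothetical: the largest pizza one oven can hold


def vertical_scale(guests: int, oven_max: int = OVEN_MAX_SLICES) -> int:
    """Grow a single pizza to feed everyone; fails past the oven's limit."""
    if guests > oven_max:
        raise ValueError("the pizza no longer fits in the oven")
    return guests  # size of the one pizza, in slices


def horizontal_scale(guests: int, oven_max: int = OVEN_MAX_SLICES) -> int:
    """Keep pizzas oven-sized and add ovens instead; returns ovens needed."""
    return math.ceil(guests / oven_max)


small_party = vertical_scale(6)      # one pizza is still enough
ovens_needed = horizontal_scale(20)  # 20 guests need 3 ovens of 8 slices
```

Vertical scaling works until it hits the size limit of a single oven, while horizontal scaling keeps going as long as friends keep bringing ovens.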