The hidden complexity of "just a few lines of code"

We often host meetups in the Algolia Paris office. To make our lives easier, I built an API that lets us easily fetch metadata about an event (description, date, list of attendees, etc.), and I hosted it on Cloudflare Workers.

The premise was simple: pass the URL of an event to the API, and get its metadata back as nicely formatted JSON. My function would crawl the URL, parse it with cheerio, extract the relevant information and return it. Easy.

It's just gonna be a few lines of code

I could grab the description easily, then write slightly more complex selectors to get the date. Great.

Then I realized that the meetup.com markup contained some specific JSON blocks, called JSON-LD, that had most of the data I needed. Ok, so let's understand the schema and extract what I need from it. Some keys are easier to access from this JSON (like startDate and endDate), while others are still better taken from the HTML (the description is truncated in the JSON, for example). No problem, let's build a hybrid approach.

{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "HumanTalks Paris Novembre 2024",
  "url": "https://www.meetup.com/humantalks-paris/events/304412456/",
  "description": "Hello everyone!\nThis month we wanted to thank **Algolia** for hosting us.",
  "startDate": "2024-11-12T18:45:00+01:00",
  "endDate": "2024-11-12T21:30:00+01:00",
  "eventStatus": "https://schema.org/EventScheduled",
  "image": [
    "https://secure-content.meetupstatic.com/images/classic-events/524553157/676x676.jpg",
    "https://secure-content.meetupstatic.com/images/classic-events/524553157/676x507.jpg",
    "https://secure-content.meetupstatic.com/images/classic-events/524553157/676x380.jpg"
  ],
  "eventAttendanceMode": "https://schema.org/OfflineEventAttendanceMode",
  "location": {
    "@type": "Place",
    "name": "Algolia",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Paris",
      "addressRegion": "",
      "addressCountry": "fr",
      "streetAddress": "5 rue de Bucarest, Paris"
    },
    "geo": {
      "@type": "GeoCoordinates",
      "latitude": 48.880684,
      "longitude": 2.326596
    }
  },
  "organizer": {
    "@type": "Organization",
    "name": "HumanTalks Paris",
    "url": "https://www.meetup.com/humantalks-paris/"
  }
}
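To make the hybrid approach concrete, here is a minimal TypeScript sketch, not the actual project code. It pulls the JSON-LD blocks out of the page with a regular expression (the real worker used cheerio) and prefers a description scraped from the HTML when one is provided. The function names and the htmlDescription parameter are illustrative assumptions.

```typescript
interface EventMetadata {
  name?: string;
  startDate?: string;
  endDate?: string;
  description?: string;
}

// Collect every <script type="application/ld+json"> block and parse it,
// skipping blocks that are not valid JSON instead of failing the whole request.
function extractJsonLd(html: string): unknown[] {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Malformed block: ignore it.
    }
  }
  return blocks;
}

// Return the first JSON-LD object whose @type is "Event".
function findEvent(blocks: unknown[]): Record<string, unknown> | undefined {
  for (const block of blocks) {
    // A JSON-LD block can hold a single object or an array of objects.
    const items: unknown[] = Array.isArray(block) ? block : [block];
    for (const item of items) {
      if (typeof item === "object" && item !== null && (item as Record<string, unknown>)["@type"] === "Event") {
        return item as Record<string, unknown>;
      }
    }
  }
  return undefined;
}

// Hybrid extraction: dates from JSON-LD, description preferably from the HTML
// (hypothetical htmlDescription, e.g. scraped with cheerio), since JSON-LD truncates it.
function extractMetadata(html: string, htmlDescription?: string): EventMetadata {
  const event = findEvent(extractJsonLd(html));
  return {
    name: event?.name as string | undefined,
    startDate: event?.startDate as string | undefined,
    endDate: event?.endDate as string | undefined,
    description: htmlDescription ?? (event?.description as string | undefined),
  };
}
```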

Ok, now for the list of attendees. Oh, it's not in the markup: it's fetched dynamically by the front-end and added to the page. It seems to be doing a GraphQL query to get the list of all attendees. All I have to do is replicate that query from my Cloudflare Worker. Thankfully, both the URL and the required query ID are available in the initial page. I just need to handle pagination, as the results are only returned in chunks of 50.
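The pagination loop could look like the following sketch. The page shape (attendees, hasNextPage, endCursor) is a hypothetical cursor-based format, not meetup.com's actual GraphQL schema, and fetchPage stands in for the real GraphQL call so the loop can be tested without the network.

```typescript
// Hypothetical shape of one page of results; the real GraphQL response differs.
interface AttendeePage {
  attendees: string[];
  hasNextPage: boolean;
  endCursor: string | null;
}

type FetchPage = (cursor: string | null) => Promise<AttendeePage>;

// Request pages one after the other (50 attendees each on meetup.com) until the
// API reports there are no more, accumulating every attendee along the way.
async function fetchAllAttendees(fetchPage: FetchPage, maxPages = 100): Promise<string[]> {
  const all: string[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const result = await fetchPage(cursor);
    all.push(...result.attendees);
    if (!result.hasNextPage || result.endCursor === null) break;
    cursor = result.endCursor;
  }
  return all;
}
```

The maxPages cap is a defensive choice so that a misbehaving API can never keep the worker looping forever.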

Testing

At that point, my code was obviously more than just a few lines. I had to extract data from three different sources: the HTML source, the JSON-LD, and a GraphQL API.

That's when I started to add some unit tests. The number of edge cases was starting to increase, and I wanted to be sure that when I fixed one, I didn't end up breaking another.

Error Handling

After a few days of working on this, I also realized that past meetups are no longer accessible (unless you're logged in), so I needed error handling for those cases. Some meetups also disable access to their attendee list, even when the meetup itself is public, and I needed to handle that case gracefully too.
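One way to handle these cases gracefully is to return a result that says why attendees are missing, rather than throwing. This is a sketch under assumptions: the reason names are hypothetical, and the HTTP status codes checked are illustrative, not documented meetup.com behavior.

```typescript
// Hypothetical result type: the API can still return the rest of the metadata
// and simply explain why the attendee list is unavailable.
type AttendeesResult =
  | { ok: true; attendees: string[] }
  | { ok: false; reason: "event-not-public" | "attendees-hidden" | "unknown-error" };

// Map an upstream HTTP status and (possibly missing) attendee payload to a result.
// The status codes below are assumptions for illustration.
function toAttendeesResult(status: number, attendees: string[] | null): AttendeesResult {
  if (status === 401 || status === 403 || status === 404) {
    // e.g. a past meetup that requires being logged in
    return { ok: false, reason: "event-not-public" };
  }
  if (status === 200 && attendees === null) {
    // public event, but the organizers disabled the attendee list
    return { ok: false, reason: "attendees-hidden" };
  }
  if (status === 200 && attendees !== null) {
    return { ok: true, attendees };
  }
  return { ok: false, reason: "unknown-error" };
}
```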

More sources

The project worked well and really helped us plan meetups internally. But then we had to host a meetup listed on Eventbrite. And one on Luma. So I had to go back to my code and make it work with those other platforms.

I had to reorganize my code to have shared helpers (for parsing HTML with cheerio or extracting JSON-LD, for example) while still keeping per-source logic and unit tests.

I also had to handle some source-specific issues. For example, Eventbrite has no visible attendee list, and the HTML returned by Luma differs based on the User-Agent passed. Many things are the same, but many others are different. That's when having unit tests really started to shine: I could fix something for Eventbrite and be sure I hadn't broken Luma. This would have been a nightmare to test manually.
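A small per-source registry is one way to keep the shared code generic while each platform declares its quirks. In this sketch the hostnames and the User-Agent string are illustrative assumptions, and findSource and buildFetchInit are hypothetical helper names; pinning the User-Agent simply makes the markup Luma returns predictable.

```typescript
// Hypothetical per-source configuration.
interface SourceConfig {
  name: string;
  hostnames: string[];
  userAgent?: string; // pinned so the source always serves the same markup
}

// Hostnames and User-Agent below are assumptions for illustration.
const SOURCES: SourceConfig[] = [
  { name: "meetup", hostnames: ["www.meetup.com", "meetup.com"] },
  { name: "eventbrite", hostnames: ["www.eventbrite.com", "www.eventbrite.fr"] },
  { name: "luma", hostnames: ["lu.ma"], userAgent: "Mozilla/5.0 (compatible; EventMetadataBot/1.0)" },
];

// Pick the source whose hostname matches the event URL.
function findSource(eventUrl: string): SourceConfig | undefined {
  const { hostname } = new URL(eventUrl);
  return SOURCES.find((source) => source.hostnames.indexOf(hostname) !== -1);
}

// Build fetch() options, sending a fixed User-Agent only when the source needs one.
function buildFetchInit(source: SourceConfig): { headers: Record<string, string> } {
  return source.userAgent ? { headers: { "User-Agent": source.userAgent } } : { headers: {} };
}
```

Each source module can then be unit-tested against saved HTML fixtures independently of the others.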

What I learned

An important lesson I learned was to make endpoints as generic as possible. Initially, I had one endpoint for getting the description, another for getting the dates, etc. But I realized it was much easier (from a code POV as well as a user POV) to have a single endpoint that returned everything.

I still kept the list of attendees behind a flag (off by default; if you need it, you turn it on in the request), as it was the slowest part of the request and also the most prone to failure (on Eventbrite, or on past Meetup pages, for example).
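Here is a sketch of how such a single endpoint might read its parameters. The query-string format (?url=<event URL>&attendees=true) and the parseApiRequest name are assumptions made for illustration.

```typescript
// Hypothetical request format: GET /?url=<event URL>&attendees=true
interface ApiRequest {
  eventUrl: string;
  includeAttendees: boolean;
}

// Parse the incoming request URL. Attendees are opt-in because fetching them
// is the slowest and most failure-prone step.
function parseApiRequest(requestUrl: string): ApiRequest | null {
  const params = new URL(requestUrl).searchParams;
  const eventUrl = params.get("url"); // searchParams decodes percent-encoding
  if (!eventUrl) return null;
  return {
    eventUrl,
    includeAttendees: params.get("attendees") === "true",
  };
}
```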

This "Oh, it's just going to be a simple proxy to grab 2 or 3 pieces of data" turned into a much more complex beast, but I like it. From the outside, I managed to keep it simple (just one API endpoint), and on the inside I have enough tests and shared code that it's relatively easy to evolve.

My journey through the ESM Tree Shaking forest

I had to work with Cloudflare Workers recently, and everything worked well until one day one of the HTTP calls I was making started to fail.

When I ran the same piece of code locally, it worked (obviously!). But once pushed and running on Cloudflare Workers, it failed. This was the first step of what became a day-long trip down the debugging rabbit hole. After a couple of hours of debugging "live" (pushing my code, hitting the server, and checking logs), I finally discovered the cause: the HTTP endpoint I was targeting had a rate limit based on the originating IP. Calls from Cloudflare share the same IP with other workers, so that IP had already hit the limit and my calls failed.

That made me start digging into better ways to test Cloudflare Workers (CFW) production code locally, and I discovered wrangler dev. At first I thought it would spin up my code on a real remote server and stream the console.log output to my terminal, but no: it's a minimal version of CFW that runs locally. Not exactly the same as a staging environment, but pretty close.

The main difference from running my scripts locally through unit tests is that with wrangler dev, my code is bundled with esbuild and the bundled version is executed. This opened up a whole new category of problems and questions for me.

First, I realized that the bundle size was way too big for what my function was actually doing. I had ~100 lines of code at most, but my bundle was several megabytes of minified code. Surely, something wasn't right. By inspecting the bundled code I realized that it had bundled all my dependencies and subdependencies.

But, isn't it supposed to do tree shaking?

I had read that esbuild was the new hotness, and that it would tree-shake my dependencies automatically, keeping only what I actually used. But somehow, it didn't seem to work.

What I learned is that esbuild can't tree-shake on its own: the modules also have to be ESM (so, basically, using import rather than require) for it to work. So I updated my dependencies to their latest versions, most of which are now ESM-compliant. Once all my deps were upgraded to ESM, esbuild was able to tree-shake the final bundle, making it about 10 times smaller \o/.

ESMify all the things

One of the dependencies was actually one of my own modules, firost, and let me tell you that converting a CommonJS module to ESM is not a trivial task. It's certainly doable, but it does take some time, especially when you have several intertwined modules, some in CommonJS and others in ESM.

I had to be especially careful to use named exports rather than God Objects in my files, so that importing a single function wouldn't greedily pull in every dependency. Restructuring the files and imports was long and tedious. I also had to ditch Jest (whose ESM support is still experimental) in favor of Vitest, and I updated ESLint to its latest version, which finally supports ESM too!

Lodash, you're next

The only dependency I didn't manage to shave down was lodash. I really like lodash, especially the _.chain().value() syntax, which I think makes complex pipelines easier to express. But lodash still seems to be loaded as a monolithic block, even though I only use a few of its methods. I haven't dug much into how to load it more cleverly, but that's on my TODO list.

I also needed to include cheerio (because my worker does some scraping and HTML extraction), but couldn't find a leaner alternative (domjs is roughly the same size, and I prefer cheerio's API).

Minimal .zshrc for remote servers

Recently, I found myself connecting to remote machines quite often: debugging remote servers for work, or connecting over ssh to a handheld emulation console I just bought (more posts coming on that later).

But I've customized my local zsh so much that when I connect to a bare remote server, I feel a bit lost. No colors to differentiate folders from files. No tabbing through completions. Even simple things like the Backspace or Delete keys don't always work.

So this time, I built a very minimal .zshrc that I intend to scp to a remote server whenever I need to do some work there, just to make my life easier. I tried it on the aforementioned console, and it helped a lot, so I'm gonna share it here.

Fix the keyboard

export TERM=xterm-256color
bindkey "^[[3~" delete-char          # Delete
bindkey "^?"    backward-delete-char # Backspace
bindkey "^[OH"  beginning-of-line    # Start of line
bindkey "^[OF"  end-of-line          # End of line

Starting with the basics, I set my terminal type to xterm-256color. This should fix most keyboard issues. But just to be sure, I also explicitly defined the key codes for Delete, Backspace, and start and end of line.

The ^[ and ^? sequences here are caret notation for escape characters (^[ is Escape, ^? is DEL), which bindkey understands, so they can be typed literally. To find out which sequence a key actually sends, press Ctrl-V followed by the key: at the zsh prompt it prints the sequence, and in vim's insert mode it inserts it raw. I found that various servers mapped these keys differently, so you might have to change them manually if they don't work for you.

Completion

autoload -Uz compinit
compinit
zstyle ':completion:*' menu select

This enables much better completion than the default. Now, whenever pressing Tab yields several possible completions, the list of possibilities is displayed and you can Tab through them, with the current one highlighted.

Colors

autoload -U colors && colors
export COLOR_GREEN=2
export COLOR_PURPLE=21
export COLOR_BLUE=94
export LS_COLORS="di=38;5;${COLOR_GREEN}:ow=38;5;${COLOR_GREEN}:ex=4;38;5;${COLOR_PURPLE}:ln=34;4;${COLOR_BLUE}"
zstyle ':completion:*' list-colors ${(s.:.)LS_COLORS}

I then added a bit of color, defining a few variables to reference the colors more easily. They are mapped to the color palette I use in my local kitty terminal; they would probably be different on your machine, so adapt them accordingly.

The LS_COLORS definition shows directories in green, executable files in purple and symlinks in blue (the last two also underlined). This simple change already makes everything much easier to grok. The zstyle line also applies those colors to the tab completions \o/.

Aliases

alias v='vi'
alias ls='ls -lhN --color=auto'
alias la='ls -lahN --color=auto'
alias ..='cd ..'
function f() {
  find . -name "*$1*" | sed 's|^\./||'
}

I added some very minimal aliases, the ones embedded in my muscle memory. I have many more locally, but I went for the minimum needed to make me feel at home. I also didn't want to install any third-party tools (even though exa, fd and bat sure would have been nice).

v is twice as fast to type as vi. Then better ls and la (the latter also showing hidden files), a quick way to move up one level in the tree, and a short function to find files matching a pattern. Those are simple, but very effective.

Prompt

# $'...' quoting turns \e into a real escape character
PS1=$'[%m] %{\e[38;5;'"${COLOR_GREEN}"$'m%}%~/%{\e[00m%} '

And finally, a left-side prompt to give more information about where I am. It starts with the name of the current machine, so I can easily spot whether I'm in a remote session or local, then shows the current directory (in green, once again, as is my rule for directories).

The wrapping %{ and %} are needed around color escape sequences, to tell zsh that what's inside doesn't take up any space on the screen. If you omit them, zsh miscounts the prompt's width, and what you type ends up offset far to the right.

I actually like to replace the %m with a machine-specific prefix, to more easily see where I am.

Here, for example, you can see I'm connected to my handheld console (I added the SNES-like colored button), currently in the /roms2/ directory and I'm tabbing through completions in the ./n64/games/ folder.

And that's it. A very minimal .zshrc for when I need to get my bearings on a new remote server and still be able to do what I want quickly.

Make webhook status visual feedback in Airtable

I've been working on my Airtable and Make automations, and I wanted to share a small trick I've implemented to improve error handling and visibility.

I use Airtable interfaces with various buttons that, when clicked, trigger webhooks in Make, which in turn update my Airtable record with additional data. However, I found it quite frustrating that there was no visual feedback on the webhook's status. Was it still running? Did it succeed? Did it fail? I had to open my Make dashboard each time, which was far from ideal.

So I added a simple status feature: a new field called automationStatus in my table. It's a single select with three options: OK (the default), In Progress, and Error.

Now, when the webhook starts, instead of fetching the relevant Airtable record, it updates it by setting its automationStatus to In Progress. The Make module that updates an Airtable record also returns the whole record, so I don't need to fetch it separately. I display the status next to the button, so I now get visual feedback.

At the end of a successful scenario, when I update the record with new data, I also set automationStatus back to OK. And in case any module fails along the way, I added an error handler that sets automationStatus to Error.
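The status lifecycle can be modeled in a few lines, mostly to make the three transitions explicit. This is a hypothetical TypeScript model, not Make code: update stands in for the Make "update a record" module, and the catch block plays the role of the Make error handler.

```typescript
type AutomationStatus = "OK" | "In Progress" | "Error";

interface RecordFields {
  automationStatus: AutomationStatus;
  [field: string]: unknown;
}

// Stand-in for the Make module that updates an Airtable record and returns it.
type UpdateRecord = (fields: Partial<RecordFields>) => Promise<RecordFields>;

// "In Progress" when the webhook starts, then "OK" on success or "Error" on failure.
async function runWithStatus(
  update: UpdateRecord,
  work: (record: RecordFields) => Promise<Partial<RecordFields>>,
): Promise<void> {
  // Setting the status also returns the full record, so no separate fetch is needed.
  const record = await update({ automationStatus: "In Progress" });
  try {
    const newData = await work(record);
    await update({ ...newData, automationStatus: "OK" });
  } catch {
    // Equivalent of the Make error handler.
    await update({ automationStatus: "Error" });
  }
}
```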

For some very nasty scenarios, I went even further and added automationScenarioUrl and automationErrorDetails fields, so I could see what was really happening and click the link to jump straight to the Make scenario page.

This approach of course has limitations (there is no history, and all automations on a given record share the same field), but it's already way better than what I had before (i.e., nothing).

Counting Elements in Airtable

Airtable is an impressive tool that keeps surprising me with its power. But sometimes, it seems to be missing what I assume would be very basic features.

For example, it has powerful linking capabilities. Let's say you have a companies table and an employees table: you can add a company field to employees that lets you select an existing company. The mirroring is automatic too: when you look at your companies table, you can see all the linked employees of each company.

You can even use a Rollup field to, for example, gather all names of the linked employees. By default it's displayed as a comma-separated list of values, but in reality it returns an array-like structure, and you can call a bunch of array-related methods on it, like ARRAYUNIQUE, ARRAYJOIN, ARRAYCOMPACT or ARRAYSLICE.

But I couldn't find a way to get the length of the array. There is no ARRAYLENGTH method.

Edit: It turns out there is a Count field type that does exactly that. It was there, in plain sight, and I never saw it. It's a much better solution than the hack I'm describing here.

The workaround

Still, I found a clever / hackish way to get that information, by using a mix of string and regexp functions. Let me walk you through it:

First, I define a UUID field in the employees table: a formula field that only contains RECORD_ID(). That way, each employee has a relatively short and unique identifier.

Now, let's see what we need to do in the companies table. It is going to be a pretty long and complex formula, so we'll go step by step. You could technically put the very long formula in the Rollup, but to make it clearer, I'll create several fields, and reference them.

First, I create an employeeCountStep1 Rollup of the UUID field through the employees field, but instead of joining values with the default comma separator, I use a less common character, like ▮. I could have used any character, but I find this blocky square more visible. So, this is our Rollup formula for now: ARRAYJOIN(ARRAYUNIQUE(values), "▮").

Now, as this returns a string, we can run a regexp search and replace on it. What we'll do is remove all characters except our blocky squares. Essentially, it means replacing everything that is not ▮ with an empty string, like this: REGEX_REPLACE(employeeCountStep1, "[^▮]", "")

The last step is to count the number of ▮ we now have, using the LEN function. But as ▮ only separates values, we'll have an off-by-one error: if we have 3 employees, we'll only have two ▮. No problem, we just need to add 1 to the total… But that means it will also add 1 even when there are no employees… Ok, so let's wrap it in one final condition to handle that edge case: IF(employees = "", 0, LEN(employeeCountStep2) + 1), where employeeCountStep2 is the field holding the REGEX_REPLACE result.

All in all, here is the final Rollup formula: IF(employees = "", 0, LEN(REGEX_REPLACE(ARRAYJOIN(ARRAYUNIQUE(values), "▮"), "[^▮]", "")) + 1)
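To double-check the arithmetic, here is the same formula re-implemented step by step in TypeScript. This only mirrors the logic for testing purposes and is not something Airtable runs; countLinkedRecords is a hypothetical name, and the input array stands for the linked UUIDs the Rollup receives.

```typescript
const SEPARATOR = "▮";

function countLinkedRecords(values: string[]): number {
  // IF(employees = "", 0, ...): no linked records means no separators and no +1.
  if (values.length === 0) return 0;
  // ARRAYJOIN(ARRAYUNIQUE(values), "▮")
  const unique = values.filter((value, index) => values.indexOf(value) === index);
  const joined = unique.join(SEPARATOR);
  // REGEX_REPLACE(..., "[^▮]", ""): keep only the separators.
  const separatorsOnly = joined.replace(/[^▮]/g, "");
  // LEN(...) + 1: n values are separated by n - 1 separators.
  return separatorsOnly.length + 1;
}
```

Since RECORD_ID() values never contain ▮, every remaining character is a separator, which is what makes the count reliable.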

Well, that wasn't easy, but it works. It's a bit hackish, but it does the job. It's a neat trick to have in your Airtable toolkit.