Notes on Distributed Systems

Fly.io Gossip Glomers

I am working through these challenges and learning the concepts as I solve each one.

Challenge 1: Echo

What did you learn?

This one covers just the basics: how Maelstrom works, how its node system is set up, and how to run it. Maelstrom is an existing simulation harness that we will use throughout to learn distributed systems.
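To make the node system a bit more concrete, here is a minimal sketch of the message shape as I understand it: Maelstrom sends each node JSON messages with `src`, `dest` and a `body`, and the first challenge asks the node to answer an `echo` message with `echo_ok`. The `Message` type and the `echoReply` helper are names I made up for illustration; the real challenge is normally solved with the Maelstrom Go library rather than by hand like this.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is the envelope exchanged between Maelstrom and a node.
// (Hypothetical type name, used only for this sketch.)
type Message struct {
	Src  string                 `json:"src"`
	Dest string                 `json:"dest"`
	Body map[string]interface{} `json:"body"`
}

// echoReply parses one incoming JSON message and builds the reply:
// source and destination are swapped, the type becomes "echo_ok",
// and in_reply_to carries the msg_id of the request.
func echoReply(line []byte) ([]byte, error) {
	var req Message
	if err := json.Unmarshal(line, &req); err != nil {
		return nil, err
	}
	reply := Message{
		Src:  req.Dest,
		Dest: req.Src,
		Body: map[string]interface{}{
			"type":        "echo_ok",
			"in_reply_to": req.Body["msg_id"],
			"echo":        req.Body["echo"],
		},
	}
	return json.Marshal(reply)
}

func main() {
	in := []byte(`{"src":"c1","dest":"n1","body":{"type":"echo","msg_id":1,"echo":"hello"}}`)
	out, err := echoReply(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```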

Challenge 2: Unique ID generation

What’s the goal of the challenge?

The goal is to return a unique ID across the different nodes of a distributed system.

What did you learn?

The simplest thing I came up with was generating a UUID and returning it. But that alone might not cover every requirement worth thinking about:

  1. Global uniqueness: Ensuring this in a distributed system is harder than in a single-node system. A plain incrementing number does not work, because each node would count on its own and hand out the same values.
  2. Clock synchronisation: Relying on time alone is not enough. Clocks on different nodes might not be perfectly synced, which can result in duplicates.
  3. Scalability: The solution should keep working as the number of nodes increases.
  4. Coordination overhead: Avoid routing every request through one central system. It becomes a bottleneck and a single point of failure.
  5. Failure handling: The system should continue to generate unique IDs even if a node fails.
  6. Idempotency: In distributed systems, operations may be retried, so ID generation should tolerate repeated requests without causing issues.
  7. Load distribution: ID generation should not create hotspots or uneven load across the system.
  8. Observability: It’s important to be able to debug and trace IDs back to their origin, which node-prefixed IDs can help with.
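Point 1 is easy to see in a small experiment. The sketch below simulates two nodes in one process (the `node` type and its methods are hypothetical names, not part of Maelstrom): each keeps its own counter, so counter-only IDs collide, while prefixing the counter with the node ID does not.

```go
package main

import "fmt"

// node is a stand-in for one process in the cluster (hypothetical type).
type node struct {
	id      string
	counter int
}

// nextLocal returns a purely local incrementing number as the ID.
func (n *node) nextLocal() string {
	n.counter++
	return fmt.Sprintf("%d", n.counter)
}

// nextPrefixed returns the same counter, prefixed with the node ID.
func (n *node) nextPrefixed() string {
	n.counter++
	return fmt.Sprintf("%s-%d", n.id, n.counter)
}

// countDuplicates reports how many IDs in the slice were already seen.
func countDuplicates(ids []string) int {
	seen := make(map[string]bool)
	dups := 0
	for _, id := range ids {
		if seen[id] {
			dups++
		}
		seen[id] = true
	}
	return dups
}

func main() {
	a, b := &node{id: "n1"}, &node{id: "n2"}
	var local []string
	for i := 0; i < 3; i++ {
		local = append(local, a.nextLocal(), b.nextLocal())
	}

	c, d := &node{id: "n1"}, &node{id: "n2"}
	var prefixed []string
	for i := 0; i < 3; i++ {
		prefixed = append(prefixed, c.nextPrefixed(), d.nextPrefixed())
	}

	fmt.Println("duplicates with local counters:", countDuplicates(local))
	fmt.Println("duplicates with node-prefixed counters:", countDuplicates(prefixed))
}
```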

So what I used was a combination of the node ID, a counter and a timestamp. Together these address the criteria above.

Things I used with Go that are helpful:

The solution I gave for this is a combination of the node ID (helps trace back to the origin), a counter (helps handle concurrent requests) and a timestamp (also helps handle concurrent requests).
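A minimal sketch of that combination, assuming the node ID is already known and unique per node. `IDGenerator`, `NewIDGenerator` and `Next` are illustrative names, and the `nodeID-timestamp-counter` layout is just one possible format, not necessarily the exact one from my solution.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// IDGenerator builds IDs from a node ID, a timestamp and a counter.
// counter is the first field so it stays 64-bit aligned for atomic access.
type IDGenerator struct {
	counter uint64
	nodeID  string
}

// NewIDGenerator returns a generator for the given node.
func NewIDGenerator(nodeID string) *IDGenerator {
	return &IDGenerator{nodeID: nodeID}
}

// Next returns "<nodeID>-<unix nanoseconds>-<sequence>".
// The atomic increment gives every call on this node a distinct sequence
// number, even when many goroutines call Next at the same time.
func (g *IDGenerator) Next() string {
	seq := atomic.AddUint64(&g.counter, 1)
	return fmt.Sprintf("%s-%d-%d", g.nodeID, time.Now().UnixNano(), seq)
}

func main() {
	gen := NewIDGenerator("n1")
	var (
		mu   sync.Mutex
		seen = make(map[string]bool)
		wg   sync.WaitGroup
	)
	// Generate IDs from many goroutines and record each one.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			id := gen.Next()
			mu.Lock()
			seen[id] = true
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("unique IDs generated:", len(seen))
	fmt.Println("example:", gen.Next())
}
```

In this sketch the node ID and the counter already guarantee uniqueness while the process is running. The timestamp mainly matters if a node restarts and its counter starts again from zero, and even then it only helps as long as the clock does not jump backwards.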

Alternatives: