Notes on Distributed Systems

Fly.io Gossip groomers

I am working through this challenge and learning the concepts as I solve each of them.

Challenge 1:

What did you learn?

This is just the basics. How maelstrom works, the node system and how to run it. Basically maelstrom is a simulation system already written which we will use throughout to learn distributed systems.

Challenge 2: Unique ID generation

What’s the Goal of the challenge?

The goal is to return a unique Id across different nodes of a distributed system.

What did you learn?

The simplest thing I came up with was generating a unique UUID and returning it. But that might not always help with the requirements.

There can still be conflicts.
There is no way to trace what node generated this. That is we cant trace back to the origin. Many little things like that. In a much broader sense these are the things we need to take care of.

Global uniqueness: Ensuring this in a distributed system is harder than in a single node system. Simply using incrementing numbers is not possible.
Clock synchronisation: Simply relying on time for this is not helpful. Different nodes might not be perfectly synced and might result in duplicates
Scalability: As node numbers increase your solution should work
Coordination overhead: Avoid adding load to one central system. This is terrible
Failure handling: Your system should continue to generate unique IDs even if there is a single node failure
Idempotency: In distributed systems, operations may be retried. Generating IDs should be idempotent - repeated requests should not cause issues.
Load Distribution: The ID generation should not create hotspots or uneven load across your system.
Observability: It’s important to be able to debug and trace IDs back to their origin, which node-prefixed UUIDs can help with.

So what we used was a combination of unique Id, a counter and timestamp. This ensures all the above criterion.

Things I used with go that are helpful:

The solution I gave for this is combination of node Id ( helps trace back to the origin), counter ( helps handle concurrent requests), timestamp(same helps handle concurrent requests)

Go routines: So the handler function I am using for return in a go routine so maelstrom has single binary but multiple go routines.
Shared memory: The variable counter that I am using is outside the handler and is basically a shared memory between different go routines.
Atomic operations: The atomic operations is crucial to ensure that even if multiple go routines are using the same counter simultaneously each increment is safe without race conditions.
Handler execution: When a request comes maelstrom framework calls the handler function in a new go routine. This means that multiple instances of the handler can run concurrently, each in its own go routine. However they all share the same counter variable in memory.

Alternatives:

For the counter I could be using a mutex(locks) or semaphores to achieve thread safe access to the shared counter.