Databases and distributed systems (learning) Looking for work – GitHub: github.com/AmrHedeiwy

Joined November 2024
amr hedeiwy retweeted
YouTube deleted all my coverage of Israeli soldiers shooting civilians, including children targeted on a live stream, along with my entire account. No community guidelines violated & 3 separate excuses given to me. Then Google deleted my email & won’t respond to appeals.
How it started: Don’t be evil How it’s going: theintercept.com/2025/11/04/…
@jamesacowling I'm certain this was ur idea
Of course this is sponsored by convex🤣🤣
4th) Think about how to lay out the pages. You can learn about "Slotted pages" from my notes (the ones u see in the pictures): notion.so/Chapter-3-File-For… Or you can just read the book "Database Internals" by Alex Petrov.
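A rough sketch of the slotted-page idea in C, under some loud assumptions: a 4096-byte page, a 4-byte header (slot count + start of the cell area), 4-byte slots (offset + length), and header fields stored in host byte order. The names (`Page`, `page_insert`, `page_get`, the `SP_*` constants) are made up for illustration. Slots grow from the front, cells grow from the back, and the free space sits in the middle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes, not taken from any real storage engine. */
enum { SP_PAGE_SIZE = 4096, SP_HEADER_SIZE = 4, SP_SLOT_SIZE = 4 };

typedef struct { uint8_t bytes[SP_PAGE_SIZE]; } Page;

/* Read/write a 16-bit field in host byte order (an assumption). */
static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static void wr16(uint8_t *p, uint16_t v) { memcpy(p, &v, 2); }

/* Empty page: 0 slots, cell area starts at the very end of the page. */
void page_init(Page *pg) {
    assert(pg != NULL);
    memset(pg->bytes, 0, SP_PAGE_SIZE);
    wr16(pg->bytes + 2, SP_PAGE_SIZE);
}

/* Append a cell. Returns the new slot index, or -1 if it does not fit. */
int page_insert(Page *pg, const void *data, uint16_t len) {
    assert(pg != NULL && data != NULL);
    uint16_t nslots = rd16(pg->bytes);
    uint16_t cell_start = rd16(pg->bytes + 2);
    /* End of the slot array once one more slot has been added. */
    size_t slots_end = SP_HEADER_SIZE + (size_t)(nslots + 1) * SP_SLOT_SIZE;
    if (slots_end + len > cell_start) return -1;

    cell_start = (uint16_t)(cell_start - len);      /* cells grow backwards */
    memcpy(pg->bytes + cell_start, data, len);

    uint8_t *slot = pg->bytes + SP_HEADER_SIZE + (size_t)nslots * SP_SLOT_SIZE;
    wr16(slot, cell_start);                         /* where the cell lives */
    wr16(slot + 2, len);                            /* how long it is */
    wr16(pg->bytes, (uint16_t)(nslots + 1));
    wr16(pg->bytes + 2, cell_start);
    return nslots;
}

/* Look up a cell by slot index. Returns NULL if the slot does not exist. */
const uint8_t *page_get(const Page *pg, uint16_t slot_id, uint16_t *len) {
    assert(pg != NULL && len != NULL);
    if (slot_id >= rd16(pg->bytes)) return NULL;
    const uint8_t *slot = pg->bytes + SP_HEADER_SIZE + (size_t)slot_id * SP_SLOT_SIZE;
    *len = rd16(slot + 2);
    return pg->bytes + rd16(slot);
}
```

Callers only ever hold a slot index, so cells can be moved around inside the page later (e.g. during compaction) by updating just the slot entry.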
3rd) u need a file layout to decide how data will be addressed:
- Fixed-size header: metadata
- Fixed-size pages: page = metadata and contents (key, value) of 1 node.
- Optional fixed-size trailer: mostly for validation.
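With fixed sizes, addressing becomes plain arithmetic. A small sketch in C, assuming a 64-byte header, 4096-byte pages and a 16-byte trailer (all three numbers, and the `DB_*` / `page_offset` / `page_count` names, are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: [header][page 0][page 1]...[page N-1][trailer] */
enum { DB_HEADER_SIZE = 64, DB_PAGE_SIZE = 4096, DB_TRAILER_SIZE = 16 };

static_assert(DB_PAGE_SIZE % 512 == 0,
              "page size is assumed to be a multiple of a 512-byte sector");

/* Byte offset of a page inside the file. */
uint64_t page_offset(uint32_t page_id) {
    return (uint64_t)DB_HEADER_SIZE + (uint64_t)page_id * DB_PAGE_SIZE;
}

/* Number of pages in a file of file_size bytes, or -1 if the size
 * does not match the layout (a cheap sanity check). */
int64_t page_count(uint64_t file_size) {
    if (file_size < DB_HEADER_SIZE + DB_TRAILER_SIZE) return -1;
    uint64_t body = file_size - DB_HEADER_SIZE - DB_TRAILER_SIZE;
    if (body % DB_PAGE_SIZE != 0) return -1;
    return (int64_t)(body / DB_PAGE_SIZE);
}
```

So a "pointer" to a node on disk can just be its page id, and `page_offset` turns it into a position to seek to.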
- booleans require 1 bit: 0 (false) and 1 (true). They're going to take up at least 1 byte each anyway (a byte is the smallest addressable unit), so it's better to batch booleans together.
- enums are just integers.
- flags are just a combination of booleans stored as bits in a single integer:
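A minimal sketch of packing booleans into one byte in C. The flag names (`FLAG_IS_LEAF` and friends) and the helper functions are hypothetical, picked only to make the example readable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page flags, one bit each. */
enum {
    FLAG_IS_LEAF      = 1u << 0,
    FLAG_IS_DIRTY     = 1u << 1,
    FLAG_HAS_OVERFLOW = 1u << 2
};

static_assert(FLAG_HAS_OVERFLOW <= UINT8_MAX, "all flags must fit in one byte");

/* Turn a flag on. */
static inline uint8_t flag_set(uint8_t flags, uint8_t mask) {
    return (uint8_t)(flags | mask);
}

/* Turn a flag off. */
static inline uint8_t flag_clear(uint8_t flags, uint8_t mask) {
    return (uint8_t)(flags & (uint8_t)~mask);
}

/* Check whether a flag is on. */
static inline bool flag_test(uint8_t flags, uint8_t mask) {
    return (flags & mask) != 0;
}
```

Three booleans stored separately would cost three bytes; here up to eight of them share a single `uint8_t`.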
For variable-sized data -> use "Pascal Strings" format = [size][data]
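A small sketch of the [size][data] format in C, assuming a 1-byte size prefix (so data is capped at 255 bytes; a real format might use a wider prefix). `pstr_encode` and `pstr_decode` are made-up names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write [size][data] into out. Returns bytes written, or 0 if the
 * string is too long for a 1-byte prefix or out is too small. */
size_t pstr_encode(const char *s, uint8_t *out, size_t out_cap) {
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len > UINT8_MAX || out_cap < len + 1) return 0;
    out[0] = (uint8_t)len;          /* size prefix */
    memcpy(out + 1, s, len);        /* data, without the trailing NUL */
    return len + 1;
}

/* Read [size][data] back into a NUL-terminated C string.
 * Returns bytes consumed, or 0 if the input is truncated or dst is too small. */
size_t pstr_decode(const uint8_t *in, size_t in_len, char *dst, size_t dst_cap) {
    assert(in != NULL && dst != NULL);
    if (in_len < 1) return 0;
    size_t len = in[0];
    if (in_len < len + 1 || dst_cap < len + 1) return 0;
    memcpy(dst, in + 1, len);
    dst[len] = '\0';
    return len + 1;
}
```

Because the size comes first, the reader knows exactly how many bytes to take and where the next field starts, with no scanning for a terminator.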
Also, u need to decide on the "byte order":
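For example, the same 32-bit integer laid out both ways. This is just a sketch; the helper names are made up, and which order a file format picks is a design choice (it only has to be the same for the writer and the reader):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian: most significant byte first. */
void put_u32_be(uint8_t out[4], uint32_t v) {
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little-endian: least significant byte first. */
void put_u32_le(uint8_t out[4], uint32_t v) {
    assert(out != NULL);
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Decode big-endian bytes, independent of the CPU's own byte order. */
uint32_t get_u32_be(const uint8_t in[4]) {
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Shifting bytes in and out like this avoids depending on whatever byte order the machine running the code happens to use.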
2nd -> u need to convert the actual data (keys and values) to bytes (serialization). Some difficult things u have to deal with:
- Complex structures combining multiple values (objects), because in memory they're just pointers, and pointers mean nothing on disk.
- Variable-sized types (strings, arrays).
That layout depends on how YOU insert data in a file. These are some considerations to take into account when designing an on-disk structure:
U get humbled real quick when u learn how to design on-disk structures. It's nothing like when u just use them in memory. You can't just access a B-Tree the same way: Node root = (Node) *ptr_to_root_node; You literally have to do this: FILE *fp = fopen("data-index", "r+");
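A hedged sketch of what "accessing a node" turns into once it lives in a file: seek to an offset, then copy bytes in or out. `read_at` and `write_at` are made-up helper names, and the node here is treated as an opaque block of bytes at a known offset (how those bytes are laid out is a separate decision):

```c
#include <assert.h>
#include <stdio.h>

/* Read exactly len bytes starting at byte `offset`.
 * Returns 0 on success, -1 on a seek error or short read. */
int read_at(FILE *fp, long offset, void *buf, size_t len) {
    assert(fp != NULL && buf != NULL);
    if (fseek(fp, offset, SEEK_SET) != 0) return -1;
    return fread(buf, 1, len, fp) == len ? 0 : -1;
}

/* Write exactly len bytes starting at byte `offset`.
 * fflush only hands the data to the OS; real durability would also
 * need something like fsync, which is not shown here. */
int write_at(FILE *fp, long offset, const void *buf, size_t len) {
    assert(fp != NULL && buf != NULL);
    if (fseek(fp, offset, SEEK_SET) != 0) return -1;
    if (fwrite(buf, 1, len, fp) != len) return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

With a file opened like the `fopen("data-index", "r+")` call above, loading the root would be a `read_at` at the root's offset followed by decoding the bytes into an in-memory node.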
Finished notes for chapter 5 "Transaction Processing and Recovery". Lots of exciting things to learn about. notion.so/Chapter-5-Transact…
Finished writing notes for chapter 3 "File formats" and chapter 4 "Implementing B-Trees" in Database Internals by Alex Petrov. I'll get chapter 5 done tomorrow. notion.so/Chapter-3-File-For… notion.so/Chapter-4-Implemen…
The pictures are from my notes on Database Internals. They're really easy/fast to read and contain a bunch of references from the book (for advanced folks). The notes for the first two chapters are neatly formatted. I'll get the rest done soon. notion.so/Storage-engines-28…
B-Trees take these into consideration:
- Increase node fanout → many children per node
- Reduce tree height → fewer levels = fewer disk accesses
- Reduce the number of pointers
- Reduce rebalancing frequency → each node can hold many keys, so insertions cause less rebalancing
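To put rough numbers on fanout vs height, here's a back-of-the-envelope sketch in C. It assumes an idealized tree where every node is completely full, which real B-Trees are not, so treat the results as a lower bound. `tree_hops` is a made-up name:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest h such that fanout^h >= n_items, i.e. how many pointer hops
 * below the root are needed to reach n_items entries when every node
 * has exactly `fanout` children (an idealized assumption). */
unsigned tree_hops(uint64_t n_items, uint64_t fanout) {
    assert(fanout >= 2);
    unsigned hops = 0;
    uint64_t capacity = 1;
    while (capacity < n_items) {
        /* The next multiply would overflow, so one more level is
         * already enough to cover any 64-bit n_items. */
        if (capacity > UINT64_MAX / fanout) {
            hops++;
            break;
        }
        capacity *= fanout;
        hops++;
    }
    return hops;
}
```

For 1,000,000 items, `tree_hops(1000000, 2)` gives 20 while `tree_hops(1000000, 100)` gives 3. If each hop is a disk access, that difference is the whole point of a high fanout.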
Minimum read/write unit:
- HDDs -> 512 B–4 KB
- SSDs -> 2–16 KB
Utilize the block-based access model by grouping related data, which in theory improves locality, which in turn reduces I/O.
There are a few things to consider when designing on-disk data structures:
- minimize disk I/O
- maximize sequential disk I/O (i.e., avoid random I/O)
- utilize the block-based access model (disks don't read individual bytes, they read blocks/pages)
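A tiny sketch of what the block-based model means in practice: the cost of a read is counted in pages touched, not bytes requested. The function name `pages_touched` is made up, and the page size is whatever the caller assumes (4096 in the examples):

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size pages that the byte range
 * [offset, offset + length) overlaps. */
uint64_t pages_touched(uint64_t offset, uint64_t length, uint64_t page_size) {
    assert(page_size > 0);
    if (length == 0) return 0;
    uint64_t first = offset / page_size;
    uint64_t last = (offset + length - 1) / page_size;
    return last - first + 1;
}
```

With 4096-byte pages, reading a single byte still costs one full page, a 2-byte read that straddles a page boundary costs two, and 64 records of 64 bytes packed next to each other cost just one. That is why grouping related data pays off.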
Correctness is defined using formal properties. Using the fencing token example, the properties are:
So where do u even start when designing a reliable distributed algorithm? System models + defining what a "correct" algorithm means. -> System models = assumptions an algorithm can make about timing and faults.
Solution: fencing tokens. A unique, monotonically increasing number stored in the database. The node with the larger token = the true leader, and writes from the other node are rejected.
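A minimal sketch of the storage side of that check in C, assuming tokens only ever go up each time leadership is handed out. `Storage` and `storage_write` are made-up names and the "database" here is just a struct in memory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage state: the highest fencing token accepted so far,
 * plus the value being protected. */
typedef struct {
    uint64_t highest_token;
    int value;
} Storage;

/* Accept the write only if its token is at least as new as the newest
 * token seen so far. A smaller token means a stale leader. */
bool storage_write(Storage *s, uint64_t token, int value) {
    assert(s != NULL);
    if (token < s->highest_token) {
        return false;               /* old leader: reject */
    }
    s->highest_token = token;
    s->value = value;
    return true;
}
```

If a node with token 33 pauses, a new leader writes with token 34, and then the old node wakes up and retries with 33, the last write is refused and the value written under token 34 stays intact.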