
GES scavenging and the hidden cost of link events

Written by Jef Claes on software and life / Original link on Aug. 12, 2018

Somewhere around a year ago, we started using GES in production as the primary data store of our new loyalty system. The system stores two types of data.
  1. External services push batches of dumbed-down events to the loyalty system. For example: a user logged on, played a game or participated in a competition. These events are transient by nature. Once the loyalty system has processed them, they only need to be kept around for a few days.
  2. When these ingress events are processed, they go through a mini pipeline in which each event is assigned a specific weight, after which the weights are aggregated and translated into a command (see the sketch after this list). This command awards a user with virtual currency to spend in the loyalty shop and a number of points contributing to a higher rank - unlocking more privileges. The state machine that stores a user's balance and points is backed by a stream which is stored indefinitely. Unless the user asks to be forgotten, that is.
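To make the shape of that pipeline a bit more concrete, here is a purely illustrative sketch in Python; the event types, the weights and the AwardPoints command are made up for the example and are not taken from the actual system.

    from collections import namedtuple

    # Hypothetical ingress event and egress command shapes.
    IngressEvent = namedtuple("IngressEvent", "user_id type")
    AwardPoints = namedtuple("AwardPoints", "user_id currency points")

    # Hypothetical weight per ingress event type.
    WEIGHTS = {"UserLoggedOn": 1, "GamePlayed": 5, "CompetitionEntered": 10}

    def to_command(user_id, events):
        # Weigh each ingress event, aggregate the weights and translate
        # the total into a single command awarding currency and points.
        total = sum(WEIGHTS.get(e.type, 0) for e in events)
        return AwardPoints(user_id=user_id, currency=total, points=total)

    batch = [IngressEvent("user-1", "UserLoggedOn"), IngressEvent("user-1", "GamePlayed")]
    print(to_command("user-1", batch))  # AwardPoints(user_id='user-1', currency=6, points=6)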
As a rough estimate, for every 1000 ingress transient events, only 1 needs to be stored indefinitely as part of a state machine.
When implementing this more than a year ago, I thought I had done my homework and knew how to make sure the ingress events would get cleaned up. First you set the $maxAge metadata on the streams you want to clean up, and then you schedule the scavenging process (other databases use the term vacuum). This worked without any surprises. Once scavenging had run, I could see disk space being released. However, after a few months I started to become a bit suspicious of my understanding of the scavenging process. GES was releasing less disk space than I expected.
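As a minimal sketch of those two steps, assuming a single node on the default HTTP port with the default admin credentials: $maxAge is written as stream metadata (expressed in seconds), and a scavenge is kicked off through the admin endpoint. The stream name is made up for the example.

    import json
    import uuid
    import requests

    BASE = "http://localhost:2113"   # assumed single-node address
    AUTH = ("admin", "changeit")     # assumed default credentials

    # Set $maxAge (in seconds) on a transient ingress stream by writing
    # a metadata event to the stream's metadata resource.
    metadata = [{
        "eventId": str(uuid.uuid4()),
        "eventType": "$metadata",
        "data": {"$maxAge": 3 * 24 * 60 * 60}   # keep events around for three days
    }]
    requests.post(
        f"{BASE}/streams/ingress-user-1/metadata",
        data=json.dumps(metadata),
        headers={"Content-Type": "application/vnd.eventstore.events+json"},
        auth=AUTH,
    ).raise_for_status()

    # Kick off a scavenge on this node; in production you would schedule this.
    requests.post(f"{BASE}/admin/scavenge", auth=AUTH).raise_for_status()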
Even though we had been quite generous when provisioning the nodes with disk space, we would run out very soon. Much to my frustration, the "Storage is cheap" mantra gets thrown around too lightly. While the statement in essence is not wrong, for a database like GES that's built on top of a log, more data also means slower node restarts (index verification), slower scavenges and slower $all subscriptions.
GES has no built-in system catalog that allows you to discover which streams are taking up all this space. However, you can implement an $all subscription and count events per stream or even count the bytes in the event payload.
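As a rough sketch of that idea, assuming the Atom-based HTTP API of the 4.x-era server: read the $all feed from its oldest page (the rel="last" link) and walk forward through the rel="previous" links, counting entries per stream. It relies on entry titles having the form eventNumber@streamId; counting payload bytes would work the same way with the event bodies embedded.

    from collections import Counter
    import requests

    BASE = "http://localhost:2113"   # assumed single-node address
    AUTH = ("admin", "changeit")     # $all requires an authenticated user
    HEADERS = {"Accept": "application/vnd.eventstore.atom+json"}

    def feed(url):
        return requests.get(url, headers=HEADERS, auth=AUTH).json()

    def rel(page, name):
        return next((l["uri"] for l in page.get("links", []) if l["relation"] == name), None)

    counts = Counter()

    # The head of the feed shows the newest events; rel="last" jumps to the
    # oldest page, after which rel="previous" moves forward in time.
    page = feed(f"{BASE}/streams/%24all")
    url = rel(page, "last") or f"{BASE}/streams/%24all"

    while url:
        page = feed(url)
        for entry in page.get("entries", []):
            # Entry titles look like "42@some-stream".
            _, _, stream = entry["title"].partition("@")
            counts[stream] += 1
        url = rel(page, "previous")

    for stream, count in counts.most_common(20):
        print(f"{count:>10}  {stream}")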
Inspecting the results, we found that the streams emitted by the built-in system projections contained a disproportionate amount of events. Most of them were ingress events that should have long been scavenged! As it turns out, this was a wrong assumption on my part. Built-in projections build new streams by linking to the original event (instead of emitting a new one). But when an event is scavenged, events linking to it still linger around on disk. Although a link event is usually much smaller than the original event - it's just a pointer - the bytes used to store the pointer and the event envelope still take up quite some space when there are billions of them!
Luckily, I was only using a small portion of the built-in projections. I created a custom projection that only created streams I was actually interested in, pointed my code in the right direction, stopped the built-in projections and deleted the now irrelevant streams.
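As an illustration of what such a projection could look like, here is a sketch that registers a continuous projection through the projections HTTP endpoint; the projection source itself is JavaScript, and the event type, the target stream and the projection name are made up for the example.

    import requests

    BASE = "http://localhost:2113"   # assumed single-node address
    AUTH = ("admin", "changeit")     # managing projections requires admin rights

    # A user-defined projection that only links the events we actually care
    # about, instead of letting the built-in projections link everything.
    projection_source = """
    fromAll()
      .when({
        PointsAwarded: function (state, event) {
          linkTo('points-awarded', event);
        }
      });
    """

    requests.post(
        f"{BASE}/projections/continuous",
        params={"name": "points-awarded", "type": "js", "enabled": "true", "emit": "true"},
        data=projection_source,
        headers={"Content-Type": "application/json"},
        auth=AUTH,
    ).raise_for_status()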
Running the scavenging process after being able to delete all these streams was very satisfying. The scavenging process loops through the transaction file chunks one by one. It reads a chunk and writes a temporary new one, containing only the events that haven't been deleted. Once it reaches the end of the file, it swaps the newly written file in for the old one. Since writes are slower than reads, this means scavenging is actually way faster when there's more to scavenge - or less data to be written to the new file. After all the chunks have been scavenged, the process merges the now smaller files into new chunks where possible. This process is quite transparent by design; all you have to do is list the files in the data directory while scavenging is running.
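A quick way to watch this happen is to list the chunk files and their sizes while a scavenge is running; the data directory below is an assumption and depends on how the node is configured.

    import os

    DATA_DIR = "/var/lib/eventstore"   # assumed data directory; configuration-dependent

    # Chunk files are named chunk-XXXXXX.YYYYYY; the second number is a version
    # that goes up when a scavenge swaps in a rewritten copy of the chunk.
    for name in sorted(os.listdir(DATA_DIR)):
        if name.startswith("chunk-"):
            size_mb = os.path.getsize(os.path.join(DATA_DIR, name)) / (1024 * 1024)
            print(f"{name}  {size_mb:8.1f} MB")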
When this whole process was complete, used disk space went down from 410GB to 47GB! Having trimmed all this excessive data, scavenging is faster (hours not days), node restarts are faster and resetting an $all subscription makes me less anxious.
