1919 1420 1851 1099 1748 1072 1939 1854 1641 1524 1981 1821 1697 1594 1708 1381 1721 1032 1410 1889 1786 1919 1402 1198 1017 1633 1807 1414 1046 1292 1714 1354 1462 1430 1319 1720 1991 1394 1644 1324 1340 1790 1053 1359 1399 1487 1567 1971 1593 1404 1036 1315 1086 1723 1429 1834 1626 1861 1300 1174 1056 1461 1807 1460 1389 1573 1738 1581 1295 1654 1679 1562 1822 1619 1939 1307 1067 1770 1221 1728 1603 1718 1306 1723 1952 1931 1908 1659 1064 1605 1793 1956 1082 1746 1314 1640 1822 1647 1868 Fast projections | PHPnews.io


Fast projections

Written by Jef Claes on software and life / Original link on Jul. 30, 2017

Most EventStore client libraries allow you to subscribe to a stream by passing in a callback which is invoked when an event occurs (either a live or historic event).
Let's say we subscribe to a stream of a popular video service, and we want to project a read model that shows how many videos a viewer has watched. We don't care about the bookmarked videos for now.
We're sitting on top of storage that can execute a single statement and a batch of statements.
The statements supported are limited:

The storage engine exposes a method which calculates the cost of executed statements:
The stream
For this exercise the stream contains 3500 historic views, 50 historic bookmarks and 100 live views.
First attempt
The first attempt at projecting the stream to state, executes a statement for each event we're interested in and checkpoints after each event (even the ones we're not interested in).
The cost of this projection is high: 7250 execution units - even though there are only 3600 events we're interested in. We execute a statement for each event we handled and checkpoint immediately after, even for the events we didn't handle.
Less checkpointing
It's not hard to get rid of some of the checkpointing though.
The cost has improved, but only marginally. We saved 50 execution units by avoiding checkpointing after events we do not handle. Time for a bigger improvement..
Instead of handling each event individually, we will buffer them as soon as they come in. When we're catching up and seeing historic events, we only flush the buffer every 100 events. When we're caught up, we flush on each event. We want to always make a best attempt at showing fresh data.
When the buffer gets flushed, events are mapped into a sequence of statements, which are sent in batch to the storage engine. The checkpoint is appended to the tail of the batch.
This approach makes a significant difference. Execution cost has reduced by 93%! Batching of historic events makes replays much faster, but with some extra effort we can take this optimization even further.
Batching with transformation
It always pays off to understand the guarantees and intricacies of the storage you're using. Looking closely at the storage interface, we find that we can increment the view count by any number. If we use a local data structure to aggregate the view count up front, we can reduce the number of statements even further.
In practice, we filter the for events we're interested in, group by the viewer id, count the values and map that into a single statement per viewer.
This further reduces costs by more than 2/3th. The optimization makes the code a bit more elaborate, but not necessarily that more complex - it's still a local optimization.
In three steps, we brought cost down from 7250 execution units to only 162 units. That makes me a 44x engineer, right?
In general, storage is one of the slowest components of your system. Making your system faster often involves making it do less work. Avoiding waste by batching and some more work up front, can make a big impact when you want to make your projection faster.
You can find the complete F# script here.


« A week of symfony #552 (24-30 July 2017) - A recap of Laracon US 2017 »