The bug requires several conditions to reproduce:
1. blocking operations on multiple databases
2. use of lua scripts that wake blocking transactions
The bug was discovered through a deadlock in BLMOVE, but it could also manifest with other commands that would
"disappear", causing local starvation on the connections sending them.
With BLMOVE it causes a global deadlock in the Dragonfly transaction queue.
The fix deletes a few lines of code introduced by #3260 six months ago,
so this is a long-lived regression.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
fix(search_family): Support boolean and nullable types in indexes (#4314)
* fix(search_family): Support boolean and nullable types in indexes
fixes dragonflydb#4107, dragonflydb#4129
* refactor: address comments
---------
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
The bug was caused by incorrect handling of corner cases,
where a path that should lead to an item was wrongly cleared, which led
to empty results for SortedMap::GetRange queries.
This PR:
1. Fixes the wrong code in bptree_set.h.
2. Adds unit tests for both bptree_set_test and sorted_map_test.
Related to https://github.com/mastodon/mastodon/issues/33805
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: Add Lua force atomicity flag
We accidentally instructed Sidekiq to add `disable-atomicity` to their
script, despite them needing to run atomically.
This hack-ish PR adds a `--lua_force_atomicity_shas` flag to allow
specifying which SHAs are forced to run in an atomic fashion, even if
they are marked as non-atomic.
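
The override can be pictured with the minimal sketch below. It is not Dragonfly's implementation: the helper name `runs_atomically`, the set-based representation of script flags, and the SHA values in the usage note are assumptions made for illustration. Only the `disable-atomicity` script flag and the `--lua_force_atomicity_shas` flag name come from this PR.

```python
def runs_atomically(sha: str, script_flags: set, forced_shas: set) -> bool:
    """Decide whether a script should run atomically (hypothetical helper)."""
    if sha in forced_shas:
        # The SHA is listed in --lua_force_atomicity_shas, so the script's
        # own disable-atomicity flag is ignored.
        return True
    # Otherwise the script's own flags decide.
    return "disable-atomicity" not in script_flags
```

For example, `runs_atomically("abc", {"disable-atomicity"}, {"abc"})` evaluates to `True`, while the same call with an empty `forced_shas` set evaluates to `False`.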
Fixes #4522
* fix build on clang
fix: Do not bump elements during RDB load #4507
The Issue
Before this PR, when loading an RDB, we modified fetched_items_ as part of the loading process. This has little effect, unless the next issued command calls FLUSHALL / FLUSHDB (could happen in DFLY LOAD, REPLICAOF or just calling FLUSHALL directly). In such a case, a CHECK() fails.
The Fix
While load is not run as a command (in a transaction), it still uses APIs that assume that they are called in the context of a command. As such, it indirectly used DbSlice::FindInternal(), which bumps elements when called.
This PR adds another sub-mode to DbSlice, named load_in_progress_. When true, we treat DbSlice as if it is not in cache mode, ignoring cache_mode_.
This PR also renames caching_mode_ to cache_mode_, since we generally use the term "cache mode" rather than "caching mode", including in the --cache_mode flag.
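
As a rough illustration of the sub-mode, the sketch below models a slice that records fetched keys only when it is in cache mode and no load is in progress. The class `DbSliceSketch` and its attributes are a toy stand-in that borrows names from the description above; it makes no claim about how DbSlice::FindInternal() is actually written.

```python
class DbSliceSketch:
    """Toy stand-in for DbSlice; a simplified model, not the real class."""

    def __init__(self, cache_mode: bool):
        self.cache_mode = cache_mode
        self.load_in_progress = False
        self.fetched_items = set()
        self.data = {}

    def find(self, key):
        # Bump (record the fetch) only in cache mode and only outside a load.
        # During a load the slice behaves as if cache mode were off.
        if self.cache_mode and not self.load_in_progress and key in self.data:
            self.fetched_items.add(key)
        return self.data.get(key)
```

With `load_in_progress` set to `True`, lookups leave `fetched_items` empty, so a later flush has nothing stale to trip over in this model.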
Fixes #4497
Due to a corner-case bug, the reply builder could append \0\0 to the end of bulk strings instead of
\r\n. The bug slipped past our tests because the redis-py parser most likely does not validate the
terminator as long as everything else is consistent.
This PR:
1. Adds a test that catches the bug
2. Adds a debug check that verifies the destination pointer is consistent with the iovec being used.
3. Fixes the bug.
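
To show what a terminator-checking client would catch, here is a minimal strict parser for a single RESP bulk string (`$<len>\r\n<payload>\r\n`). It is an illustrative sketch, not the test added by this PR; the function name `parse_bulk_string` is hypothetical, and null bulk strings (`$-1`) are deliberately not handled.

```python
def parse_bulk_string(buf: bytes) -> bytes:
    """Parse one RESP bulk string and insist on the trailing CRLF."""
    if not buf.startswith(b"$"):
        raise ValueError("not a bulk string")
    header_end = buf.index(b"\r\n")  # raises ValueError if the header is incomplete
    length = int(buf[1:header_end])
    start = header_end + 2
    end = start + length
    payload = buf[start:end]
    terminator = buf[end:end + 2]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if terminator != b"\r\n":
        # A lenient parser that skips two bytes without looking would miss this.
        raise ValueError(f"bad terminator: {terminator!r}")
    return payload
```

A reply such as `b"$5\r\nhello\x00\x00"` has the right length and payload, so only the explicit terminator check rejects it.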
Fixes #4424
ExternalAllocator allocates large sizes directly from the extent tree bypassing segment data structure.
Unfortunately, Free() was not updated to match. This PR:
1. Makes sure that Free adds the allocated range back to the extent tree.
2. Rewrites and simplifies the ExtentTree::Add implementation.
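
For readers unfamiliar with extent structures, the sketch below shows one common way to add a freed range back to a set of free extents while merging it with adjacent neighbours. Whether ExtentTree::Add coalesces in exactly this way is an assumption; the class `FreeExtents` and its sorted-list representation are hypothetical and chosen only for brevity.

```python
import bisect


class FreeExtents:
    """Free ranges kept as sorted start offsets plus a start -> length map."""

    def __init__(self):
        self._starts = []
        self._lens = {}

    def add(self, start: int, length: int) -> None:
        i = bisect.bisect_left(self._starts, start)
        # Merge with the predecessor if it ends exactly where this range begins.
        if i > 0:
            prev = self._starts[i - 1]
            if prev + self._lens[prev] == start:
                start = prev
                length += self._lens.pop(prev)
                del self._starts[i - 1]
                i -= 1
        # Merge with the successor if it begins exactly where this range ends.
        if i < len(self._starts):
            nxt = self._starts[i]
            if start + length == nxt:
                length += self._lens.pop(nxt)
                del self._starts[i]
        self._starts.insert(i, start)
        self._lens[start] = length

    def extents(self):
        return [(s, self._lens[s]) for s in self._starts]
```

Freeing a range that sits between two free neighbours collapses all three into a single extent, which is what lets later large allocations succeed.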
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The problem was a reverse iterator that was not set properly
when the last node was deleted.
Also, move PushSentinel code into Push.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
chore: add content-type for metrics response.
Also, update the local stack to use Prometheus 3.0.
Finally, hex-escape arguments when logging an error for a command.
Fixes #4277
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
fix: do not check-fail OpRestore
In some rare cases we reach an inconsistent state inside OpRestore where a key already exists, though it should not.
In that case we log the error instead of crashing the server. In addition, we update the existing entry to the latest restored value.
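
The change in behaviour amounts to "log and overwrite" instead of "abort". A minimal sketch of that pattern follows; the function `restore_entry`, the dict-based store, and the logger name are all hypothetical and are not taken from OpRestore itself.

```python
import logging

log = logging.getLogger("restore_sketch")


def restore_entry(db: dict, key: str, value) -> bool:
    """Store a restored value; return False if the key unexpectedly existed."""
    existed = key in db
    if existed:
        # This situation is only reported here; the process keeps running.
        log.error("restore: key %r already exists, overwriting", key)
    # In both cases the entry ends up holding the latest restored value.
    db[key] = value
    return not existed
```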
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: add ability to track connections stuck at send
Add send_delay_seconds/send_delay_ms metrics.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
Before: if socket data arrived in small pieces, CheckForHttpProto would grow the
io_buf_ capacity exponentially with each iteration. For example, the test_match_http test
easily caused an OOM.
This PR ensures that there is always a buffer available - but it grows linearly with the input size.
Currently, the total input in CheckForHttpProto is limited to 1024.
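
The difference between the two growth patterns can be seen in a simplified model. The function below is a sketch under assumptions: it models "grow on every iteration" as doubling and "grow when needed" as extending to exactly the required size, with a hypothetical initial capacity of 64. It does not reproduce the actual io_buf_ logic.

```python
def final_capacity(chunk_sizes, grow_every_iteration: bool, initial_capacity: int = 64) -> int:
    """Return the buffer capacity after reading all chunks under one policy."""
    capacity, used = initial_capacity, 0
    for size in chunk_sizes:
        if grow_every_iteration:
            # Modelled old behaviour: capacity doubles on each read, needed or not.
            capacity *= 2
        elif used + size > capacity:
            # Modelled new behaviour: grow only when the next chunk would not fit.
            capacity = used + size
        used += size
    return capacity
```

With twenty 1-byte reads, the doubling policy ends at `64 * 2**20` while the grow-when-needed policy stays at 64, which is why input arriving in small pieces was enough to exhaust memory.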
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: factor out rdb_load utilities into separate files
rdb_load.cc is huge and contains many auxiliary classes.
This PR moves the DecompressImpl and ErrorRdb code into detail/.
It also fixes minor bugs around error conditions during decompression:
a. Do not check-fail on an invalid opcode; return an error_code instead.
b. Print LZ4 errors correctly.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: fixes
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: improve parser state machine
1. Separate argument type parsing from argument parsing itself.
2. Handle strings of length 1.
This is done in preparation for improving the parser contract,
so that when it returns INPUT_PENDING, it consumes the entire input.
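
To make the intended contract concrete, here is a small sketch of a parser that buffers partial input internally, so that a pending result always reports the whole input as consumed. It parses CRLF-terminated lines only; the names `Status` and `LineParser` are hypothetical and the code is not the Dragonfly parser.

```python
from enum import Enum


class Status(Enum):
    OK = 1
    INPUT_PENDING = 2


class LineParser:
    """Parses one CRLF-terminated line, keeping partial input between calls."""

    def __init__(self):
        self._partial = bytearray()

    def parse(self, data: bytes):
        """Return (status, consumed, line); consumed counts bytes of `data`."""
        combined = bytes(self._partial) + data
        pos = combined.find(b"\r\n")
        if pos == -1:
            # No full line yet: keep everything, so the caller has nothing to re-feed.
            self._partial = bytearray(combined)
            return Status.INPUT_PENDING, len(data), None
        line = combined[:pos]
        # Bytes already buffered from earlier calls do not count as consumed now.
        consumed = pos + 2 - len(self._partial)
        self._partial.clear()
        return Status.OK, consumed, line
```

Feeding `b"PI"` and then `b"NG\r\nGET"` yields a pending result that consumed 2 bytes, followed by `b"PING"` with 4 bytes consumed; the caller keeps only the unconsumed `b"GET"`.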
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>