Group By – dbzero

`def group_by(group_defs: Union[Callable, Tag, Tuple], query: QueryObject, ops: Tuple[Callable, ...] = (count_op,)) -> Dict`

⚠️ This method is experimental and its API may change in future versions.

Perform cached group-and-aggregate queries over a set of objects.

The group_by() method categorizes the objects returned by the input query according to one or more criteria, then applies aggregation operations to each category.

The first time a specific query is run, it performs a full scan and caches the result. Subsequent calls to the identical query compute the result by applying only the changes (deltas) that have occurred since the last run, which avoids a rescan. For the cache to persist across program runs, you must initialize it using dbzero.init_fast_query(). A query is considered identical if its parameters and its group_defs are the same as in the previous call.
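The scan-then-delta idea can be sketched in plain Python. This is only an illustration of the general technique, not dbzero's implementation: the class `DeltaGroupCount` and its methods `full_scan` and `apply_delta` are hypothetical names invented for this sketch.

```python
from collections import Counter


class DeltaGroupCount:
    """Hypothetical delta-maintained group count (illustration only, not dbzero code)."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.counts = Counter()

    def full_scan(self, objects):
        # First run: visit every object once and cache the per-group counts.
        self.counts = Counter(self.key_fn(obj) for obj in objects)
        return dict(self.counts)

    def apply_delta(self, added=(), removed=()):
        # Later runs: touch only the objects that changed since the last call.
        for obj in added:
            self.counts[self.key_fn(obj)] += 1
        for obj in removed:
            key = self.key_fn(obj)
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]  # drop groups that became empty
        return dict(self.counts)


cache = DeltaGroupCount(key_fn=lambda word: word[0])
first = cache.full_scan(["apple", "avocado", "banana"])
second = cache.apply_delta(added=["blueberry"], removed=["avocado"])
print(first)   # {'a': 2, 'b': 1}
print(second)  # {'a': 1, 'b': 2}
```

The second call does work proportional to the number of changed objects rather than to the size of the whole result set, which is the benefit the caching description refers to.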

Parameters

group_defs lambda | Tag | tuple
The criteria used to group the objects. This can be:
- A lambda function: Applied to each object to determine its grouping key. For caching to work, the lambda's source code must be identical between calls.
- A Tag: Groups objects by the tags applied to them. The group keys will be the string names of the enum members.
- A tuple of the above: For multi-level grouping. The resulting dictionary keys will be tuples.
query QueryObject
A dbzero query to be grouped.
ops tuple of callable, default (count_op,)
A tuple of aggregation operations to perform on each group. Defaults to (dbzero.count_op,), which counts the number of items in each group.
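To make the key shapes concrete, here is a minimal plain-Python sketch of how a single criterion yields a plain key while a tuple of criteria yields tuple keys. It does not call dbzero; the helper `composite_key` is a hypothetical name used only for this illustration.

```python
from collections import Counter


def composite_key(obj, key_fns):
    # One criterion yields a plain key; several yield a tuple,
    # with one element per criterion in the order given.
    if len(key_fns) == 1:
        return key_fns[0](obj)
    return tuple(fn(obj) for fn in key_fns)


values = [1, 2, 3, 4, 10, 11]
criteria = (
    lambda v: "small" if v < 10 else "large",  # first grouping level
    lambda v: v % 2,                           # second level: parity
)

single_level = dict(Counter(composite_key(v, criteria[:1]) for v in values))
multi_level = dict(Counter(composite_key(v, criteria) for v in values))

print(single_level)  # {'small': 4, 'large': 2}
print(multi_level)   # {('small', 1): 2, ('small', 0): 2, ('large', 0): 1, ('large', 1): 1}
```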

Returns

A dictionary where:

Keys are the group identifiers determined by the group_defs criteria. If multiple criteria are used, the key will be a tuple.
Values are the results of the aggregation(s).
- If a single operation is provided in ops, the value is a single result (e.g., an int).
- If multiple operations are provided, the value is a tuple containing the result of each operation in the specified order.
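The difference between the two value shapes can be illustrated without dbzero. In the sketch below, `group_and_aggregate` is a hypothetical helper, and the built-ins `len` and `sum` stand in for a count operation and a sum operation; it only mirrors the return shape described above.

```python
from collections import defaultdict


def group_and_aggregate(items, key_fn, ops):
    # Collect the members of each group first.
    buckets = defaultdict(list)
    for item in items:
        buckets[key_fn(item)].append(item)

    result = {}
    for key, members in buckets.items():
        outcomes = tuple(op(members) for op in ops)
        # One op gives a bare value; several ops give a tuple in the order supplied.
        result[key] = outcomes[0] if len(outcomes) == 1 else outcomes
    return result


def parity(v):
    return "even" if v % 2 == 0 else "odd"


values = [1, 2, 3, 4, 5]
counts_only = group_and_aggregate(values, parity, ops=(len,))
counts_and_sums = group_and_aggregate(values, parity, ops=(len, sum))

print(counts_only)      # {'odd': 3, 'even': 2}
print(counts_and_sums)  # {'odd': (3, 9), 'even': (2, 6)}
```

Code that consumes the result should therefore check how many operations were requested before unpacking a value.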

Examples

Simple grouping by attribute

Here, we group objects by their key attribute and count the items in each group.

import dbzero as db0  # alias used throughout these examples

# Assume SomeClass is a dbzero-managed class with a 'key' attribute
objects = []
keys = ["one", "two", "three"]
for i in range(10):
    objects.append(SomeClass(key=keys[i % 3]))
db0.tags(*objects).add("my-tag")
 
# Group objects with "my-tag" by their 'key'
groups = db0.group_by(lambda row: row.key, db0.find("my-tag"))
 
# Example result:
# {'one': 4, 'two': 3, 'three': 3}

Multi-level grouping

You can group by multiple criteria, such as an Enum tag and the parity of an object's value. The resulting keys will be tuples.

from enum import Enum
 
class Colors(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3
 
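# Assume MemoTestClass instances carry a numeric 'value' and are tagged with Colors members.
# Note: the standard-library Enum has no values() method, so Colors.values() below
# presumably requires a dbzero-provided enum rather than enum.Enum as declared above.
# Group by color tag and then by whether the value is even (0) or odd (1)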
groups = db0.group_by(
    (Colors.values(), lambda x: x.value % 2),
    db0.find(MemoTestClass)
)
 
# Example result:
# {('RED', 0): 2, ('RED', 1): 2, ('GREEN', 1): 3, ('BLUE', 0): 2, ...}

Grouping with custom aggregations

Instead of just counting, you can perform other aggregations like summing a value. If you provide multiple operations, the dictionary's values will be tuples.

# Define two operations: default count and a sum of the 'value' attribute
query_ops = (db0.count_op, db0.make_sum(lambda x: x.value))
 
groups = db0.group_by(
    lambda x: "even" if x.value % 2 == 0 else "odd",
    db0.find(MemoTestClass),
    ops=query_ops
)
 
# Example result where each value is a tuple (count, sum_of_values):
# {'even': (5, 20), 'odd': (5, 25)}
