def group_by(group_defs: Union[Callable, Tag, Tuple], query: QueryObject, ops: Tuple[Callable, ...] = (count_op,)) -> Dict
This method is experimental and its API may change in future versions.
Perform cached group-and-aggregate queries over a set of objects.
The group_by() method categorizes the objects returned by the input query based on one or more criteria, then applies aggregation operations to each category.
The first time a specific query runs, group_by() performs a full scan and caches the result. Subsequent calls to the exact same query compute the result by applying only the changes (deltas) made since the last run, which avoids a rescan. For the cache to persist across program runs, you must initialize it with dbzero.init_fast_query(). A query counts as identical when its parameters and its group_defs match those of the previous call.
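The full-scan-then-deltas behaviour described above can be pictured with a small plain-Python sketch. It does not use dbzero and is not how the library is implemented; the class `CachedGroupCount` and its methods `full_scan` and `apply_delta` are hypothetical names chosen for illustration only.

```python
from collections import Counter


class CachedGroupCount:
    """Hypothetical illustration: caches group counts and updates them from deltas."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.counts = Counter()

    def full_scan(self, objects):
        # First run: visit every object once and cache the counts.
        self.counts = Counter(self.key_fn(obj) for obj in objects)
        return dict(self.counts)

    def apply_delta(self, added=(), removed=()):
        # Later runs: touch only the objects that changed since the last call.
        for obj in added:
            self.counts[self.key_fn(obj)] += 1
        for obj in removed:
            key = self.key_fn(obj)
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
        return dict(self.counts)


cache = CachedGroupCount(lambda value: "even" if value % 2 == 0 else "odd")
first = cache.full_scan(range(10))                       # scans all ten values
second = cache.apply_delta(added=[10, 12], removed=[1])  # processes three changes
```

The second call does work proportional to the number of changes rather than to the size of the data set, which is the performance benefit the caching is meant to provide.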
Parameters
- group_defs: lambda | Tag | tuple
  The criteria used to group the objects. This can be:
  - A lambda function: Applied to each object to determine its grouping key. For caching to work, the lambda's source code must be identical between calls.
  - Tag: Groups objects by the tags applied to them. The group keys will be the string names of the enum members.
  - A tuple of the above: For multi-level grouping. The resulting dictionary keys will be tuples.
- query: QueryObject
  A dbzero query to be grouped.
- ops: tuple of callable, default (count_op,)
  A tuple of aggregation operations to perform on each group. Defaults to (dbzero.count_op,), which counts the number of items in each group.
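To show how a single criterion and a tuple of criteria lead to different key shapes, here is a minimal plain-Python sketch. It covers only lambda criteria (not tags), does not call dbzero, and the helpers `make_key` and `sketch_group_count` are hypothetical names introduced for this illustration.

```python
def make_key(group_defs, obj):
    # A tuple of criteria yields a tuple key; a single criterion yields a bare key.
    if isinstance(group_defs, tuple):
        return tuple(fn(obj) for fn in group_defs)
    return group_defs(obj)


def sketch_group_count(group_defs, objects):
    # Count how many objects fall under each key.
    result = {}
    for obj in objects:
        key = make_key(group_defs, obj)
        result[key] = result.get(key, 0) + 1
    return result


words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Single criterion: keys are first letters.
single = sketch_group_count(lambda w: w[0], words)

# Two criteria: keys are (first letter, parity of the word length).
multi = sketch_group_count((lambda w: w[0], lambda w: len(w) % 2), words)
```

Here `single` has string keys such as 'a', while `multi` has tuple keys such as ('a', 1), mirroring the multi-level behaviour described for group_defs.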
Returns
A dictionary where:
- Keys are the group identifiers determined by the group_defs criteria. If multiple criteria are used, the key will be a tuple.
- Values are the results of the aggregation(s).
  - If a single operation is provided in ops, the value is a single result (e.g., an int).
  - If multiple operations are provided, the value is a tuple containing the result of each operation in the specified order.
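The difference between the single-operation and multi-operation value shapes can be sketched in plain Python. This is only an illustration of the return contract, not the library's implementation: `sketch_group_by` is a hypothetical helper, and it assumes each operation is an ordinary function over a list of group members (here the built-ins `len` and `sum`), which may differ from how dbzero's own operations work.

```python
def sketch_group_by(key_fn, objects, ops=(len,)):
    # Collect the members of each group.
    members = {}
    for obj in objects:
        members.setdefault(key_fn(obj), []).append(obj)

    result = {}
    for key, group in members.items():
        values = tuple(op(group) for op in ops)
        # One operation: unwrap to a bare value. Several: keep the tuple, in ops order.
        result[key] = values[0] if len(ops) == 1 else values
    return result


def parity(value):
    return "even" if value % 2 == 0 else "odd"


counts_only = sketch_group_by(parity, range(10))
counts_and_sums = sketch_group_by(parity, range(10), ops=(len, sum))
```

With one operation each value is a plain int; with two operations each value is a (count, sum) tuple in the order the operations were given.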
Examples
Simple grouping by attribute
Here, we group objects by their key attribute and count the items in each group.
# Assume db0 refers to the dbzero module (e.g. import dbzero as db0)
# and that objects are instances of a class with a 'key' attribute
objects = []
keys = ["one", "two", "three"]
for i in range(10):
    objects.append(SomeClass(key=keys[i % 3]))
db0.tags(*objects).add("my-tag")
# Group objects with "my-tag" by their 'key'
groups = db0.group_by(lambda row: row.key, db0.find("my-tag"))
# Example result:
# {'one': 4, 'two': 3, 'three': 3}
Multi-level grouping
You can group by multiple criteria, such as an Enum tag and the parity of an object's value. The resulting keys will be tuples.
from enum import Enum
class Colors(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3
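# Group by color tag and then by whether the value is even (0) or odd (1).
# Assumes Colors exposes a values() method returning its members as tags;
# the standard-library Enum does not define one.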
groups = db0.group_by(
    (Colors.values(), lambda x: x.value % 2),
    db0.find(MemoTestClass)
)
# Example result:
# {('RED', 0): 2, ('RED', 1): 2, ('GREEN', 1): 3, ('BLUE', 0): 2, ...}
Grouping with custom aggregations
Instead of just counting, you can perform other aggregations like summing a value. If you provide multiple operations, the dictionary's values will be tuples.
# Define two operations: default count and a sum of the 'value' attribute
query_ops = (db0.count_op, db0.make_sum(lambda x: x.value))
groups = db0.group_by(
    lambda x: "even" if x.value % 2 == 0 else "odd",
    db0.find(MemoTestClass),
    ops=query_ops
)
# Example result where each value is a tuple (count, sum_of_values):
# {'even': (5, 20), 'odd': (5, 25)}