pandas' groupby method is very useful when each item/row of a Series/DataFrame object belongs to exactly one group. But I have a situation where each row can belong to zero, one, or multiple groups.
Example with some hypothetical data:
+--------+-------+----------------------+
| Item   | Count | Tags                 |
+--------+-------+----------------------+
| Apple  | 5     | ['fruit', 'red']     |
| Tomato | 10    | ['vegetable', 'red'] |
| Potato | 3     | []                   |
| Orange | 20    | ['fruit']            |
+--------+-------+----------------------+
According to the 'Tags' column, Apple and Tomato each belong to two groups, Potato belongs to no group, and Orange belongs to one group. So grouping by tag and summing the counts for each tag should give:
+-----------+-------+
| Tag       | Count |
+-----------+-------+
| fruit     | 25    |
| red       | 15    |
| vegetable | 10    |
+-----------+-------+
How can this be done?
Answer 0 (score: 2)
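Repeat each value in the 'Count' column as many times as its row has tags (the length of 'Tags'), then group the repeated counts by the concatenated tags:
import numpy as np
df.Count.repeat(df.Tags.str.len()).groupby(np.concatenate(df.Tags)).sum()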
fruit 25
red 15
vegetable 10
Name: Count, dtype: int64
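For comparison, pandas 0.25 and later provide DataFrame.explode, which performs the same "one row per tag" expansion in a single call. This is a swapped-in technique, not part of the answer above, and the DataFrame below is the question's sample data rebuilt by hand (column names Item, Count and Tags are assumed). A minimal sketch:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed column names: Item, Count, Tags)
df = pd.DataFrame({
    "Item": ["Apple", "Tomato", "Potato", "Orange"],
    "Count": [5, 10, 3, 20],
    "Tags": [["fruit", "red"], ["vegetable", "red"], [], ["fruit"]],
})

# explode() gives each tag its own row and repeats Count alongside it;
# the empty list for Potato becomes a single NaN tag
exploded = df.explode("Tags")

# groupby drops NaN keys by default, so Potato does not appear in the totals
totals = exploded.groupby("Tags")["Count"].sum()
print(totals)
```

The design trade-off is readability versus control: explode hides the repeat/concatenate bookkeeping, while the repeat-based one-liner works on pandas versions that predate explode.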
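Using numpy.bincount and pandas.factorize:
# i: an integer code for every tag occurrence; r: the unique tags
i, r = pd.factorize(np.concatenate(df.Tags))
# Sum the repeated counts that share a tag code (bincount returns floats)
c = np.bincount(i, df.Count.repeat(df.Tags.str.len()))
pd.Series(c.astype(df.Count.dtype), r)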
fruit 25
red 15
vegetable 10
dtype: int64
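To see what the two calls in the bincount approach produce, here is a small sketch on the flattened tags and repeated counts written out by hand from the question's data (the literal lists are an assumption standing in for np.concatenate(df.Tags) and the repeated 'Count' column). pd.factorize maps each tag to an integer code in order of first appearance, and np.bincount with weights sums the weights that share a code:

```python
import numpy as np
import pandas as pd

# Tags of Apple, Tomato and Orange flattened in row order (Potato has none)
flat_tags = ["fruit", "red", "vegetable", "red", "fruit"]
# Each row's Count repeated once per tag: Apple x2, Tomato x2, Orange x1
repeated_counts = [5, 5, 10, 10, 20]

codes, uniques = pd.factorize(np.array(flat_tags))
print(codes)    # integer code per tag occurrence
print(uniques)  # unique tags in order of first appearance

# Sum the weights that share a code; weighted bincount returns floats
sums = np.bincount(codes, weights=repeated_counts)
result = pd.Series(sums.astype(int), index=uniques)
print(result)
```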
Using collections.defaultdict:
from collections import defaultdict
import pandas as pd

counts = [5, 10, 3, 20]
tags = [['fruit', 'red'], ['vegetable', 'red'], [], ['fruit']]

# Add each row's count to every tag that the row carries
d = defaultdict(int)
for c, T in zip(counts, tags):
    for t in T:
        d[t] += c

print(pd.Series(d))
print()
print(pd.DataFrame([*d.items()], columns=['Tag', 'Count']))
fruit 25
red 15
vegetable 10
dtype: int64
Tag Count
0 fruit 25
1 red 15
2 vegetable 10
Answer 1 (score: 1)
I solved this by writing a function called groupby_many. It works with both Series and DataFrame objects:
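import numpy as np
import pandas as pd

def groupby_many(data, groups):
    """
    Groups a Series or DataFrame object where each row can belong to many groups.

    Parameters
    ----------
    data : Series or DataFrame
        The data to group
    groups : iterable of iterables
        For each row in data, the groups that row belongs to.
        A row can belong to zero, one, or multiple groups.

    Returns
    -------
    A GroupBy object
    """
    # One (row position, group) pair per group membership;
    # rows with no groups contribute no pairs.
    # Note: zip(*pairs) raises ValueError if no row belongs to any group.
    pairs = [(i, g) for (i, gg) in enumerate(groups) for g in gg]
    row, group = zip(*pairs)
    # Repeat each row once per group it belongs to, then group normally
    return data.iloc[list(row)].groupby(list(group))

It works by creating a version of the data in which each row is repeated n times, where n is the number of groups that row belongs to. Each row in this version belongs to exactly one group, so it can be handled by a regular groupby.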
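To make the "repeat each row n times" step concrete, the sketch below prints the intermediate (row, group) pairs and the expanded data for the question's example. It re-creates the two key lines of groupby_many inline rather than calling the function, and representing the counts as a Series indexed by item name is an assumption made for illustration:

```python
import pandas as pd

# Question's data as a Series indexed by item (assumed layout for illustration)
counts = pd.Series([5, 10, 3, 20], index=["Apple", "Tomato", "Potato", "Orange"])
tags = [["fruit", "red"], ["vegetable", "red"], [], ["fruit"]]

# One (row position, group) pair per membership; Potato contributes none
pairs = [(i, g) for (i, gg) in enumerate(tags) for g in gg]
print(pairs)

row, group = zip(*pairs)
# Apple and Tomato appear twice, Orange once, Potato not at all
expanded = counts.iloc[list(row)]
print(expanded)

# Each expanded row now carries exactly one group label, so a plain groupby works
totals = expanded.groupby(list(group)).sum()
print(totals)
```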
To apply it to the example data in the question: