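This is a little hard to explain, so please bear with me.
Suppose I have a table like the one shown below.
How can I create a new DataFrame that meets the following conditions?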
Answer 0 (score: 3)
You can use cut to get the ranges, then pass them to pivot_table to get the sums:
import numpy as np
import pandas as pd

# Set up example data.
np.random.seed([3, 1415])
n = 100
df = pd.DataFrame({
    'A': np.random.randint(200, 601, size=n),
    'B': np.random.randint(1, 101, size=n),
    'C': np.random.randint(25, size=n)
})
# Use cut to get the ranges.
a_bins = pd.cut(df['A'], bins=[200, 311, 370, 450, 550, 600], include_lowest=True)
b_bins = pd.cut(df['B'], bins=[1, 16, 67, 100], include_lowest=True)
# Pivot to get the sums.
df2 = df.pivot_table(index=a_bins, columns=b_bins, values='C', aggfunc='sum', fill_value=0)
Resulting output:
B           [1, 16]  (16, 67]  (67, 100]
A
[200, 311]       82       118        153
(311, 370]       68        56         45
(370, 450]       41       129         40
(450, 550]       32       121         57
(550, 600]        0       112         47
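To make the bin edges concrete, here is a minimal sketch of how cut treats boundary values. The sample values and the labels 'low' and 'high' are made up for illustration and are not from the question. Bins are closed on the right by default, and include_lowest=True additionally pulls the leftmost edge into the first bin.

```python
import pandas as pd

# Hypothetical values sitting on and around the bin edges.
values = pd.Series([200, 201, 311, 312, 600], name='A')
edges = [200, 311, 600]

# With include_lowest=True the first bin also contains the left edge (200).
with_lowest = pd.cut(values, bins=edges, include_lowest=True, labels=['low', 'high'])

# Without it, 200 falls outside every bin and becomes NaN.
without_lowest = pd.cut(values, bins=edges, labels=['low', 'high'])

print(with_lowest.tolist())   # ['low', 'low', 'low', 'high', 'high']
print(without_lowest.isna().tolist())  # [True, False, False, False, False]
```

This is why the answer passes include_lowest=True: the smallest possible values of A and B (200 and 1) would otherwise be dropped from the sums.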
Answer 1 (score: 1)
I really like @root's solution! Here is a slightly modified one-liner version that uses the pd.crosstab method:
In [102]: pd.crosstab(
     ...:     pd.cut(df['A'], bins=[200, 311, 370, 450, 550, 600], include_lowest=True),
     ...:     pd.cut(df['B'], bins=[1, 16, 67, 100], include_lowest=True),
     ...:     df['C'],
     ...:     aggfunc='sum'
     ...: )
     ...:
Out[102]:
B           [1, 16]  (16, 67]  (67, 100]
A
[200, 311]       31       157        117
(311, 370]       23        90         38
(370, 450]      110       168         60
(450, 550]       37       117        115
(550, 600]       35        19         49
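As a quick sanity check that the two approaches agree, here is a small sketch on a tiny hand-made frame (the data and the labels 'low', 'high', 'small', 'large' are assumptions for illustration, not the data from the question). Every bin combination is populated on purpose, so the comparison does not depend on how empty cells are filled. Passing observed=False to pivot_table is my addition to keep all categories regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical data: every (A bin, B bin) combination has at least one row.
df = pd.DataFrame({
    'A': [200, 250, 311, 312, 600, 450],
    'B': [1, 16, 17, 100, 50, 10],
    'C': [1, 2, 3, 4, 5, 6],
})

a_bins = pd.cut(df['A'], bins=[200, 311, 600], include_lowest=True, labels=['low', 'high'])
b_bins = pd.cut(df['B'], bins=[1, 16, 100], include_lowest=True, labels=['small', 'large'])

# Approach from the first answer: pivot_table on the binned Series.
via_pivot = df.pivot_table(index=a_bins, columns=b_bins, values='C',
                           aggfunc='sum', fill_value=0, observed=False)

# Approach from the second answer: crosstab with values and aggfunc.
via_crosstab = pd.crosstab(a_bins, b_bins, values=df['C'], aggfunc='sum')

print(via_pivot.to_numpy().tolist())     # [[3, 3], [6, 9]]
print(via_crosstab.to_numpy().tolist())  # [[3, 3], [6, 9]]
```

One difference to keep in mind: pivot_table is given fill_value=0 here, while crosstab is not, so on data with empty bin combinations the two may display those cells differently.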