My data frame contains many items.
Each item is identified by a "type" code and a weight.
The last column gives the quantity.
|---|--------|--------|----------|
|   | type   | weight | quantity |
|---|--------|--------|----------|
| 0 | 100010 | 3      | 456      |
| 1 | 100010 | 1      | 159      |
| 2 | 100010 | 5      | 735      |
| 3 | 100024 | 3      | 153      |
| 4 | 100024 | 7      | 175      |
| 5 | 100024 | 1      | 759      |
|---|--------|--------|----------|
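A given item "A" is "compatible" with another item if both have the same type and the other item's weight is less than or equal to A's weight (an item is therefore always compatible with itself).
I want to add a "compatible quantity" column that gives, for each row, the total quantity of all compatible items.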
|---|--------|--------|----------|---------------------|
|   | type   | weight | quantity | compatible quantity |
|---|--------|--------|----------|---------------------|
| 0 | 100010 | 3      | 456      | 615                 | 456 + 159
| 1 | 100010 | 1      | 159      | 159                 | 159 only (the lightest items)
| 2 | 100010 | 5      | 735      | 1350                | 735 + 159 + 456 (the heaviest)
| 3 | 100024 | 3      | 153      | 912                 | 153 + 759
| 4 | 100024 | 7      | 175      | 1087                | ...
| 5 | 100024 | 1      | 759      | 759                 | ...
|---|--------|--------|----------|---------------------|
I would like to avoid using a for loop to get this result (the data frame is large).
import pandas as pd
df = pd.DataFrame([[100010, 3, 456],[100010, 1, 159],[100010, 5, 735], [100024, 3, 153], [100024, 7, 175], [100024, 1, 759]],columns = ["type", "weight", "quantity"])
print(df)
for inc in range(df["type"].count()):
    the_type = df["type"].iloc[inc]
    the_weight = df["weight"].iloc[inc]
    the_quantity = df["quantity"].iloc[inc]
    # Sum the quantity of every row with the same type and a weight <= this row's weight
    df.at[inc, "quantity_compatible"] = df.loc[(df["type"] == the_type) & (df["weight"] <= the_weight), "quantity"].sum()
print(df)
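Answer 0 (score: 1)
First sort the values by type and weight, then take the cumsum of quantity within a groupby on type, and finally merge the result back on the index: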
df = pd.DataFrame([[100010, 3, 456],[100010, 1, 159],[100010, 5, 735], [100024, 3, 153], [100024, 7, 175], [100024, 1, 759]],columns = ["type", "weight", "quantity"])
new_df = df.merge(df.sort_values(["type", "weight"])
                    .groupby("type")["quantity"]
                    .cumsum(), left_index=True, right_index=True)
print(new_df)
Output:
     type  weight  quantity_x  quantity_y
0  100010       3         456         615
1  100010       1         159         159
2  100010       5         735        1350
3  100024       3         153         912
4  100024       7         175        1087
5  100024       1         759         759
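The same sort-then-cumsum idea can also be written as a direct column assignment, which avoids the quantity_x / quantity_y names produced by the merge. The following is a minimal sketch, not part of the original answer: the column name quantity_compatible is borrowed from the question, and the tie-handling step is an assumption about what should happen when several rows share the same type and weight (the sample data has no such rows).

```python
import pandas as pd

df = pd.DataFrame(
    [[100010, 3, 456], [100010, 1, 159], [100010, 5, 735],
     [100024, 3, 153], [100024, 7, 175], [100024, 1, 759]],
    columns=["type", "weight", "quantity"],
)

# Running total of quantity within each type, in order of increasing weight.
# cumsum() keeps the original index labels, so assignment realigns the rows.
running = df.sort_values(["type", "weight"]).groupby("type")["quantity"].cumsum()
df["quantity_compatible"] = running

# Assumption: rows with an identical (type, weight) should all count each other.
# With non-negative quantities, the largest running total in such a tie group
# equals the sum over all rows with weight <= this weight.
df["quantity_compatible"] = (
    df.groupby(["type", "weight"])["quantity_compatible"].transform("max")
)
print(df)
```

If quantities can be negative, the "max" step is no longer valid and the last running total of each tie group (in sorted order) would have to be used instead.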
Answer 1 (score: 0)
Try this.
import pandas as pd
from io import StringIO
s = """
type weight quantity
0 100010 3 456
1 100010 1 159
2 100010 5 735
3 100024 3 153
4 100024 7 175
5 100024 1 759
"""
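def process_dataframe(df, sort_values_by_init_index=True):
    # Sum rows sharing the same (type, weight); groupby also sorts by weight within each type
    grouped = df.groupby(by=['type', 'weight']).sum()
    df2 = grouped.reset_index()
    # Running total of the quantity within each type (level 0 of the group index)
    df3 = grouped.groupby(level=0)['quantity_compatible'].cumsum().reset_index()
    df2['quantity_compatible'] = df3['quantity_compatible'].tolist()
    if sort_values_by_init_index:
        # Restore the original row order
        df2 = df2.sort_values('index')
    # Drop the helper column and renumber the rows
    df2 = df2.drop(columns=['index']).reset_index(drop=True)
    return df2

# The sample text is whitespace-separated and its header has one field fewer
# than the data rows, so pandas uses the first column as the index.
df = pd.read_csv(StringIO(s), sep=r'\s+')
df['quantity_compatible'] = df['quantity'].copy()
df = df.reset_index()
# custom function
print(process_dataframe(df))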
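To sanity-check either answer on a small sample, the definition from the question can be applied literally with a self-merge on type instead of a Python-level loop. This is only a sketch under stated assumptions: the data frame has a unique default index and no existing column named "index", and the names pairs and the "_other" suffix are illustrative. The merge builds every pair of rows within a type, so memory grows quadratically with group size and it is not meant for the full data set.

```python
import pandas as pd

df = pd.DataFrame(
    [[100010, 3, 456], [100010, 1, 159], [100010, 5, 735],
     [100024, 3, 153], [100024, 7, 175], [100024, 1, 759]],
    columns=["type", "weight", "quantity"],
)

# Pair every row with every row of the same type; "index" identifies the left row.
pairs = df.reset_index().merge(df, on="type", suffixes=("", "_other"))

# Keep only the pairs where the other item is no heavier than the left item.
pairs = pairs[pairs["weight_other"] <= pairs["weight"]]

# Sum the compatible quantities per original row; assignment aligns on the index.
df["quantity_compatible"] = pairs.groupby("index")["quantity_other"].sum()
print(df)
```

Because this follows the "same type, weight less than or equal" rule directly, it also covers rows that share the same type and weight, which makes it a useful reference when comparing against a cumsum-based result.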