I know there are many posts about this, but none of them solved my problem.
My DataFrame looks like this:
df1 = [{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator" : "k","Money" : 100},
{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator": "k","Money" : 200},
{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto","Debit/Credit Indicator" : "D", "Money" : 0}]
df1 = pd.DataFrame(df1)
df1
Account Name Customer Number Debit/Credit Indicator Money
Sunarto AFIMBN01000BCA17030001177 k 100
Sunarto AFIMBN01000BCA17030001177 k 200
Sunarto AFIMBN01000BCA17030001177 D 0
Account Name object
Customer Number object
Debit/Credit Indicator object
Money int64 (or let's say float64)
I want to count items based on the "Money" column.
Rows where Money is 0 should not be counted.
I tried df1["Money"].value_counts(),
but it does not work:
df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].value_counts()
df1
Account Name Customer Number Debit/Credit Indicator Money Per item
Sunarto AFIMBN01000BCA17030001177 k 100 1
Sunarto AFIMBN01000BCA17030001177 k 200 NaN
Sunarto AFIMBN01000BCA17030001177 D 0 NaN
But what I expect is:
Account Name Customer Number Debit/Credit Indicator Money Per item
Sunarto AFIMBN01000BCA17030001177 k 100 1
Sunarto AFIMBN01000BCA17030001177 k 200 1
Sunarto AFIMBN01000BCA17030001177 D 0 0
That way, when I apply a pivot table, I get the number of items that have a non-zero "Money" value.
Expected result:
gdf = pd.pivot_table(df1, index = ["Account Name","Customer Number"],values = ["Money", "Per item"],aggfunc = np.sum)
gdf.head()
Money Per item
Account Name Customer Number
Sunarto AFIMBN01000BCA17030001177 300 2.0
Answer 0 (score: 2)
You need to assign 1 to every row that matches the condition:
df1.loc[df1["Money"] != 0, "Per item"] = 1
Or convert the boolean mask to integers:
df1["Per item"] = (df1["Money"] != 0).astype(int)
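As a minimal end-to-end sketch of the mask approach, the snippet below rebuilds the sample frame from the question, adds the "Per item" flag, and then pivots. Passing aggfunc="sum" as a string instead of np.sum is a choice made for this sketch, not something taken from the question.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame([
    {"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto",
     "Debit/Credit Indicator": "k", "Money": 100},
    {"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto",
     "Debit/Credit Indicator": "k", "Money": 200},
    {"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto",
     "Debit/Credit Indicator": "D", "Money": 0},
])

# 1 for every row with a non-zero Money value, 0 otherwise
df1["Per item"] = (df1["Money"] != 0).astype(int)

# Summing the flag counts the non-zero rows per account
gdf = pd.pivot_table(
    df1,
    index=["Account Name", "Customer Number"],
    values=["Money", "Per item"],
    aggfunc="sum",
)
print(gdf)
```

Because the flag is an integer column, the summed "Per item" stays an integer rather than becoming a float.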
Another solution, without pivot_table, uses groupby with aggregation:
gdf = (df1.groupby(["Account Name","Customer Number"])['Money']
.agg([('Money','sum'), ('Per item', lambda x: x.ne(0).sum())]))
print (gdf)
Money Per item
Account Name Customer Number
Sunarto AFIMBN01000BCA17030001177 300 2
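EDIT:
May I know why my code does not work?
The problem is that Series.value_counts returns a Series of counts whose index is built from the values of the original Series (here 100, 200 and 0), not from its row labels. Assignment aligns on the index, so only row label 0 finds a match (it picks up the count of the value 0) and the other rows get missing values. One solution is to use Series.map: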
df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].map(df1["Money"].value_counts())
print (df1)
Account Name Customer Number Debit/Credit Indicator Money \
0 Sunarto AFIMBN01000BCA17030001177 k 100
1 Sunarto AFIMBN01000BCA17030001177 k 200
2 Sunarto AFIMBN01000BCA17030001177 D 0
Per item
0 1.0
1 1.0
2 NaN
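The alignment issue can be seen in isolation with a small sketch. It uses a standalone Series with the same Money values; reindex is used here only to make the label alignment visible and is not part of the original code.

```python
import pandas as pd

money = pd.Series([100, 200, 0], name="Money")

# The counts are labelled by the Money values, not by row labels
counts = money.value_counts()
print(sorted(counts.index.tolist()))  # [0, 100, 200]

# Aligning the counts to the row labels 0, 1, 2:
# only label 0 exists in counts, so rows 1 and 2 become NaN
aligned = counts.reindex(money.index)
print(aligned)

# map looks each Money value up in counts instead of aligning on labels
mapped = money.map(counts)
print(mapped)
```

Every value occurs once in this sample, so map returns 1 for each row, which is why the result looks correct here.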
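However, this breaks down when a value is duplicated: each row then gets the count of its value instead of 1, so the output is wrong. Here the two rows with the value 200 wrongly give a total of 4 instead of 2.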
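A small sketch of that pitfall, assuming a hypothetical frame in which the value 200 appears twice (this data and the two comparison column names are made up for illustration):

```python
import pandas as pd

# Hypothetical data: 200 appears twice, plus one zero row
df = pd.DataFrame({
    "Account Name": ["Sunarto", "Sunarto", "Sunarto"],
    "Money": [200, 200, 0],
})
mask = df["Money"] != 0

# map approach: each non-zero row receives the count of its value (2 and 2)
df.loc[mask, "Per item (map)"] = df["Money"].map(df["Money"].value_counts())

# mask approach: each non-zero row receives exactly 1
df["Per item (mask)"] = mask.astype(int)

totals = df.groupby("Account Name")[["Per item (map)", "Per item (mask)"]].sum()
print(totals)
```

Summing the mapped counts gives 4 for the two non-zero rows, while summing the integer mask gives the intended 2.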