Counting the frequency of float64 or int64 values that are not equal to (!=) zero

Time: 2019-06-17 10:16:09

Tags: python pandas

I know there are many posts on this, but none of them solve my problem.

My DataFrame looks like this:

import pandas as pd

df1 = [{"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto", "Debit/Credit Indicator": "k", "Money": 100},
    {"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto", "Debit/Credit Indicator": "k", "Money": 200},
    {"Customer Number": "AFIMBN01000BCA17030001177", "Account Name": "Sunarto", "Debit/Credit Indicator": "D", "Money": 0}]
df1 = pd.DataFrame(df1)
df1

Account Name    Customer Number           Debit/Credit Indicator         Money
Sunarto      AFIMBN01000BCA17030001177       k                            100
Sunarto      AFIMBN01000BCA17030001177       k                            200
Sunarto      AFIMBN01000BCA17030001177       D                             0

Account Name              object
Customer Number           object
Debit/Credit Indicator    object
Money                      int64 (or let's say float64)

I want to count the frequency based on "Money".

If Money is 0, the row should not be counted.

I tried df1["Money"].value_counts(), but it does not work:

df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].value_counts()
df1

Account Name    Customer Number           Debit/Credit Indicator         Money   Per item
Sunarto      AFIMBN01000BCA17030001177       k                            100     1
Sunarto      AFIMBN01000BCA17030001177       k                            200    NaN
Sunarto      AFIMBN01000BCA17030001177       D                             0   NaN

But what I expect is:

Account Name    Customer Number           Debit/Credit Indicator         Money   Per item
Sunarto      AFIMBN01000BCA17030001177       k                            100     1
Sunarto      AFIMBN01000BCA17030001177       k                            200    1
Sunarto      AFIMBN01000BCA17030001177       D                             0   0

So when I apply a pivot table, I expect to get the number of items that have a non-zero "Money" value.

My expected result:

import numpy as np

gdf = pd.pivot_table(df1, index=["Account Name", "Customer Number"], values=["Money", "Per item"], aggfunc=np.sum)

gdf.head()

                                                Money              Per item
Account Name      Customer Number
Sunarto           AFIMBN01000BCA17030001177     300                2.0

1 Answer:

Answer 0: (score: 2)

You need to assign 1 to the rows that match the condition:

df1.loc[df1["Money"] != 0, "Per item"] = 1

Or convert the boolean mask to integers (this also gives 0, rather than NaN, for rows where Money is 0):

df1["Per item"] = (df1["Money"] != 0).astype(int)
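As a minimal sketch of how the flag column feeds into the pivot table from the question, the snippet below rebuilds the sample data, applies the boolean-mask variant, and aggregates. It assumes the string alias "sum" as the aggregation function in place of np.sum; the data values are the ones from the question.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "Customer Number": ["AFIMBN01000BCA17030001177"] * 3,
    "Account Name": ["Sunarto"] * 3,
    "Debit/Credit Indicator": ["k", "k", "D"],
    "Money": [100, 200, 0],
})

# Flag each row: 1 if Money is non-zero, 0 otherwise.
df1["Per item"] = (df1["Money"] != 0).astype(int)

# Summing the flag per group counts the rows with non-zero Money.
gdf = pd.pivot_table(
    df1,
    index=["Account Name", "Customer Number"],
    values=["Money", "Per item"],
    aggfunc="sum",
)
print(gdf)
```

Because the flag is an integer column with no missing values, the summed "Per item" stays an integer instead of becoming a float.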

Another solution, without pivot_table, that uses groupby aggregation:

gdf = (df1.groupby(["Account Name","Customer Number"])['Money']
          .agg([('Money','sum'), ('Per item', lambda x: x.ne(0).sum())]))
print (gdf)
                                        Money  Per item
Account Name Customer Number                           
Sunarto      AFIMBN01000BCA17030001177    300         2
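The same aggregation can probably also be written with named aggregation, which is not part of the original answer and assumes pandas 0.25 or later. The sketch below unpacks a dict because the output name "Per item" contains a space and cannot be passed as a plain keyword.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "Customer Number": ["AFIMBN01000BCA17030001177"] * 3,
    "Account Name": ["Sunarto"] * 3,
    "Debit/Credit Indicator": ["k", "k", "D"],
    "Money": [100, 200, 0],
})

# Named aggregation: output column -> (input column, aggregation).
gdf = df1.groupby(["Account Name", "Customer Number"]).agg(
    **{
        "Money": ("Money", "sum"),
        # Count the rows in each group whose Money is not zero.
        "Per item": ("Money", lambda x: x.ne(0).sum()),
    }
)
print(gdf)
```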

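Edit:

May I know why my code doesn't work?

The problem is that Series.value_counts returns a Series of counts whose index is built from the values of the original Series, here 100, 200 and 0. The assignment aligns on index labels, so the row labels 0, 1, 2 do not match those values and you get missing values (row 0 gets a 1 only because the value 0 happens to be in that index). The solution is to use Series.map: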

df1.loc[df1["Money"] != 0, "Per item"] = df1["Money"].map(df1["Money"].value_counts())
print (df1)
  Account Name            Customer Number Debit/Credit Indicator  Money  \
0      Sunarto  AFIMBN01000BCA17030001177                      k    100   
1      Sunarto  AFIMBN01000BCA17030001177                      k    200   
2      Sunarto  AFIMBN01000BCA17030001177                      D      0   

   Per item  
0       1.0  
1       1.0  
2       NaN  

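However, this is a problem when Money contains duplicated values: instead of 1, each row is assigned the count of its value, so the summed output is wrong. For example, two rows with the value 200 each get 2, and the sum is incorrectly 4 instead of 2.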
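The duplicate-value problem can be reproduced with a small sketch. The three-row Series here is made-up illustration data, not the data from the question; it compares the mapped counts with the simple non-zero flag.

```python
import pandas as pd

# Hypothetical data: the value 200 appears twice.
money = pd.Series([200, 200, 0], name="Money")

# value_counts is indexed by the Money values, not by row labels.
counts = money.value_counts()

# Mapping looks up each row's value in counts; zero rows are masked to NaN.
mapped = money.map(counts).where(money != 0)

# The flag approach assigns 1 per non-zero row regardless of duplicates.
flag = money.ne(0).astype(int)

print(mapped.sum())  # sum of the mapped counts
print(flag.sum())    # number of non-zero rows
```

Each 200 row receives the count 2 from the mapping, so the mapped column sums to 4, while the flag column sums to the intended 2.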