Question

I want to calculate a percentage from 2 columns of a DataFrame. The DataFrame is:

df1.head()
     ssaname      ym  tch_block  call_drop  cell_name
0  AAAAAAAAA  201504          0         39        345
1  aaaaaaaaa  201505          2         48        291
2  bbbbbbbbb  201506          2         49        360
3  ccccccccc  201507          4         59        357
4  ddddddddd  201508         10         74        363

The percentage should be tch_block*100/cell_name.
The command I tried is:

pd.pivot_table(df1, index=['ssaname'], columns=['ym'], values=['tch_block', 'call_drop'], aggfunc=lambda x: x*100/'cell_name')

Answer 1

I'm not sure how pivot_table applies here.

In a pivot_table:

if there are multiple rows in the DataFrame...
that match on the columns given in the pivot_table's index arg...
those rows are aggregated into one row.

aggfunc determines how the multiple rows are aggregated into one row.

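But your pivot table has a single index, ssaname, and your DataFrame has no duplicates in the ssaname column. No index value of the pivot table matches more than one row, so no aggregation into a single row takes place in the pivot_table. You can see that here: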

import numpy as np
import pandas as pd

d = {
    'ssaname' : pd.Series(['aaa', 'bbb', 'ccc', 'ddd']),
    #'ym' : pd.Series(np.arange(201504, 201509) ),
    'tch_block' : pd.Series([1, 1, 1, 3,]),
    'call_drop' : pd.Series([10, 10, 10, 30,]),
    #'cell_name' : pd.Series([345, 291, 360, 357, 363])
}

df = pd.DataFrame(d)
print(df)

result = df.pivot_table(
    index = ['ssaname'],
    values = ['tch_block', 'call_drop'],
)

print(result)

--output:--
   call_drop ssaname  tch_block
0         10     aaa          1
1         10     bbb          1
2         10     ccc          1
3         30     ddd          3

         call_drop  tch_block
ssaname                      
aaa             10          1
bbb             10          1
ccc             10          1
ddd             30          3

Now look at what happens when the specified index, ssaname, has duplicates:

d = {
    'ssaname' : pd.Series(['aaa', 'bbb', 'ccc', 'aaa']),
    #'ym' : pd.Series(np.arange(201504, 201509) ),
    'tch_block' : pd.Series([1, 1, 1, 3,]),
    'call_drop' : pd.Series([10, 10, 10, 30,]),
    #'cell_name' : pd.Series([345, 291, 360, 357, 363])
}

df = pd.DataFrame(d)
print(df)

result = df.pivot_table(
    index = ['ssaname'],
    values = ['tch_block', 'call_drop'],
)

print(result)

--output:--
   call_drop ssaname  tch_block
0         10     aaa          1
1         10     bbb          1
2         10     ccc          1
3         30     aaa          3

         call_drop  tch_block
ssaname                      
aaa             20          2
bbb             10          1
ccc             10          1

Now the aaa row of the pivot_table is the result of an aggregation, because the original DataFrame has two aaa rows:

       call_drop ssaname  tch_block
    0         10     aaa          1
    ...
    ...
    3         30     aaa          3

The multiple rows are aggregated vertically. In other words, aggfunc works up and down a column, not across a row:

       call_drop ssaname  tch_block
    0         10     aaa          1
              ^                   ^                  
              |                   |
           aggfunc            aggfunc
              |                   |            
              V                   V
    3         30     aaa          3
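A small sketch can make the direction of aggregation visible. It assumes a hypothetical helper, n_values, which is not part of pandas or of the original answer; it only reports how many values aggfunc received for each column of each group:

```python
import pandas as pd

df = pd.DataFrame({
    "ssaname": ["aaa", "bbb", "ccc", "aaa"],
    "tch_block": [1, 1, 1, 3],
    "call_drop": [10, 10, 10, 30],
})

def n_values(values):
    # Hypothetical helper: `values` holds one column's entries for one
    # ssaname group, so its length is the number of rows being aggregated.
    return len(values)

counts = df.pivot_table(
    index="ssaname",
    values=["tch_block", "call_drop"],
    aggfunc=n_values,
)
print(counts)
```

For aaa each column reports 2 values, and for bbb and ccc each reports 1. The function never sees a whole row, which is why an expression inside aggfunc cannot reach the cell_name column.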

By default, pivot_table aggregates multiple rows with np.mean. The mean of 10 and 30 is 20, and the mean of 1 and 3 is 2, so you get the following row in the pivot_table:

         call_drop  tch_block
ssaname                      
aaa             20          2
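One way to check the default is to compare it against an explicit groupby mean. This is only a sketch on the same four-row data; the variable names pivoted and grouped are chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "ssaname": ["aaa", "bbb", "ccc", "aaa"],
    "tch_block": [1, 1, 1, 3],
    "call_drop": [10, 10, 10, 30],
})

# pivot_table with no aggfunc argument falls back to the mean.
pivoted = df.pivot_table(index="ssaname", values=["tch_block", "call_drop"])

# The same aggregation written as an explicit groupby.
grouped = df.groupby("ssaname")[["call_drop", "tch_block"]].mean()

print(pivoted)
print(grouped)
```

Both approaches give 20 for call_drop and 2 for tch_block on the aaa row.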

You can specify a different aggregation function:

d = {
    'ssaname' : pd.Series(['aaa', 'bbb', 'ccc', 'aaa']),
    #'ym' : pd.Series(np.arange(201504, 201509) ),
    'tch_block' : pd.Series([1, 1, 1, 3,]),
    'call_drop' : pd.Series([10, 10, 10, 30,]),
    #'cell_name' : pd.Series([345, 291, 360, 357, 363])
}

df = pd.DataFrame(d)
print(df)

result = df.pivot_table(
    index = ['ssaname'],
    values = ['tch_block', 'call_drop'],
    aggfunc = np.sum  #****HERE****
)

print(result)

--output:--
   call_drop ssaname  tch_block
0         10     aaa          1
1         10     bbb          1
2         10     ccc          1
3         30     aaa          3

         call_drop  tch_block
ssaname                      
aaa             40          4
bbb             10          1
ccc             10          1

But aggfunc only operates on multiple rows of the DataFrame that match on the index you specified in the pivot_table.
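As a rough check of that point, the sketch below pivots a DataFrame whose ssaname values are all unique, once with the default mean and once with a sum. The names by_mean, by_sum and same are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "ssaname": ["aaa", "bbb", "ccc", "ddd"],
    "tch_block": [1, 1, 1, 3],
    "call_drop": [10, 10, 10, 30],
})

by_mean = df.pivot_table(index="ssaname", values=["tch_block", "call_drop"])
by_sum = df.pivot_table(
    index="ssaname",
    values=["tch_block", "call_drop"],
    aggfunc="sum",
)

# Every group holds exactly one row, so the mean and the sum of each group
# are the same number (only the dtype may differ: float vs. int).
same = by_mean.eq(by_sum).all().all()
print(same)
```

With no duplicate index values, the choice of aggfunc makes no difference to the numbers in the result.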

Here is an example where the pivot_table specifies multiple columns for the index:

d = {
    'ssaname' : pd.Series(['aaa', 'bbb', 'ccc', 'aaa', 'aaa']),
    #'ym' : pd.Series(np.arange(201504, 201509) ),
    'tch_block' : pd.Series([1, 1, 1, 1, 100]),
    'call_drop' : pd.Series([10, 10, 10, 30, 10]),
    #'cell_name' : pd.Series([345, 291, 360, 357, 363])
}

df = pd.DataFrame(d)
print(df)

result = df.pivot_table(
    index = ['ssaname', 'tch_block'],
    values = ['call_drop'],
    aggfunc = np.sum
)

print(result)

--output:--
   call_drop ssaname  tch_block
0         10     aaa          1
1         10     bbb          1
2         10     ccc          1
3         30     aaa          1
4         10     aaa        100

                   call_drop
ssaname tch_block           
aaa     1                 40
        100               10
bbb     1                 10
ccc     1                 10

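In the original DataFrame there are two rows with the same values in the columns specified as the pivot_table's index, ssaname and tch_block; therefore their data is aggregated into a single row for the index entry aaa 1. The output doesn't bother to list aaa twice, but the result is effectively: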

ssaname tch_block           
aaa     1                 40
aaa     100               10
bbb     1                 10
ccc     1                 10
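Coming back to the question: since aggfunc only ever sees one column at a time, one possible approach is to compute the percentages as ordinary columns first and pivot afterwards. This is a minimal sketch, not part of the original answer. The column names tch_block_pct and call_drop_pct are made up for illustration, and applying the same formula to call_drop is an assumption based on the values list in the attempted command:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ssaname": ["AAAAAAAAA", "aaaaaaaaa", "bbbbbbbbb", "ccccccccc", "ddddddddd"],
    "ym": [201504, 201505, 201506, 201507, 201508],
    "tch_block": [0, 2, 2, 4, 10],
    "call_drop": [39, 48, 49, 59, 74],
    "cell_name": [345, 291, 360, 357, 363],
})

# Row-wise arithmetic is done on whole columns, outside pivot_table.
# Both new column names are hypothetical.
df1["tch_block_pct"] = df1["tch_block"] * 100 / df1["cell_name"]
df1["call_drop_pct"] = df1["call_drop"] * 100 / df1["cell_name"]

# Pivot the precomputed percentages; the default aggfunc (mean) only
# matters if several rows share the same ssaname and ym.
result = df1.pivot_table(
    index="ssaname",
    columns="ym",
    values=["tch_block_pct", "call_drop_pct"],
)
print(result)
```

If several rows shared the same ssaname and ym, the default aggfunc would average their percentages. Summing tch_block and cell_name per group first and dividing afterwards would give a different, weighted result, so the choice depends on what the percentage is meant to express.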

Pandas pivot table: calculating a percentage of two columns
