使用pandas计算列

时间:2016-05-19 15:40:54

标签: python excel pandas

我有excel

ID  digit   count
03a63f1c5a89fb89fcc4d7cf60e2e6b1    131826356   1
1ea192ddd5c042d71910de18595553a5    100897602   5
1ea192ddd5c042d71910de18595553a5    123581809   7
1ea192ddd5c042d71910de18595553a5    137961455   1
1ea192ddd5c042d71910de18595553a5    163217715   4
1ea192ddd5c042d71910de18595553a5    164622895   3
206dcb0a5bfbdbedb77b6f25bbb4b54b    139156767   12
21a27ac254d8b2fe0a52d052bbbd14a5    267483648   1

我需要sum所有countID,然后将每count除以此值。 我试试

df = pd.read_excel("count_video.xlsx")
group = df.groupby('ID')['count'].sum()

但我不知道,我怎么能在下一次使用它呢?

1 个答案:

答案 0 :(得分:1)

您可以使用lambda按照分组sum划分每个计数值:

In [155]:
df['count/sum'] = df.groupby('ID')['count'].transform(lambda x: x/x.sum())
df

Out[155]:
                                 ID      digit  count  count/sum
0  03a63f1c5a89fb89fcc4d7cf60e2e6b1  131826356      1       1.00
1  1ea192ddd5c042d71910de18595553a5  100897602      5       0.25
2  1ea192ddd5c042d71910de18595553a5  123581809      7       0.35
3  1ea192ddd5c042d71910de18595553a5  137961455      1       0.05
4  1ea192ddd5c042d71910de18595553a5  163217715      4       0.20
5  1ea192ddd5c042d71910de18595553a5  164622895      3       0.15
6  206dcb0a5bfbdbedb77b6f25bbb4b54b  139156767     12       1.00
7  21a27ac254d8b2fe0a52d052bbbd14a5  267483648      1       1.00