Creating a DataFrame based on an old DataFrame

Time: 2019-06-11 13:11:56

Tags: python-2.7

My DataFrame is:

   A   B   C   D
   0   s   3   a
   4   s   2   a
   5   s   2   a
   6   s   1   a
   7   s   2   b
   7   s   3   b
   6   s   0   b

How can I create a new DataFrame like the following?

   A   B   C   D
   0   4   8   4-a
   7   3   5   3-b

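The new DataFrame summarizes the old one by grouping rows on the values in column "D". For each group of rows that share the same "D" value, "A" is the index (the "A" value of the group's first row, as in the example), "B" is the count of elements, and "C" is the sum of elements.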

1 Answer:

Answer 0: (score: 0)

Well, assuming your data is stored in df, this is a multi-step process that can be done like this:

import pandas as pd

data = {'A': {0: 0, 1: 4, 2: 5, 3: 6, 4: 7, 5: 7, 6: 6},
        'B': {0: 's', 1: 's', 2: 's', 3: 's', 4: 's', 5: 's', 6: 's'},
        'C': {0: 3, 1: 2, 2: 2, 3: 1, 4: 2, 5: 3, 6: 0},
        'D': {0: 'a', 1: 'a', 2: 'a', 3: 'a', 4: 'b', 5: 'b', 6: 'b'}}
df = pd.DataFrame(data)

# Column A: keep the first row for each distinct value in D.
# .copy() guards against a SettingWithCopyWarning on the assignments below.
output_df = df.drop_duplicates(subset='D', keep='first').copy()

# Iterate over the rows that were kept
for index, row in output_df.iterrows():

    # Count the entries in B for this value of D
    output_df.loc[index, 'B'] = df[df.D == row.D].B.count()

    # Sum the entries in C for this value of D
    output_df.loc[index, 'C'] = df[df.D == row.D].C.sum()

# Finally, replace D with the count from B joined to the original D value
output_df.loc[:, 'D'] = output_df.B.map(str) + "-" + output_df.D

Output:

   A   B   C   D
   0   4   8   4-a
   7   3   5   3-b
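
The same summary can probably be produced without an explicit loop by letting pandas do the grouping. The following is only a sketch and not part of the original answer: it assumes the groups should appear in the order they first occur in D, and the names grouped and summary are introduced here purely for illustration. It uses the dictionary form of agg, which should also be available in the older pandas releases that still run on Python 2.7.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'A': [0, 4, 5, 6, 7, 7, 6],
    'B': ['s'] * 7,
    'C': [3, 2, 2, 1, 2, 3, 0],
    'D': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
})

# Group on D; sort=False keeps groups in order of first appearance (an assumption)
grouped = df.groupby('D', sort=False)

# One aggregation per column: first A, count of B, sum of C
summary = grouped.agg({'A': 'first', 'B': 'count', 'C': 'sum'})

# D is now the index; turn it back into a column and prefix it with the count
summary = summary.reset_index()
summary['D'] = summary['B'].astype(str) + '-' + summary['D']

# Restore the original column order
summary = summary[['A', 'B', 'C', 'D']]
print(summary)
```

Unlike the loop-based version, this filters the frame once rather than twice per group, which may matter when D has many distinct values.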