How to get the mean of one column's values based on matching values in other columns

Time: 2018-12-24 02:48:33

Tags: python pandas dataframe mean

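I would be very grateful if someone could tell me how to accomplish the task below. Suppose I have a data frame in Python like the one shown below: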

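If the corresponding values in col1, col2 and col3 are the same, I want to take the mean of col4 and then drop the rows that repeat the values of the first three columns. For example, the first two rows have the same values of col1, col2 and col3, so we want to eliminate one of them and update col4 to the mean of 5 and 4, which is 4.5. The input data frame is: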

  col1 col2 col3 col4
0    A 2001    2    5
1    A 2001    2    4
2    A 2001    3    6
3    A 2002    4    5
4    B 2001    2    9
5    B 2001    2    4
6    B 2001    2    3
7    B 2001    3   95

1 answer:

Answer 0: (score: 1)

Group by 'col1', 'col2' and 'col3' with groupby, then take the mean of the 'col4' column:

print(df.groupby(['col1','col2','col3'],as_index=False)['col4'].mean())

Output:

  col1  col2  col3       col4
0    A  2001     2   4.500000
1    A  2001     3   6.000000
2    A  2002     4   5.000000
3    B  2001     2   5.333333
4    B  2001     3  95.000000
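
The one-liner above assumes df already exists. A minimal self-contained sketch follows, rebuilding the frame from the question by hand; the integer dtypes for col2, col3 and col4 and the variable name result are my own assumptions, not something the question states.

```python
import pandas as pd

# Rebuild the frame from the question (integer dtypes are an assumption).
df = pd.DataFrame({
    "col1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "col2": [2001, 2001, 2001, 2002, 2001, 2001, 2001, 2001],
    "col3": [2, 2, 3, 4, 2, 2, 2, 3],
    "col4": [5, 4, 6, 5, 9, 4, 3, 95],
})

# One row per unique (col1, col2, col3) combination, with col4 averaged.
# as_index=False keeps the grouping keys as ordinary columns.
result = df.groupby(["col1", "col2", "col3"], as_index=False)["col4"].mean()
print(result)
```

By default groupby sorts the result by the group keys; passing sort=False should keep the groups in order of first appearance instead.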