Add a DataFrame column that groups rows by the values in another column

Time: 2019-07-05 16:54:45

Tags: python pandas csv

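I've just started using Pandas, and I'm struggling to add a simple column whose value is the string "group" followed by a number identifying each unique value in the name column.

I tried using groupby, but I can't figure out how to add the number based on the values in that column.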

import pandas as pd

data = pd.read_csv('./data.csv')
data['group'] = data.groupby('name') # ???

   name  color
0  car   white
1  car   black
2  car   red
3  bus   white
4  bus   black
5  bus   red

It should look like this:

   name  color  group
0  car   white  group1
1  car   black  group1
2  car   red    group1
3  bus   white  group2
4  bus   black  group2
5  bus   red    group2

3 answers:

Answer 0 (score: 4)

Use factorize() here:

df=df.assign(group=(pd.factorize(df.name)[0]+1))

  name  color  group
0  car  white      1
1  car  black      1
2  car    red      1
3  bus  white      2
4  bus  black      2
5  bus    red      2
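
The output above contains integer codes rather than the group1/group2 labels asked for in the question. Below is a minimal sketch of one way to get the string labels. It assumes the six-row sample from the question is built inline instead of being read from data.csv:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: stands in for data.csv)
df = pd.DataFrame({
    'name': ['car', 'car', 'car', 'bus', 'bus', 'bus'],
    'color': ['white', 'black', 'red', 'white', 'black', 'red'],
})

# factorize() returns (codes, uniques); codes number the distinct values
# in order of first appearance, starting at 0
codes, uniques = pd.factorize(df['name'])

# Shift to 1-based numbering and prepend the 'group' prefix
df['group'] = ['group' + str(code + 1) for code in codes]

print(df)
```

Because factorize() numbers values in order of first appearance, car becomes group1 and bus becomes group2 here, matching the expected table in the question.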

Answer 1 (score: 2)

This can be done with the pandas.core.groupby.GroupBy.ngroup function, which numbers each group starting from 0.
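
A minimal sketch of how ngroup() could be applied to this problem. This is an illustration under assumptions, not necessarily the answer author's original code. The sample frame is built inline from the question's data, and sort=False is an assumed choice so that groups are numbered in order of first appearance:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: stands in for data.csv)
df = pd.DataFrame({
    'name': ['car', 'car', 'car', 'bus', 'bus', 'bus'],
    'color': ['white', 'black', 'red', 'white', 'black', 'red'],
})

# ngroup() labels every row with the 0-based number of its group.
# sort=False numbers groups in order of first appearance (car, then bus);
# the default sort=True would number them by sorted key (bus, then car).
group_numbers = df.groupby('name', sort=False).ngroup()

# Shift to 1-based numbering, convert to strings, and prepend 'group'
df['group'] = 'group' + (group_numbers + 1).astype(str)

print(df)
```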


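Answer 2 (score: 0)

I feel the existing answers overcomplicate things here. After all, all you need to do is create a mapping between the names and the group names: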

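# unique() keeps order of first appearance; set() ordering is arbitrary,
# so it could assign group1 to bus instead of car
group_map = {name: f'group{idx+1}' for idx, name in enumerate(data['name'].unique())}

group_map
{'car': 'group1', 'bus': 'group2'}

data['group'] = data['name'].map(group_map)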