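I've just started using Pandas, and I'm struggling to add a simple column containing the string "group" plus a number for each unique value in a column.
I tried groupby, but I can't figure out how to add the number based on the name column.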
import pandas as pd
data = pd.read_csv('./data.csv')
data['group'] = data.groupby('name') # ???
name color
0 car white
1 car black
2 car red
3 bus white
4 bus black
5 bus red
It should look like this:
name color group
0 car white group1
1 car black group1
2 car red group1
3 bus white group2
4 bus black group2
5 bus red group2
Answer 0 (score: 4)
Use factorize() here:
# df is the DataFrame called data in the question
df = df.assign(group=pd.factorize(df['name'])[0] + 1)
name color group
0 car white 1
1 car black 1
2 car red 1
3 bus white 2
4 bus black 2
5 bus red 2
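The output above holds plain integers, while the question asks for strings like group1. A minimal sketch, assuming the sample data is built inline rather than read from data.csv, that turns the factorize() codes into those labels:

```python
import pandas as pd

# Sample data from the question, built inline (an assumption) instead of read from data.csv
df = pd.DataFrame({
    "name": ["car", "car", "car", "bus", "bus", "bus"],
    "color": ["white", "black", "red", "white", "black", "red"],
})

# codes holds one integer per row (0, 1, ... in order of first appearance);
# uniques holds the distinct names in that same order
codes, uniques = pd.factorize(df["name"])

# Shift to 1-based numbering and prepend the "group" prefix
df["group"] = [f"group{code + 1}" for code in codes]
print(df)
```

Because factorize() numbers values in order of first appearance, car gets group1 and bus gets group2, matching the expected table.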
Answer 1 (score: 2)
With pandas.core.groupby.GroupBy.ngroup:
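A minimal sketch of the ngroup approach, assuming the sample data is built inline rather than read from data.csv. It is one way to use ngroup() for this, not necessarily the answerer's exact code:

```python
import pandas as pd

# Sample data from the question, built inline (an assumption) instead of read from data.csv
df = pd.DataFrame({
    "name": ["car", "car", "car", "bus", "bus", "bus"],
    "color": ["white", "black", "red", "white", "black", "red"],
})

# ngroup() returns one integer per row: the number of the group that row belongs to.
# sort=False numbers groups in order of first appearance (car -> 0, bus -> 1);
# with the default sort=True they follow sorted key order (bus -> 0, car -> 1).
numbers = df.groupby("name", sort=False).ngroup()

# Convert to 1-based strings and prepend the "group" prefix
df["group"] = "group" + (numbers + 1).astype(str)
print(df)
```

The sort=False choice matters here: without it, bus would become group1 because "bus" sorts before "car".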
Answer 2 (score: 0)
I think the existing answers overcomplicate things here. All you really need is a mapping from each name to its group name:
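# Series.unique() keeps the order of first appearance; iterating over a set() has no guaranteed order
group_map = {name: f'group{idx+1}' for idx, name in enumerate(data['name'].unique())}
group_map
{'car': 'group1', 'bus': 'group2'}
data['group'] = data['name'].map(group_map)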