Could you help me with a problem? I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame(
    data=[
        ['one', 12],
        ['two two', 2],
        ['three one', 4],
        ['four two', 1],
        ['number "five"', 9],
        ['red', 1],
        ['extra sample', 1],
        ['yellow red', 1],
        ['hard', 4],
        ['soft hard', 2],
        ['simple', 3],
        ['sample', 4],
        ['diff sample', 1]
    ],
    columns=['object_name', 'amount']
)
print(df)
object_name amount
0 one 12
1 two two 2
2 three one 4
3 four two 1
4 number "five" 9
5 red 1
6 extra sample 1
7 yellow red 1
8 hard 4
9 soft hard 2
10 simple 3
11 sample 4
12 diff sample 1
I need to replace the duplicates in the original data (1 & 3, 2 & 4, and so on). I handled them this way:
def simple_func(name):
    if 'two' in name:
        return 'two'
    else:
        return name
df['object_name'] = df['object_name'].apply(simple_func)
print(df)
object_name amount
0 one 12
1 two 2
2 three one 4
3 two 1
4 number "five" 9
5 red 1
6 extra sample 1
7 yellow red 1
8 hard 4
9 soft hard 2
10 simple 3
11 sample 4
12 diff sample 1
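The problem is that I have many such duplicates, and some keys have several values. I want to replace them using a dictionary. I made a dictionary like this: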
some_dict = {'numbers':['one','two','five'], 'colors':'red', 'sample':'sample'}
and wrote a function like this:
def some_func(name):
    for key in some_dict:
        if type(some_dict[key]) is list:
            for value in some_dict[key]:
                if value in name:
                    return key
                else:
                    return name
        else:
            if some_dict[key] in name:
                return key
            else:
                return name
When I try to use it
df['object_name'] = df['object_name'].apply(some_func)
only the first value of the first key gets replaced.
print(df)
object_name amount
0 numbers 12
1 two 2
2 numbers 4
3 two 1
4 number "five" 9
5 red 1
6 extra sample 1
7 yellow red 1
8 hard 4
9 soft hard 2
10 simple 3
11 sample 4
12 diff sample 1
As a result, I would like to get something like this:
object_name amount
0 number 12
1 number 2
2 number 4
3 number 1
4 number 9
5 colors 1
6 sample 1
7 colors 1
8 hard 4
9 soft hard 2
10 simple 3
11 sample 4
12 sample 1
Could you point out my mistake? Thanks for your help!
Answer 0 (score: 1)
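The idea is to remove the else statements that return name and add a single return name
at the end, so the original value is returned only if nothing in the dictionary matches: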
def some_func(name):
    for k, v in some_dict.items():
        if isinstance(v, list):
            for value in v:
                if value in name:
                    return k
        else:
            if v in name:
                return k
    return name
df['object_name'] = df['object_name'].apply(some_func)
print (df)
object_name amount
0 numbers 12
1 numbers 2
2 numbers 4
3 numbers 1
4 numbers 9
5 colors 1
6 sample 1
7 colors 1
8 hard 4
9 soft hard 2
10 simple 3
11 sample 4
12 sample 1
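To see why the else branches mattered: a return inside else runs on the very first loop iteration that fails to match, so the remaining values (and keys) are never tried. A minimal sketch of just that effect, using hypothetical helper names (returns_too_early, checks_everything) that are not part of the original code:

```python
def returns_too_early(name, values):
    # Mirrors the inner loop from the question: the else branch
    # returns on the first value that does not match.
    for value in values:
        if value in name:
            return 'match'
        else:
            return name  # exits before the remaining values are tried


def checks_everything(name, values):
    # Same loop without the else: fall through only after all values fail.
    for value in values:
        if value in name:
            return 'match'
    return name


values = ['one', 'two', 'five']
print(returns_too_early('two two', values))   # two two
print(checks_everything('two two', values))   # match
```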
Your own function, changed the same way:
def some_func(name):
    for key in some_dict:
        if type(some_dict[key]) is list:
            for value in some_dict[key]:
                if value in name:
                    return key
        else:
            if some_dict[key] in name:
                return key
    return name
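If it helps, the list-versus-string branch can be avoided by wrapping bare strings in a list once, up front. This is only a sketch of one possible variant, not a required change; the names normalized and replace_name are made up for illustration:

```python
some_dict = {'numbers': ['one', 'two', 'five'], 'colors': 'red', 'sample': 'sample'}

# Wrap bare strings in a list so every value is a list of substrings.
normalized = {
    key: value if isinstance(value, list) else [value]
    for key, value in some_dict.items()
}


def replace_name(name):
    """Return the first key with a substring found in name, else name itself."""
    for key, substrings in normalized.items():
        if any(sub in name for sub in substrings):
            return key
    return name


names = ['one', 'two two', 'number "five"', 'yellow red', 'soft hard', 'diff sample']
print([replace_name(n) for n in names])
# ['numbers', 'numbers', 'numbers', 'colors', 'soft hard', 'sample']
```

With the DataFrame from the question it would be applied the same way as before: df['object_name'] = df['object_name'].apply(replace_name).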
Answer 1 (score: 1)
I think you can also use Series.str.contains:
for y, x in some_dict.items():
    if isinstance(x, list):
        for val in x:
            df.loc[df['object_name'].str.contains(val), 'object_name'] = y
    else:
        df.loc[df['object_name'].str.contains(x), 'object_name'] = y
print(df)
object_name amount
0 numbers 12
1 numbers 2
2 numbers 4
3 numbers 1
4 numbers 9
5 colors 1
6 sample 1
7 colors 1
8 hard 4
9 soft hard 2
10 simple 3
11 sample 4
12 sample 1
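One thing to keep in mind: Series.str.contains treats the pattern as a regular expression by default, so values containing characters such as '.' or '(' may match more than intended or raise an error. A hedged variant, assuming the values are meant as literal substrings, passes regex=False and computes every mask from the original column, so the first matching key wins regardless of what the key itself is called. The shortened sample frame and the names original, result, and matched are assumptions for this sketch:

```python
import pandas as pd

# Shortened sample frame, assumed for illustration.
df = pd.DataFrame({
    'object_name': ['one', 'two two', 'number "five"', 'yellow red',
                    'soft hard', 'diff sample'],
    'amount': [12, 2, 9, 1, 2, 1],
})
some_dict = {'numbers': ['one', 'two', 'five'], 'colors': 'red', 'sample': 'sample'}

original = df['object_name']
result = original.copy()
matched = pd.Series(False, index=df.index)  # rows already assigned a key

for key, values in some_dict.items():
    if not isinstance(values, list):
        values = [values]
    hit = pd.Series(False, index=df.index)
    for val in values:
        # regex=False: treat val as a literal substring, not a pattern.
        hit |= original.str.contains(val, regex=False)
    # Only rows not claimed by an earlier key are assigned.
    result[hit & ~matched] = key
    matched |= hit

df['object_name'] = result
print(df)
```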