How to replace duplicates in DataFrame rows using a dictionary in pandas

Time: 2019-11-07 11:06:42

Tags: python pandas data-analysis

Could you help me with a problem? I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame(
    data=[
        ['one',12],
        ['two two',2],
        ['three one',4],
        ['four two',1],
        ['number "five"',9],
        ['red',1],
        ['extra sample',1],
        ['yellow red',1],
        ['hard',4],
        ['soft hard',2],
        ['simple',3],
        ['sample' ,4],
        ['diff sample',1]
    ],
    columns=['object_name', 'amount']
)
print(df)
   object_name     amount
0   one            12
1   two two        2
2   three one      4
3   four two       1
4   number "five"  9
5   red            1
6   extra sample   1
7   yellow red     1
8   hard           4
9   soft hard      2
10  simple         3
11  sample         4
12  diff sample    1

I need to replace the duplicates in rows 1 & 3, 2 & 4, and so on. I handle one such case like this:

def simple_func(name):
    if 'two' in name:
        return 'two'
    else:
        return name
df['object_name'] = df['object_name'].apply(simple_func)
print(df)
    object_name     amount
0   one             12
1   two             2
2   three one       4
3   two             1
4   number "five"   9
5   red             1
6   extra sample    1
7   yellow red      1
8   hard            4
9   soft hard       2
10  simple          3
11  sample          4
12  diff sample     1

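But the problem is that I have many such duplicates, and some keys have multiple values. I want to replace them using a dictionary. I made a dictionary like this: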

some_dict = {'numbers':['one','two','five'], 'colors':'red', 'sample':'sample'}

I wrote a function like this:

def some_func(name):
    for key in some_dict:
        if type(some_dict[key]) is list:
            for value in some_dict[key]:
                if value in name:
                    return key
                else:
                    return name
        else:
            if some_dict[key] in name:
                    return key
            else:
                    return name

When I try to use it

df['object_name'] = df['object_name'].apply(some_func)

only the first value of the first key is replaced.

print(df)
    object_name     amount
0   numbers         12
1   two             2
2   numbers         4
3   two             1
4   number "five"   9
5   red             1
6   extra sample    1
7   yellow red      1
8   hard            4
9   soft hard       2
10  simple          3
11  sample          4
12  diff sample     1
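
To make the symptom concrete, here is a minimal sketch that mirrors the control flow of some_func above and records which substrings actually get compared. The name first_check_only and the checked list are hypothetical additions for illustration only. Because both branches of the inner if/else return, the function exits after the very first comparison, so only 'one' is ever tested.

```python
some_dict = {'numbers': ['one', 'two', 'five'], 'colors': 'red', 'sample': 'sample'}

def first_check_only(name):
    """Same control flow as some_func, but also reports which values were compared."""
    checked = []
    for key in some_dict:
        entry = some_dict[key]
        values = entry if type(entry) is list else [entry]
        for value in values:
            checked.append(value)
            if value in name:
                return key, checked
            else:
                # This return leaves the function after the first comparison,
                # so the remaining values and keys are never reached.
                return name, checked

print(first_check_only('three one'))   # ('numbers', ['one'])
print(first_check_only('yellow red'))  # ('yellow red', ['one'])
```

In both calls the checked list contains only 'one', which matches the output above: rows containing 'one' become 'numbers' and every other row is returned unchanged.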

As a result, I want to get something like this:

    object_name     amount
0   numbers         12
1   numbers         2
2   numbers         4
3   numbers         1
4   numbers         9
5   colors          1
6   sample          1
7   colors          1
8   hard            4
9   soft hard       2
10  simple          3
11  sample          4
12  sample          1

Could you point out my mistake? Thanks for your help!

2 Answers:

Answer 0: (score: 1)

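The idea is to remove the else branches and add return name at the end, so the original value is returned only when nothing in the dictionary matches: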

def some_func(name):
    for k, v in some_dict.items():
        if isinstance(v, list):
            for value in v:
                if value in name:
                    return k
        else:
            if v in name:
                return k
    return name

df['object_name'] = df['object_name'].apply(some_func)
print(df)
   object_name  amount
0      numbers      12
1      numbers       2
2      numbers       4
3      numbers       1
4      numbers       9
5       colors       1
6       sample       1
7       colors       1
8         hard       4
9    soft hard       2
10      simple       3
11      sample       4
12      sample       1

Your function should be changed to:

def some_func(name):
    for key in some_dict:
        if type(some_dict[key]) is list:
            for value in some_dict[key]:
                if value in name:
                    return key
        else:
            if some_dict[key] in name:
                return key
    return name
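
As a possible simplification of the same idea, the mixed dictionary (lists and plain strings) could first be flattened into (substring, key) pairs, so the lookup no longer needs a type check. This is only a sketch: pairs and replace_by_substring are hypothetical names, and the small DataFrame is a subset of the question's data.

```python
import pandas as pd

some_dict = {'numbers': ['one', 'two', 'five'], 'colors': 'red', 'sample': 'sample'}

# Flatten the dict into (substring, key) pairs, wrapping plain strings in a list.
pairs = [
    (value, key)
    for key, values in some_dict.items()
    for value in (values if isinstance(values, list) else [values])
]

def replace_by_substring(name):
    # Return the key of the first substring found in name, else name unchanged.
    return next((key for value, key in pairs if value in name), name)

df = pd.DataFrame({'object_name': ['one', 'four two', 'number "five"',
                                   'yellow red', 'soft hard', 'diff sample']})
df['object_name'] = df['object_name'].apply(replace_by_substring)
print(df['object_name'].tolist())
# ['numbers', 'numbers', 'numbers', 'colors', 'soft hard', 'sample']
```

As in the function above, the first matching pair wins, so the order of keys in some_dict decides ties when a name contains substrings from several keys.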

Answer 1: (score: 1)

I think you could also use Series.str.contains:

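for y, x in some_dict.items():
    if isinstance(x, list):
        for val in x:
            # regex=False matches val as a literal substring, not a regex pattern
            df.loc[df['object_name'].str.contains(val, regex=False), 'object_name'] = y
    else:
        df.loc[df['object_name'].str.contains(x, regex=False), 'object_name'] = y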

print(df)

   object_name  amount
0      numbers      12
1      numbers       2
2      numbers       4
3      numbers       1
4      numbers       9
5       colors       1
6       sample       1
7       colors       1
8         hard       4
9    soft hard       2
10      simple       3
11      sample       4
12      sample       1
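
One thing to keep in mind with this loop is that it rewrites the column in place, so later checks run against already replaced names, and a row that matches several keys ends up with the last one. A hedged variant, assuming first-match-wins behaviour is wanted (as in the apply-based answer), could compute every mask against a copy of the original column. The names original and unassigned are hypothetical, and the DataFrame is a subset of the question's data.

```python
import pandas as pd

some_dict = {'numbers': ['one', 'two', 'five'], 'colors': 'red', 'sample': 'sample'}
df = pd.DataFrame({
    'object_name': ['one', 'two two', 'yellow red', 'soft hard', 'diff sample'],
    'amount': [12, 2, 1, 2, 1],
})

original = df['object_name'].copy()            # always match against unmodified names
unassigned = pd.Series(True, index=df.index)   # rows that have not been replaced yet

for key, values in some_dict.items():
    for value in (values if isinstance(values, list) else [values]):
        # regex=False treats value as a literal substring rather than a pattern
        mask = original.str.contains(value, regex=False) & unassigned
        df.loc[mask, 'object_name'] = key
        unassigned &= ~mask                    # first match wins

print(df['object_name'].tolist())
# ['numbers', 'numbers', 'colors', 'soft hard', 'sample']
```

For the dictionary in the question both variants give the same result; the difference only shows up when a replacement key itself contains one of the search substrings, or when a name matches more than one key.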