Remove duplicate words from each row of a column (Python)

Time: 2020-09-27 20:48:33

Tags: python pandas lambda duplicates

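I am trying to create a lambda function that removes duplicate words from each row of a column. I assigned my column to a variable and wrote a function that removes duplicate words from a sentence, but I don't know how to use a lambda to apply that function to every row of the column.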

def unique_list(l):
    lst = []
    [lst.append(x) for x in l if x not in lst]
    return lst

a = 'shoes dress apple shoes mango apple'
a = ' '.join(unique_list(a.split()))

My column is 'dup_words'. Could you help me understand how to use a lambda to apply the function above to all rows in my column?

2 answers:

Answer 0: (score: 0)

Assuming that by rows and columns you mean a structure like this:

data = [
    ['shoes dress apple shoes mango apple','shoes dress apple shoes mango apple','shoes dress apple shoes mango apple'],  # Column 1
    ['shoes dress apple shoes mango apple','shoes dress apple shoes mango apple','shoes dress apple shoes mango apple'],  # Column 2
    ['shoes dress apple shoes mango apple','shoes dress apple shoes mango apple','shoes dress apple shoes mango apple']   # Column 3
]

You can combine a list comprehension with a lambda that wraps the function you defined:

dup_words = data[0]  # ['shoes dress apple shoes mango apple', 'shoes dress apple shoes mango apple', 'shoes dress apple shoes mango apple']

unique_words = [(lambda x: ' '.join(unique_list(x.split())))(row) for row in dup_words]  # ['shoes dress apple mango', 'shoes dress apple mango', 'shoes dress apple mango']

This can be simplified by changing your function so that it joins the words itself:

def unique_list(l):
    lst = []
    [lst.append(x) for x in l if x not in lst]
    return ' '.join(lst)

Your lambda then becomes:

unique_words = [(lambda x: unique_list(x.split()))(row) for row in dup_words]  # ['shoes dress apple mango', 'shoes dress apple mango', 'shoes dress apple mango']
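If 'dup_words' is a column of a pandas DataFrame rather than a plain list, the same idea can be expressed with `Series.apply`. The following is only a minimal sketch: the DataFrame contents and the output column name `deduped` are made up for illustration, and only the column name `dup_words` comes from the question.

```python
import pandas as pd


def unique_list(words):
    """Return the words in first-seen order with duplicates dropped."""
    seen = []
    for word in words:
        if word not in seen:
            seen.append(word)
    return seen


# Hypothetical data; only the column name 'dup_words' is from the question.
df = pd.DataFrame({
    'dup_words': [
        'shoes dress apple shoes mango apple',
        'red blue red green',
    ]
})

# Series.apply calls the lambda once for every cell in the column.
df['deduped'] = df['dup_words'].apply(
    lambda s: ' '.join(unique_list(s.split()))
)
```

Each cell is split into words, deduplicated, and joined back into a single string, so the new column holds strings just like the original one.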

Answer 1: (score: 0)

If the order doesn't matter, use set(). Short and clear.

def unique_list(l):
    return list(set(l))


a = 'shoes dress apple shoes mango apple'
a = ' '.join(unique_list(a.split()))  # e.g. 'shoes dress apple mango'; word order is not guaranteed

All hail the simple one-liner:

a = 'shoes dress apple shoes mango apple'
a = ' '.join(set(a.split()))  # e.g. 'shoes dress apple mango'; word order is not guaranteed

Your new column can then be written like this:

# split each string into words first; set(x) on a string would yield its characters
df['deduped'] = df['some_column'].apply(lambda x: ' '.join(set(x.split())))
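If the word order does matter, a set is not suitable because sets are unordered. One possible alternative, sketched here under the assumption of Python 3.7 or later (where dicts are guaranteed to keep insertion order), is `dict.fromkeys`. The helper name `dedupe_words` is hypothetical and not part of the original answer.

```python
def dedupe_words(sentence):
    """Drop repeated words while keeping the first occurrence of each."""
    # dict.fromkeys keeps keys in insertion order (guaranteed since Python 3.7),
    # so iterating over the dict yields each word once, in first-seen order.
    return ' '.join(dict.fromkeys(sentence.split()))


a = 'shoes dress apple shoes mango apple'
deduped = dedupe_words(a)  # 'shoes dress apple mango'
```

Because the helper takes a single string, it could be passed to `apply` directly, for example `df['some_column'].apply(dedupe_words)`, without wrapping it in a lambda.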