Check whether a substring exists in a column

Posted: 2021-03-09 07:36:34

Tags: python pandas dataframe

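I need some help with a pandas DataFrame problem in Python.

I have two columns, 'data' and 'full_data'. If any subset of 'data' appears in 'full_data', I need the matching subset values in a new column called 'new_finding'.

Expected output in the new column 'new_finding':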

data    full_data      new_finding
123456  123456789      [123456]
345643  456432345876   [456,345,43]
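One reasonable reading of the question is "every contiguous piece of 'data' that also occurs in 'full_data'". Below is a minimal plain-Python sketch under that assumption; the function name `common_substrings` and the `maximal_only` flag are hypothetical, and both values are converted with `str()`, assuming they are integers or strings. It does not reproduce the expected `[456,345,43]` exactly, because `45643` is itself a common substring of the second row, so with `maximal_only=True` it absorbs `456` and `43`.

```python
def common_substrings(data, full_data, maximal_only=False):
    """Return the distinct contiguous substrings of `data` that occur in `full_data`.

    With maximal_only=True, drop any match that is contained in a longer match.
    Results are sorted longest first, then lexicographically.
    """
    data, full_data = str(data), str(full_data)
    # Test every slice data[i:j]; a string of length n has n * (n + 1) / 2 of them
    found = {
        data[i:j]
        for i in range(len(data))
        for j in range(i + 1, len(data) + 1)
        if data[i:j] in full_data
    }
    if maximal_only:
        # Keep only matches that are not part of another, longer match
        found = {s for s in found if not any(s != t and s in t for t in found)}
    return sorted(found, key=lambda s: (-len(s), s))


print(common_substrings(123456, 123456789, maximal_only=True))     # ['123456']
print(common_substrings(345643, 456432345876, maximal_only=True))  # ['45643', '345']
```

Without `maximal_only`, the second row returns all 16 shared substrings, which is the same set the answer below produces.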

1 Answer

Answer 0 (score: 1)

See if this works for you:

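import re
from itertools import permutations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({'data': [123456, 345643], 'full_data': [123456789, 456432345876]})

def combs(letters):
    # Yield every ordered arrangement of 1..len(letters) characters of `letters`
    for n in range(1, len(letters)+1):
        yield from map(''.join, permutations(letters, n))

# For each candidate, collect its matches in full_data (an empty list if there are none)
df['new_finding'] = df.apply(lambda x: [re.findall(comb, str(x['full_data'])) for comb in combs(str(x['data']))], axis=1)
# Drop candidates with no match
df['new_finding'] = df['new_finding'].apply(lambda row: [x for x in row if x != []])
# Remove duplicate match lists (repeated characters in 'data' yield repeated candidates)
df['new_finding'] = df['new_finding'].apply(lambda row: [list(x) for x in set(tuple(x) for x in row)])
# Keep one string per match list
df['new_finding'] = df['new_finding'].apply(lambda row: [item[0] for item in row])
df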
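The three `apply` calls after `re.findall` (drop empty lists, deduplicate, take the first element) amount to keeping each distinct candidate that occurs in 'full_data'. A condensed sketch of that pipeline, assuming string-like values; `new_finding` as a function name is hypothetical. It swaps `re.findall` for a plain `in` test, so characters such as `.` would be matched literally rather than as regex syntax, and it sorts the result because iterating a `set` gives no stable order (string hashing is randomized between runs).

```python
from itertools import permutations


def combs(letters):
    # The answer's generator: every ordered arrangement of 1..len(letters) characters
    for n in range(1, len(letters) + 1):
        yield from map(''.join, permutations(letters, n))


def new_finding(data, full_data):
    # Keep each distinct candidate that occurs literally in full_data; sort by
    # length, then text, so the order is the same on every run
    data, full_data = str(data), str(full_data)
    return sorted({c for c in combs(data) if c in full_data}, key=lambda s: (len(s), s))


print(new_finding(345643, 456432345876))
# ['3', '4', '5', '6', '34', '43', '45', '56', '64', '345', '456', '564', '643', '4564', '5643', '45643']
```

To fill the column with it: `df['new_finding'] = [new_finding(d, f) for d, f in zip(df['data'], df['full_data'])]`.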

Output:

data    full_data   new_finding
123456  123456789   [45, 1234, 6, 23, 123456, 4, 123, 3456, 12, 5, 3, 12345, 23456, 1, 56, 2345, 234, 345, 2, 34, 456]
345643  456432345876    [345, 5, 564, 45, 45643, 6, 4, 34, 643, 43, 56, 4564, 5643, 456, 3, 64]
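For these two rows the output is exactly the set of contiguous substrings shared by 'data' and 'full_data' (21 and 16 items), but `combs` reaches it by testing every permutation of the characters, including strings such as `21` that never occur in 'data'. A rough count of that work, as a sketch; the helper names `permutation_candidates` and `contiguous_substrings` are hypothetical.

```python
from math import perm


def permutation_candidates(n):
    # Number of strings combs() generates for an n-character value
    # (duplicates included when characters repeat): sum of P(n, k) for k = 1..n
    return sum(perm(n, k) for k in range(1, n + 1))


def contiguous_substrings(n):
    # Number of slices s[i:j] with 0 <= i < j <= n
    return n * (n + 1) // 2


for n in (6, 10):
    print(n, permutation_candidates(n), contiguous_substrings(n))
# 6 1956 21
# 10 9864100 55
```

The permutation count grows factorially, so the answer is practical only for short values; testing slices grows quadratically. `math.perm` requires Python 3.8 or later.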