Search a pandas DataFrame for a list of strings and add each matched search string to a new column

Date: 2018-01-17 07:05:15

Tags: pandas dataframe

I have a DataFrame with a text column 'Description', and a list of search strings:

search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']

I want to search the DataFrame using the strings in the search list. I used:

df.loc[df['Description'].str.contains('|'.join(search), na=False)]

This gives the desired result: all matching rows are returned correctly.
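For reference, the filtering step can be sketched as a self-contained script. The sample rows are hypothetical; only the column name 'Description' and the search list come from the question. The use of `re.escape` is an added precaution, not part of the original code: it keeps the search strings literal in case they ever contain regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'Description': [
    'AasfasfFR-001,asfasdfafsagsdg FR-002',
    'AasfasfFR-02,asfasdfafsagsdg',
    None,
]})
search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']

# Build one alternation pattern; re.escape keeps each search string literal.
pattern = '|'.join(re.escape(s) for s in search)

# na=False treats missing descriptions as "no match" instead of propagating NaN.
mask = df['Description'].str.contains(pattern, na=False)
matched = df.loc[mask]
print(matched)
```

Only the first row survives here: 'FR-02' is not in the search list, and the missing value is treated as a non-match.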

How can I add each successfully matched search string to the matching row, in a new DataFrame column 'FR'?

Edit

Five rows of the Description column, together with the expected result column FR:

[Image: sample dataframe]

2 Answers:

Answer 0 (score: 2)

I think you need str.findall, which returns a list of every match in each row.

Using @AndreyF's sample data:

search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']
# Join the search strings into one regex alternation and collect all matches per row
df['FR'] = df['Description'].str.findall('(' + '|'.join(search) + ')')
print(df)

                            Description                FR
0  AasfasfFR-001,asfasdfafsagsdg FR-002  [FR-001, FR-002]
1                 AasfasfFR-004, FR-002  [FR-004, FR-002]
2         AasfasfFR-02,asfasdfafsagsdg                 []
3  AasfasfFR-001,asfasdfafsagsdg FR-003  [FR-001, FR-003]
4  AasfasfFR-004,asfasdfafsagsdg FR-002  [FR-004, FR-002]

If you need to filter out the rows with empty lists:

# An empty list is falsy, so astype(bool) keeps only rows with at least one match
df = df[df['FR'].astype(bool)]
print(df)

                            Description                FR
0  AasfasfFR-001,asfasdfafsagsdg FR-002  [FR-001, FR-002]
1                 AasfasfFR-004, FR-002  [FR-004, FR-002]
3  AasfasfFR-001,asfasdfafsagsdg FR-003  [FR-001, FR-003]
4  AasfasfFR-004,asfasdfafsagsdg FR-002  [FR-004, FR-002]
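If a plain string is preferred over a list in the FR column, the lists produced by `str.findall` can be joined with `str.join`. The following is a minimal sketch under a few assumptions: the three sample rows are a subset of the data above, the column name `FR_joined` is hypothetical, and `re.escape` is an added precaution that the answer itself does not use.

```python
import re
import pandas as pd

df = pd.DataFrame({'Description': [
    'AasfasfFR-001,asfasdfafsagsdg FR-002',
    'AasfasfFR-004, FR-002',
    'AasfasfFR-02,asfasdfafsagsdg',
]})
search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']
pattern = '|'.join(re.escape(s) for s in search)

# findall returns, per row, a list of all matches (an empty list if none).
df['FR'] = df['Description'].str.findall(pattern)

# str.join concatenates the elements of each list into one string;
# an empty list becomes an empty string.
df['FR_joined'] = df['FR'].str.join(',')
print(df)
```

Matches appear in the order they occur in the text, so the second row yields 'FR-004,FR-002' rather than a sorted result.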

Answer 1 (score: 0)

You can use apply to run a function on each value and build the desired string there:

def find_values(to_search):
    # Collect every search string found in the text, in the order of `search`
    matches = [val for val in search if val in to_search]
    # Wrap the comma-separated matches in parentheses; no match gives '()'
    return '(' + ','.join(matches) + ')'

df['FR'] = df['Description'].apply(find_values)

For this dummy example:

0  AasfasfFR-001,asfasdfafsagsdg FR-002
1                 AasfasfFR-004, FR-002
2         AasfasfFR-02,asfasdfafsagsdg 
3  AasfasfFR-001,asfasdfafsagsdg FR-003
4  AasfasfFR-004,asfasdfafsagsdg FR-002

The output will be:

                            Description               FR
0  AasfasfFR-001,asfasdfafsagsdg FR-002  (FR-001,FR-002)
1                 AasfasfFR-004, FR-002  (FR-002,FR-004)
2         AasfasfFR-02,asfasdfafsagsdg                ()
3  AasfasfFR-001,asfasdfafsagsdg FR-003  (FR-001,FR-003)
4  AasfasfFR-004,asfasdfafsagsdg FR-002  (FR-002,FR-004)
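One caveat with the apply approach is that a missing description (NaN or None) is not a string, so a substring test on it raises an error. A minimal sketch of a variant that tolerates such cells follows. The `isinstance` guard and the sample rows are assumptions added for illustration, not part of the answer.

```python
import pandas as pd

search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']

def find_values(text):
    # Non-string cells (e.g. NaN or None) cannot contain a match.
    if not isinstance(text, str):
        return '()'
    # Matches are reported in the order of `search`, not in text order.
    matches = [val for val in search if val in text]
    return '(' + ','.join(matches) + ')'

# Hypothetical sample data, including one missing description.
df = pd.DataFrame({'Description': [
    'AasfasfFR-004, FR-002',
    'AasfasfFR-02,asfasdfafsagsdg',
    None,
]})
df['FR'] = df['Description'].apply(find_values)
print(df)
```

Unlike `str.findall`, this reports each search string at most once per row and orders the matches by their position in `search`.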