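I have a DataFrame with a text column 'Description' and a list of search strings:
search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']
I want to search the DataFrame using the strings in the search list. I used: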
df.loc[df['Description'].str.contains('|'.join(search), na=False)]
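This gives the desired result: all matching rows are returned correctly.
How can I add each search string that matched to the corresponding row, in a new DataFrame column 'FR'?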
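As a self-contained sketch of the filtering step above, the block below runs the same str.contains call on a small hypothetical DataFrame (the question's real data is not shown). Wrapping each term in re.escape is an addition not in the question: it makes the joined pattern safe if a search term ever contains regex metacharacters such as '.' or '('.

```python
import re

import pandas as pd

# Hypothetical sample data; the question's actual rows are not available
df = pd.DataFrame({'Description': [
    'AasfasfFR-001,asfasdfafsagsdg FR-002',
    'AasfasfFR-02,asfasdfafsagsdg',
    None,
]})
search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']

# re.escape guards against search terms that contain regex metacharacters
pattern = '|'.join(map(re.escape, search))

# na=False treats missing descriptions as non-matches instead of NaN
matches = df.loc[df['Description'].str.contains(pattern, na=False)]
print(matches)
```

Only the first row is kept: 'FR-02' in the second row is not one of the search terms, and the None row is dropped by na=False.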
EDIT
Five rows of the Description column, with the expected result in column FR.
Answer 0 (score: 2)
I think you need str.findall:
Using @AndreyF's sample data:
search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']
df['FR'] = df['Description'].str.findall('(' + '|'.join(search) + ')')
print (df)
Description FR
0 AasfasfFR-001,asfasdfafsagsdg FR-002 [FR-001, FR-002]
1 AasfasfFR-004, FR-002 [FR-004, FR-002]
2 AasfasfFR-02,asfasdfafsagsdg []
3 AasfasfFR-001,asfasdfafsagsdg FR-003 [FR-001, FR-003]
4 AasfasfFR-004,asfasdfafsagsdg FR-002 [FR-004, FR-002]
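A runnable version of the findall approach, built on the five sample rows shown above, might look like this. The only change from the answer is escaping each term with re.escape, which is an extra safety measure assumed here rather than something the answer requires for these particular strings.

```python
import re

import pandas as pd

# The five sample rows used in the answers
df = pd.DataFrame({'Description': [
    'AasfasfFR-001,asfasdfafsagsdg FR-002',
    'AasfasfFR-004, FR-002',
    'AasfasfFR-02,asfasdfafsagsdg',
    'AasfasfFR-001,asfasdfafsagsdg FR-003',
    'AasfasfFR-004,asfasdfafsagsdg FR-002',
]})
search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']

# One capturing group around the alternation: findall returns the matched term itself
pattern = '(' + '|'.join(map(re.escape, search)) + ')'

# Each cell of FR becomes a list of matches, in the order they appear in the text
df['FR'] = df['Description'].str.findall(pattern)
print(df)
```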
If you need to filter out the empty lists:
df = df[df['FR'].astype(bool)]
print (df)
Description FR
0 AasfasfFR-001,asfasdfafsagsdg FR-002 [FR-001, FR-002]
1 AasfasfFR-004, FR-002 [FR-004, FR-002]
3 AasfasfFR-001,asfasdfafsagsdg FR-003 [FR-001, FR-003]
4 AasfasfFR-004,asfasdfafsagsdg FR-002 [FR-004, FR-002]
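An alternative way to drop the empty lists, and optionally turn each list into a plain string, is sketched below. It uses str.len() > 0 instead of astype(bool), which states the intent explicitly; the two-row DataFrame is a hypothetical stand-in for the findall output so the block runs on its own, and the ', ' separator is an arbitrary choice.

```python
import pandas as pd

# Stand-in for the str.findall output (hypothetical, so the block runs alone)
df = pd.DataFrame({
    'Description': ['AasfasfFR-001,asfasdfafsagsdg FR-002', 'AasfasfFR-02,asfasdfafsagsdg'],
    'FR': [['FR-001', 'FR-002'], []],
})

# str.len() also works on list elements; keep rows with at least one match
matched = df[df['FR'].str.len() > 0].copy()

# Optionally flatten each list into a single comma-separated string
matched['FR'] = matched['FR'].str.join(', ')
print(matched)
```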
Answer 1 (score: 0)
You can apply a function to each value and build the desired string inside it:
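def find_values(to_search):
    ret_val = '('
    # Append every search term that occurs anywhere in the text
    for val in search:
        if to_search.find(val) >= 0:
            ret_val += val + ','
    # Drop the trailing comma if anything matched, then close the parenthesis
    return ret_val[:-1] + ')' if len(ret_val) > 1 else ret_val + ')'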
df['FR'] = df['Description'].apply(find_values)
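The loop above calls to_search.find, which raises an AttributeError if a Description is NaN or None (the question used na=False, which suggests missing values may occur). A more compact sketch of the same function, assuming missing values should simply produce '()', could be:

```python
import pandas as pd

search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']

def find_values(to_search):
    # Missing descriptions (NaN/None) have no .find(); treat them as no match
    if not isinstance(to_search, str):
        return '()'
    # Keep search-list order, like the loop version; `in` is a substring test
    found = [val for val in search if val in to_search]
    return '(' + ','.join(found) + ')'

df = pd.DataFrame({'Description': [
    'AasfasfFR-004, FR-002',
    'AasfasfFR-02,asfasdfafsagsdg',
    None,
]})
df['FR'] = df['Description'].apply(find_values)
print(df)
```

Joining the list with ',' removes the need to strip a trailing comma afterwards.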
For the dummy data:
0 AasfasfFR-001,asfasdfafsagsdg FR-002
1 AasfasfFR-004, FR-002
2 AasfasfFR-02,asfasdfafsagsdg
3 AasfasfFR-001,asfasdfafsagsdg FR-003
4 AasfasfFR-004,asfasdfafsagsdg FR-002
The output will be:
Description FR
0 AasfasfFR-001,asfasdfafsagsdg FR-002 (FR-001,FR-002)
1 AasfasfFR-004, FR-002 (FR-002,FR-004)
2 AasfasfFR-02,asfasdfafsagsdg ()
3 AasfasfFR-001,asfasdfafsagsdg FR-003 (FR-001,FR-003)
4 AasfasfFR-004,asfasdfafsagsdg FR-002 (FR-002,FR-004)
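The two answers do not order matches the same way: for row 1, findall gives [FR-004, FR-002] while the apply version gives (FR-002,FR-004). findall reports matches in the order they occur in the text (repeating a term that occurs twice), whereas the loop reports each term at most once, in search-list order. The plain-Python sketch below illustrates the difference on the row 1 text and on a hypothetical string with a repeated term.

```python
import re

search = ['FR-001', 'FR-002', 'FR-003', 'FR-004']
pattern = '|'.join(map(re.escape, search))

def by_position(text):
    # findall-style: every occurrence, in the order it appears in the text
    return re.findall(pattern, text)

def by_search_order(text):
    # apply-style: each term at most once, in the order of the search list
    return [val for val in search if val in text]

row1 = 'AasfasfFR-004, FR-002'
print(by_position(row1))      # ['FR-004', 'FR-002']
print(by_search_order(row1))  # ['FR-002', 'FR-004']

repeated = 'FR-001 and FR-001 again'  # hypothetical text with a repeated term
print(by_position(repeated))      # ['FR-001', 'FR-001']
print(by_search_order(repeated))  # ['FR-001']
```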