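I want to loop through searchList and check whether the text column str.contains one or more of each searchWord. If I get a match, I want to append the data to masterdf, which is easy to do, as shown below. But I also want to add a new column containing searchWord, so that I know which text matched which word. The code below fills the searchWord column with the most recent search that matched.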
masterdf = pd.DataFrame(columns=['doc_id', 'text'])
for searchWord in searchList:
    search = jsons_data[jsons_data['text'].str.contains(searchWord)]
    if len(search) > 0:
        masterdf = masterdf.append(search)
        masterdf['searchWord'] = searchWord
Answer 0 (score: 1)
I think this is what you are after.
Let's set up the sample data:
import pandas as pd
tt = '''I want to search through the. searchList and check if column text str.contains one or more of each searchWord. If I get a match I want to append the data to masterdf which is easily accomplished as seen below. But I also want to add a new column with searchWord so that I know which text matched with what. This code below fills the column searchWord with the. latest search that matched'''
text_col = tt.split('.')
id_col = range(len(text_col))
jsons_data = pd.DataFrame({'doc_id':id_col,'text':text_col})
searchList = ['code','fills', 'But','also','want']
The sample jsons_data is:
   doc_id  text
0       0  I want to search through the
1       1  searchList and check if column text str
2       2  contains one or more of each searchWord
3       3  If I get a match I want to append the data to...
4       4  But I also want to add a new column with sear...
5       5  This code below fills the column searchWord w...
6       6  latest search that matched
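Modify the code so that search['searchWord'] = searchWord is set on each subset before it is appended:
masterdf = pd.DataFrame(columns=['doc_id', 'text', 'searchWord'])
for searchWord in searchList:
    # .copy() avoids a SettingWithCopyWarning when the column is added below
    search = jsons_data[jsons_data['text'].str.contains(searchWord)].copy()
    if len(search) > 0:
        # tag only the rows matched by this word
        search['searchWord'] = searchWord
        # note: DataFrame.append was removed in pandas 2.0
        masterdf = masterdf.append(search)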
masterdf is:
   doc_id  text                                              searchWord
5     5.0  This code below fills the column searchWord w...  code
5     5.0  This code below fills the column searchWord w...  fills
4     4.0  But I also want to add a new column with sear...  But
4     4.0  But I also want to add a new column with sear...  also
0     0.0  I want to search through the                      want
3     3.0  If I get a match I want to append the data to...  want
4     4.0  But I also want to add a new column with sear...  want
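On pandas 2.0 and later, DataFrame.append is no longer available, so the same loop can be sketched with pd.concat. This is a minimal sketch rather than the answerer's code: the helper name tag_matches and the list name frames are hypothetical, the sample data is shortened, and regex=False assumes the search words are meant as literal substrings. The key point is unchanged: the word is assigned to each matched subset, not to the whole of masterdf, so earlier matches are not overwritten.

```python
import pandas as pd


def tag_matches(jsons_data, searchList):
    """Return rows of jsons_data that contain each word, tagged with that word."""
    frames = []  # hypothetical name: one tagged subset per matching word
    for searchWord in searchList:
        # regex=False (an assumption) treats the word as a plain substring
        mask = jsons_data['text'].str.contains(searchWord, regex=False)
        if mask.any():
            # assign() returns a new frame, so the scalar is broadcast
            # only to the rows matched by this word
            frames.append(jsons_data[mask].assign(searchWord=searchWord))
    if not frames:
        return pd.DataFrame(columns=['doc_id', 'text', 'searchWord'])
    # a single concat at the end replaces the repeated append calls
    return pd.concat(frames)


jsons_data = pd.DataFrame({
    'doc_id': [0, 1, 2],
    'text': ['I want to search', 'no match here', 'But I also want more'],
})
masterdf = tag_matches(jsons_data, ['But', 'also', 'want', 'missing'])
print(masterdf)
```

Collecting the subsets in a list and concatenating once also avoids copying masterdf on every iteration, and doc_id keeps its integer dtype.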
Answer 1 (score: 1)
I suggest a vectorized (loop-free) approach:
In [84]: df
Out[84]:
   doc_id  text
0       0  I want to search through the
1       1  searchList and check if column text str
2       2  contains one or more of each searchWord
3       3  If I get a match I want to append the data to masterdf which is easily accomplished as seen below
4       4  But I also want to add a new column with searchWord so that I know which text matched with what
5       5  This code below fills the column searchWord with the
6       6  latest search that matched
In [85]: searchList = ['code', 'fills', 'but', 'also', 'want']
In [86]: words_re = '{}'.format('|'.join(searchList).lower())
In [87]: words_re
Out[87]: 'code|fills|but|also|want'
In [88]: masterdf = df[df.text.str.contains('(?:{})'.format(words_re))].copy()
In [89]: masterdf['searchWord'] = masterdf.text.str.findall('({})'.format(words_re)).str.join('|')
In [90]: masterdf
Out[90]:
   doc_id  text                                                                                               searchWord
0       0  I want to search through the                                                                       want
3       3  If I get a match I want to append the data to masterdf which is easily accomplished as seen below  want
4       4  But I also want to add a new column with searchWord so that I know which text matched with what    also|want
5       5  This code below fills the column searchWord with the                                               code|fills
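Note that the lowercase pattern above does not match the capitalized "But" in row 4. The following is a hedged variation on the same vectorized idea, not part of the original answer: it assumes case-insensitive matching is wanted, escapes each word with re.escape so regex metacharacters are treated literally, and uses DataFrame.explode to get one row per matched word instead of a "|"-joined string. The names found and long_df and the shortened sample data are hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'doc_id': [0, 1, 2],
    'text': ['I want to search', 'no match here', 'But I also want more'],
})
searchList = ['code', 'but', 'also', 'want']

# Escape each word so characters such as '.' or '+' are matched literally.
words_re = '|'.join(re.escape(w) for w in searchList)

# findall returns the list of matched substrings for each row;
# IGNORECASE (an assumption) lets 'but' match 'But'.
found = df['text'].str.findall(words_re, flags=re.IGNORECASE)

# Keep rows with at least one match, then expand to one row per matched word.
long_df = (
    df.assign(searchWord=found)[found.str.len() > 0]
      .explode('searchWord')
)
print(long_df)
```

The searchWord column holds the text as it appears in the document (for example "But"), and the pattern matches substrings, so "want" would also match inside "wanted" unless word boundaries are added.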