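I have a large number of sentences from which I want to extract sub-sentences that start with certain word combinations. For example, I want to extract the segments that begin with "what do" or "what is" and so on (essentially dropping the words of the sentence that come before the word pair). Both the sentences and the word pairs are stored in a DataFrame: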
    Sentence                                     First2
0   If this is a string what does it say?        can I
1   And this is a string, should it say more?    should it
2   This is yet another string.                  what does
3   etc. etc.                                    etc. etc
The result I would like to get from the example above is:
0 what does it say?
1 should it say more?
2
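The most obvious solution (at least to me), shown below, does not work. It applies only the first word pair b to all the sentences r and never tries the other pairs in b.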
a = df['Sentence']
b = df['First2']
# The function seems to loop over all r's but only over the first b:
def func(z):
    for x in b:
        if x in z:
            s = z[z.index(x):]
            return s
        else:
            return ''
df['Segments'] = a.apply(func)
Looping over two DataFrame columns at the same time in this way does not seem to work. Is there a more efficient way to do this?
Answer 0 (score: 1)
You can do this with zip(iterator, iterator_foo).
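A minimal sketch of what zip does here, assuming the two columns are available as plain lists (the names sentences and first_twos are illustrative). zip pairs each sentence only with the word pair in the same row, so it does not search every pair in every sentence, which is what the expected output in the question requires.

```python
# Illustrative data copied from the question (without the "etc." rows).
sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
first_twos = ["can I", "should it", "what does"]

segments = []
# zip yields (sentence, pair) tuples row by row: row 0 with row 0, row 1 with row 1, ...
for sentence, first_two in zip(sentences, first_twos):
    if first_two in sentence:
        segments.append(sentence[sentence.index(first_two):])
    else:
        segments.append('')

print(segments)
# ['', 'should it say more?', '']
```

Row 0 comes back empty because "what does" sits in row 2 of First2, so row-wise pairing alone does not reproduce the desired result.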
Answer 1 (score: 1)
I believe there is a bug in your code:
        else:
            return ''
This means that if the first comparison does not match, 'func' returns immediately. That is probably why your code returns no matches.
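To make the premature return concrete, here is a sketch of the question's function rewritten to take the pairs as a parameter (func_buggy is a hypothetical name; the logic is unchanged). Because the else branch sits inside the loop, only the first pair, "can I", is ever tested.

```python
def func_buggy(sentence, first_twos):
    for first_two in first_twos:
        if first_two in sentence:
            return sentence[sentence.index(first_two):]
        else:
            # Runs as soon as the FIRST pair fails to match,
            # so the remaining pairs are never checked.
            return ''

sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
first_twos = ["can I", "should it", "what does"]

segments = [func_buggy(s, first_twos) for s in sentences]
print(segments)
# ['', '', '']
```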
Here is a working example:
# Return the part of the sentence starting at the first word pair found in it.
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    # Only reached after every pair has been checked.
    return ''
df['Segments'] = a.apply(func)
Output:
df:
{
'First2': ['can I', 'should it', 'what does'],
'Segments': ['what does it say? ', 'should it say more?', ''],
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string. ' ]
}
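The corrected logic can be checked without pandas. This sketch assumes the columns are plain lists; extract_segment is a hypothetical name for the same function. The only change from the buggy version is that `return ''` moves out of the loop, so every pair is tried before giving up.

```python
def extract_segment(sentence, first_twos):
    """Return the part of `sentence` from the first matching pair onward, else ''."""
    for first_two in first_twos:
        if first_two in sentence:
            return sentence[sentence.index(first_two):]
    # Reached only when no pair occurs anywhere in the sentence.
    return ''

sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
first_twos = ["can I", "should it", "what does"]

segments = [extract_segment(s, first_twos) for s in sentences]
print(segments)
# ['what does it say?', 'should it say more?', '']
```

Note that `in` is a plain substring test, so a pair also matches inside longer words (for example "can I" inside "pecan Ice").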
Answer 2 (score: 0)
The following code answers my question:
def func(r):
    for i in b:
        if i in r:
            q = r[r.index(i):]
            return q
    return ''
df['Segments'] = a.apply(func)
Daming Lu pointed me to the solution (only the last line differs from his). The problem was in the last two lines of my original code:
        else:
            return ''
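This caused the function to return prematurely. Daming Lu's answer is better than the answers to the possible duplicate question "python for-loop only executes once?", which introduced other problems, as I explained in my reply to wii. (So I'm not sure mine really is a duplicate.)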
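The question also asks about efficiency, which none of the answers addresses. One possible alternative, not taken from the answers above, is to compile all word pairs into a single regular expression using alternation and search each sentence once. This is a sketch assuming the pairs are literal strings; build_pattern and extract_segment_regex are hypothetical names. One behavioral difference: the regex returns the segment starting at the leftmost match in the sentence, while the loop returns the segment for the first pair in list order, so results can differ when a sentence contains several pairs. Whether it is actually faster depends on the data and should be measured.

```python
import re

def build_pattern(first_twos):
    # re.escape makes characters such as '.' or '?' in a pair match literally.
    return re.compile("|".join(re.escape(p) for p in first_twos))

def extract_segment_regex(sentence, pattern):
    match = pattern.search(sentence)
    # Slice from the start of the leftmost match, or return '' if nothing matched.
    return sentence[match.start():] if match else ''

sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
first_twos = ["can I", "should it", "what does"]
pattern = build_pattern(first_twos)

segments = [extract_segment_regex(s, pattern) for s in sentences]
print(segments)
# ['what does it say?', 'should it say more?', '']
```

With a DataFrame, the same helper could be applied as df['Segments'] = df['Sentence'].apply(lambda s: extract_segment_regex(s, pattern)).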