Question

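I have a large number of sentences from which I want to extract the sub-sentences that start with certain word combinations. For example, I want to extract the segments that start with "what do" or "what is", and so on (in effect dropping the words that appear in the sentence before the word pair). Both the sentences and the word pairs are stored in a DataFrame: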

'Sentence'                                    'First2'                                    
0  If this is a string what does it say?      0 can I    
1  And this is a string, should it say more?  1 should it    
2  This is yet another string.                2 what does
3  etc. etc.                                  3 etc. etc

The result I want from the example above is:

0 what does it say?
1 should it say more?
2

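The most obvious solution (at least to me), shown below, does not work. It only checks the first word pair b against every sentence r, and never the other b's.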

a = df['Sentence']
b = df['First2'] 

# The function seems to loop over all r's but only over the first b:
def func(r):
    for x in b:
        if x in r:
            s = r[r.index(x):]
            return s
        else:
            return ''

df['Segments'] = a.apply(func)

It seems that looping over two DataFrame columns at the same time in this way does not work. Is there a more efficient way to do this?

Answer 1

You can easily loop over two things at once with zip(iterator, iterator_foo).
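A minimal sketch of what zip does with the sample data from the question, using plain lists in place of the DataFrame columns. The function name segment_pairwise is made up for illustration. Note that zip compares sentence i only with word pair i, which may not be what the question asks for, since there any pair may occur in any sentence.

```python
# Plain lists standing in for df['Sentence'] and df['First2'] (sample data from the question).
sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
pairs = ["can I", "should it", "what does"]


def segment_pairwise(sentences, pairs):
    """Pair sentence i with word pair i and cut the sentence at that pair."""
    result = []
    for sentence, pair in zip(sentences, pairs):
        idx = sentence.find(pair)  # -1 when the pair does not occur
        result.append(sentence[idx:] if idx != -1 else "")
    return result


print(segment_pairwise(sentences, pairs))
# ['', 'should it say more?', '']
```

Row 0 comes back empty here because "can I" does not occur in the first sentence, even though "what does" does.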

Answer 2

I believe there is a bug in your code.

else:
    return ''

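This means that if the first comparison does not match, 'func' returns immediately. That is probably why the code did not return any matches.

Example working code follows: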

# Try every word pair; return '' only after none of them matched:
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''

df['Segments'] = a.apply(func)

Output:

df:   
{   
'First2': ['can I', 'should it', 'what does'],   
'Segments': ['what does it say? ', 'should it say more?', ''],   
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string.  '  ]  
}
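The difference between the question's function and the corrected one can be checked without pandas. The following is a sketch on plain lists; the names func_buggy and func_fixed are made up for illustration, and the word pairs are passed as an argument instead of being read from the global b.

```python
sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
pairs = ["can I", "should it", "what does"]


def func_buggy(sentence, pairs):
    for pair in pairs:
        if pair in sentence:
            return sentence[sentence.index(pair):]
        else:
            # Runs as soon as the FIRST pair fails to match,
            # so the remaining pairs are never tried.
            return ""


def func_fixed(sentence, pairs):
    for pair in pairs:
        if pair in sentence:
            return sentence[sentence.index(pair):]
    # Reached only after every pair has been tried.
    return ""


print([func_buggy(s, pairs) for s in sentences])
# ['', '', '']
print([func_fixed(s, pairs) for s in sentences])
# ['what does it say?', 'should it say more?', '']
```

With a DataFrame, the same function can be applied per row, for example with df['Sentence'].apply(func_fixed, args=(list(df['First2']),)).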

Answer 3

The following code answered my question:

def func(r):
    for i in b:
        if i in r:
            q = r[r.index(i):]
            return q
    return ''

df['Segments'] = a.apply(func)

Daming Lu pointed out the solution (only the last line differs from his). The problem was in the last two lines of the original code:

else:
    return ''

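These lines caused the function to return prematurely. Daming Lu's answer is better than the answers to the possible duplicate question "python for-loop only executes once?", which created other problems, as I explained in my reply to wii. (So I am not sure my question really is a duplicate.)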
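On the question of efficiency, one possible alternative (not taken from any of the answers) is to combine all word pairs into a single regular expression, so each sentence is scanned once instead of once per pair. This is a sketch that assumes the list of pairs is non-empty; build_pattern and segment are hypothetical helper names. Its behavior differs slightly from the loop: it cuts at the leftmost match of any pair, whereas the loop cuts at the first pair in list order that occurs anywhere in the sentence.

```python
import re

sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
pairs = ["can I", "should it", "what does"]


def build_pattern(pairs):
    """Compile one alternation pattern from the word pairs (assumed non-empty)."""
    # re.escape keeps characters such as '.' in a pair from acting as regex syntax.
    return re.compile("|".join(re.escape(p) for p in pairs))


def segment(sentence, pattern):
    """Return the sentence from the leftmost matching pair onward, or ''."""
    match = pattern.search(sentence)
    return sentence[match.start():] if match else ""


pattern = build_pattern(pairs)
print([segment(s, pattern) for s in sentences])
# ['what does it say?', 'should it say more?', '']
```

With a DataFrame this could be used as df['Segments'] = df['Sentence'].apply(segment, args=(pattern,)).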

How to select a substring based on the presence of word pairs? (Python)
