How to select a substring based on the presence of a word pair? (Python)

Time: 2018-04-30 18:28:09

Tags: python dataframe

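I have a large number of sentences from which I want to extract the sub-sentences that start with certain word combinations. For example, I want to extract the segments beginning with "what do" or "what is", etc. (essentially dropping the words that appear in the sentence before the word pair). Both the sentences and the word pairs are stored in a DataFrame: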

'Sentence'                                    'First2'                                    
0  If this is a string what does it say?      0 can I    
1  And this is a string, should it say more?  1 should it    
2  This is yet another string.                2 what does
3  etc. etc.                                  3 etc. etc

The result I would like to get from the example above is:

0 what does it say?
1 should it say more?
2

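The most obvious solution (at least to me), shown below, does not work. It checks only the first word pair b against every sentence r, and never the other b's.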

a = df['Sentence']
b = df['First2'] 

#The function seems to loop over all r's but only over the first b:
def func(r):
    for x in b:
        if x in r:
            s = r[r.index(x):]
            return s
        else:
            return ''

df['Segments'] = a.apply(func)

It seems that looping over two DataFrame columns at the same time in this way does not work. Is there a more efficient way to do this?

3 answers:

Answer 0: (score: 1)

You can easily iterate over two things at once with zip(iterator, iterator_foo).
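As a minimal sketch of what that looks like (plain lists stand in for the two DataFrame columns here, and the variable names are illustrative assumptions, not part of the question's code), zip pairs the items by position:

```python
# Plain lists stand in for df['Sentence'] and df['First2'] (assumed data).
sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
pairs = ["can I", "should it", "what does"]

# zip yields (sentences[0], pairs[0]), (sentences[1], pairs[1]), ...
segments = []
for sentence, pair in zip(sentences, pairs):
    # Only the pair sitting on the same row is checked against each sentence.
    pos = sentence.find(pair)
    segments.append(sentence[pos:] if pos != -1 else "")

print(segments)  # ['', 'should it say more?', '']
```

Note that this compares each sentence only with the word pair in the same row, so it misses "what does" in the first sentence. Checking every pair against every sentence still needs a nested loop.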

Answer 1: (score: 1)

I believe there is a bug in your code.

else:
    return ''

This means that if the first comparison does not match, 'func' returns immediately. That is probably why the code did not return any matches.
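The early return can be made visible with a small trace. This is only a sketch: the list `pairs` stands in for the 'First2' column, and the `tried` list is a hypothetical helper added purely to record which pairs were actually compared.

```python
pairs = ["can I", "should it", "what does"]  # stand-in for df['First2']
tried = []  # hypothetical helper: records every pair that gets compared

def func_early_return(sentence):
    for pair in pairs:
        tried.append(pair)
        if pair in sentence:
            return sentence[sentence.index(pair):]
        else:
            # Runs as soon as the first pair fails to match,
            # so the remaining pairs are never examined.
            return ""

result = func_early_return("If this is a string what does it say?")
print(repr(result))  # ''
print(tried)         # ['can I']
```

Only "can I" is ever tested, even though "what does" occurs in the sentence.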

Working example code follows:

# Try every word pair; return '' only after none of them has matched.
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''

df['Segments'] = a.apply(func)

Output:

df:   
{   
'First2': ['can I', 'should it', 'what does'],   
'Segments': ['what does it say? ', 'should it say more?', ''],   
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string.  '  ]  
} 

Answer 2: (score: 0)

The following code answers my question:

def func(r):
    for i in b:
        if i in r:
            q = r[r.index(i):]
            return q
    return ''

df['Segments'] = a.apply(func)

Daming Lu pointed out the solution (only the last line differs from his). The problem was in the last two lines of the original code:

else:
    return ''  

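This caused the function to return prematurely. Daming Lu's answer is better than the answers to the possible duplicate question "python for-loop only executes once?", which produced other problems, as explained in my response to wii. (So I'm not sure mine really is a duplicate.)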
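One design point about the loop-based fix: it returns the segment for the first pair in list order that matches, which is not necessarily the pair that appears earliest in the sentence. If the earliest occurrence is what is wanted, a regular-expression alternation is one possible alternative. The sketch below is an assumption-laden illustration, not something proposed in the answers. The function name `segment_from_first_pair` is hypothetical and `pairs` again stands in for the 'First2' column.

```python
import re

pairs = ["can I", "should it", "what does"]  # stand-in for df['First2']

# re.escape protects any characters in a pair that are special in a regex.
pattern = re.compile("|".join(re.escape(p) for p in pairs))

def segment_from_first_pair(sentence):
    """Return the sentence from the leftmost matching pair onward, else ''."""
    match = pattern.search(sentence)
    return sentence[match.start():] if match else ""

print(segment_from_first_pair("If this is a string what does it say?"))
# what does it say?
print(repr(segment_from_first_pair("This is yet another string.")))
# ''
```

Since the function takes a single string, it should also work with a.apply(segment_from_first_pair), in the same way as func above.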