Converting the sentence to a list of words and then finding the index of the root string should do the job:
```python
sentence = "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis"
root = "mtnr1a"
try:
    words = sentence.split()
    n = words.index(root)
    cutoff = ' '.join(words[n-4:n+5])
except ValueError:
    cutoff = None
print(cutoff)
```
Result:
```
promoter polymorphism of the mtnr1a gene and adolescent idiopathic
```
How can I use this on a pandas DataFrame?
I tried:
```python
sentence = data['sentence']
root = data['rootword']

def cutOff(sentence,root):
    try:
        words = sentence.str.split()
        n = words.index(root)
        cutoff = ' '.join(words[n-4:n+5])
    except ValueError:
        cutoff = None
    return cutoff

data.apply(cutOff(sentence,root),axis=1)
```
But it doesn't work...
Edit:
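How can I cut the sentence when the root word is the first word of the sentence (keep the root word plus the 4 words after it), and when it is the last word (keep the words before it plus the root word)? For example:
```python
sentence = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
```
Expected output if the root is in the first position:
"mtnr1a lack of association between"
Expected output if the root is in the last position, i.e. for the sentence "lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis mtnr1a":
"adolescent idiopathic scoliosis mtnr1a"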
Answer:
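Two small changes to your code will fix the problem.
First, apply() expects a function. Called on a DataFrame with axis=1, it calls that function once for each row, passing in that row's values. Writing data.apply(cutOff(sentence,root),axis=1) instead calls cutOff() immediately on the whole columns and hands its return value to apply(). So you don't need to pass the columns into the function yourself, and calling sentence.str.split() makes no sense: inside cutOff(), sentence is just a regular string (not a column).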
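To see what apply() actually hands to the function with axis=1, here is a minimal sketch on a small hypothetical DataFrame that only reuses the question's column names (the data and the describe_row() helper are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data with the same column names as in the question.
data = pd.DataFrame({
    "sentence": ["a b c", "d e f"],
    "rootword": ["b", "f"],
})

def describe_row(row):
    # With axis=1, apply() passes each row as a Series indexed by column name,
    # so row["sentence"] and row["rootword"] are plain Python strings here.
    return type(row["sentence"]).__name__ + ":" + row["rootword"]

# Pass the function object itself (no parentheses); apply() calls it per row.
result = data.apply(describe_row, axis=1)
print(result.tolist())  # ['str:b', 'str:f']
```

Because each value is a str, ordinary string methods such as split() work inside the function, while the Series-only .str accessor does not.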
Change your function to:
```python
def cutOff(sentence,root):
    try:
        words = sentence.split() # this is the line that was changed
        n = words.index(root)
        cutoff = ' '.join(words[n-4:n+5])
    except ValueError:
        cutoff = None
    return cutoff
```
Next, you only need to specify which columns feed the function's arguments; a lambda does that:
```python
data.apply(lambda x: cutOff(x["sentence"], x["rootword"]), axis=1)
# 0    promoter polymorphism of the mtnr1a gene and a...
# dtype: object
```
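For the edge cases in the edit, the culprit is the slice words[n-4:n+5]. When the root word is among the first four words, n-4 is negative, and Python counts a negative start from the end of the list, so for a long sentence the slice usually comes back empty. The last-position case already works, because slicing past the end of a list is safe. A minimal sketch that clamps the start at 0 (the name cut_off and its before/after parameters are hypothetical, chosen for illustration):

```python
def cut_off(sentence, root, before=4, after=4):
    """Hypothetical helper: return up to `before` words preceding `root`,
    the root itself, and up to `after` words following it.
    Returns None when `root` is not a whole word of `sentence`."""
    words = sentence.split()
    try:
        n = words.index(root)
    except ValueError:
        return None
    # Clamp the start at 0: a negative start (n - before < 0) would make
    # Python count from the end of the list and usually give an empty slice.
    start = max(n - before, 0)
    # Slicing past the end of a list is safe, so the stop needs no clamp.
    return " ".join(words[start:n + after + 1])


first = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
last = "lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis mtnr1a"

print(cut_off(first, "mtnr1a"))  # mtnr1a lack of association between
print(cut_off(last, "mtnr1a"))   # and adolescent idiopathic scoliosis mtnr1a
```

With the default window this keeps four preceding words in the last-position case, the same as the original n-4. Pass before=3 to get exactly "adolescent idiopathic scoliosis mtnr1a" as in the edit's example. In the DataFrame it is used the same way as cutOff(): data.apply(lambda x: cut_off(x["sentence"], x["rootword"]), axis=1).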