Question

Converting the sentence to a list of words and then finding the index of the root string should do it:

sentence = "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis"
root = "mtnr1a"

try:
    words = sentence.split()
    n = words.index(root)
    cutoff = ' '.join(words[n-4:n+5])
except ValueError:
    cutoff = None

print(cutoff)

Result:

promoter polymorphism of the mtnr1a gene and adolescent idiopathic

How can I use this on a pandas DataFrame?

I tried:

sentence = data['sentence'] 
root = data['rootword'] 
def cutOff(sentence,root): 
   try: 
      words = sentence.str.split() 
      n = words.index(root) 
      cutoff = ' '.join(words[n-4:n+5]) 
except ValueError: 
      cutoff = None 
      return cutoff 
data.apply(cutOff(sentence,root),axis=1)

But it doesn't work...

Edit:

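How can I cut the sentence when the root word is the first word of the sentence (keeping the 4 words after it), and when the root word is the last word of the sentence (keeping the words before it)? For example: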

sentence = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
Output if the root is in the first position:
"mtnr1a lack of association between"
Output if the root is in the last position:
"lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
"adolescent idiopathic scoliosis mtnr1a"

Answer 1

Two small tweaks to your code will fix the problem:

First, calling apply() with axis=1 on a DataFrame applies the function to the values in each row of the DataFrame it is called on.

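You don't need to pass the columns into the function as inputs, and calling sentence.str.split() makes no sense: inside the cutOff() function, sentence is just a regular string (not a column).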
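To illustrate the distinction, here is a small sketch with a hypothetical one-row frame (the column names come from the question; first_word is a made-up helper). .str.split() is a method of a whole column, while inside a row-wise apply() each field is a plain Python string:

```python
import pandas as pd

# Hypothetical one-row frame using the question's column names
data = pd.DataFrame({
    "sentence": ["lack of association between the promoter polymorphism of the mtnr1a gene"],
    "rootword": ["mtnr1a"],
})

# Column level: .str.split() is a vectorized Series method; it returns a
# Series whose elements hold the words of each sentence.
split_column = data["sentence"].str.split()

# Row level: with axis=1, apply() passes each row to the function as a Series,
# and row["sentence"] is an ordinary Python str, so plain .split() is the right call.
def first_word(row):
    return row["sentence"].split()[0]

first_words = data.apply(first_word, axis=1)

print(isinstance(data.loc[0, "sentence"], str))  # True
print(list(split_column[0])[:3])                 # ['lack', 'of', 'association']
print(first_words[0])                            # lack
```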

Change your function to:

def cutOff(sentence,root): 
    try: 
        words = sentence.split()  # this is the line that was changed
        n = words.index(root) 
        cutoff = ' '.join(words[n-4:n+5]) 
    except ValueError: 
        cutoff = None 
    return cutoff
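The corrected function still has the problem raised in the question's edit. When the root is among the first four words, n-4 is negative, and Python treats a negative slice start as an offset from the end of the list. For the edit's first-position sentence (15 words, n = 0), words[-4:5] becomes words[11:5], which is an empty slice, so the function returns an empty string. The following is a minimal sketch that clamps the start at 0 to handle both edge cases. The name cut_off and the width parameter are hypothetical and do not come from the original answer. The last-position sentence below assumes that mtnr1a is appended to the edit's sentence.

```python
def cut_off(sentence, root, width=4):
    """Return up to `width` words on each side of `root` (root included), or None if absent."""
    words = sentence.split()
    try:
        n = words.index(root)
    except ValueError:
        return None
    # Clamp the start at 0: a negative start would be read as an offset from the end.
    start = max(n - width, 0)
    # Slicing past the end of a list is safe, so the stop needs no clamp.
    return " ".join(words[start:n + width + 1])


middle = "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis"
first = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
# Assumption: the last-position sentence is the edit's sentence with mtnr1a appended
last = "lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis mtnr1a"

print(cut_off(middle, "mtnr1a"))  # promoter polymorphism of the mtnr1a gene and adolescent idiopathic
print(cut_off(first, "mtnr1a"))   # mtnr1a lack of association between
print(cut_off(last, "mtnr1a"))    # and adolescent idiopathic scoliosis mtnr1a
```

The edit's expected last-position output keeps only three preceding words ("adolescent idiopathic scoliosis mtnr1a"). This sketch keeps four, so its output also includes "and". Adjust width if a different window is wanted, but note that it applies to both sides of the root.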

Next, you only need to specify which columns serve as the function's inputs. You can do that with a lambda:

df.apply(lambda x: cutOff(x["sentence"], x["rootword"]), axis=1)
#0    promoter polymorphism of the mtnr1a gene and a...
#dtype: object
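In the question's attempt, data.apply(cutOff(sentence, root), axis=1) calls cutOff immediately and passes its return value to apply() instead of a function. The lambda fixes this by deferring the call until apply() supplies each row. Below is a self-contained sketch with hypothetical two-row data that stores the result in a new column. The column name cutoff is an assumption.

```python
import pandas as pd

def cutOff(sentence, root):
    # Same logic as the answer's corrected function
    try:
        words = sentence.split()
        n = words.index(root)
        cutoff = ' '.join(words[n-4:n+5])
    except ValueError:
        cutoff = None
    return cutoff

# Hypothetical sample data using the question's column names
data = pd.DataFrame({
    "sentence": [
        "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis",
        "a sentence without the root word",
    ],
    "rootword": ["mtnr1a", "mtnr1a"],
})

# apply() needs a callable; the lambda receives one row at a time (axis=1)
data["cutoff"] = data.apply(lambda x: cutOff(x["sentence"], x["rootword"]), axis=1)

# Same values computed with a plain loop over the two columns, without apply()
cutoffs = [cutOff(s, r) for s, r in zip(data["sentence"], data["rootword"])]

print(cutoffs)  # ['promoter polymorphism of the mtnr1a gene and adolescent idiopathic', None]
```

The list comprehension gives the same values and is usually faster on large frames, because apply(axis=1) builds a Series for every row. In both versions, the second row has a missing value because the root word does not occur in it.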
