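Python: loop over a list of keywords and over sentences to find keywords that occur near the word "access"

Asked: 2019-06-18 14:49:11

Tags: python text indexing enumeration

I have a list of keywords, and I need to know whether each one occurs within 4 words of the word "access" in each sentence of a list. In the end, I want to total, for each sentence in the list, the number of keywords that match near the word "access".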

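Current output:

['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 0
['rural', 'patients', 'often', 'cite', 'distance', 'as', 'one', 'of', 'the', 'barriers', 'to', 'access', 'healthcare', 'services.']
['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 0
['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 1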

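Desired output:

['minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 2
['I am an avid user of Microsoft Access databases'] 0
['rural', 'patients', 'often', 'cite', 'distance', 'as', 'one', 'of', 'the', 'barriers', 'to', 'access', 'healthcare', 'services.'] 3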

  accessdesc = ["care", "services", "healthcare", "barriers"]

  sentences = ["Minority patients often have barriers with their access to healthcare.",
               "I am an avid user of Microsoft Access databases",
               "Rural patients often cite distance as one of the barriers to access healthcare services."]

  for sentence in sentences:
      nummatches = 0
      for desc in accessdesc:
          sentence = sentence.replace(".","") if "." in sentence else ''
          sentence = sentence.replace(",","") if "," in sentence else ''

          if 'access' in sentence.lower() and desc in sentence.lower():
              sentence = sentence.lower().split()

              access_position = sentence.index('access') if "access" in sentence else 0

              desc_position = sentence.index(desc) if desc in sentence else 0

              if abs(access_position - desc_position) < 5:
                  nummatches = nummatches + 1
              else:
                  nummatches = nummatches + 0
          print(sentence, nummatches)

1 Answer:

Answer 0 (score: 1)

I think you need to switch the loop order from

for desc in accessdesc:    
    for sentence in sentences: 

to:

for sentence in sentences:
    nummatches = 0 # Resets the count to 0 for each sentence
    for desc in accessdesc: 

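This means you check every keyword against a sentence before moving on to the next sentence. Then just move the print(sentence, nummatches) statement out of the second loop so that the result is printed once after each sentence.

Another thing to look at is the line if 'access' and desc in sentence:. The and operator combines the expression on its left with the expression on its right and checks whether both evaluate as true. That means it checks whether 'access' is truthy (a non-empty string always is) and whether desc in sentence is true. What you want to check is whether both access and desc are in the sentence. I would also suggest ignoring case for this check, since 'access' is not equal to 'Access'. So you could rewrite it like this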

if 'access' in sentence.lower() and desc in sentence.lower():
    sentence = sentence.lower().split()

Now, since the if condition already checks whether desc is in the sentence, you do not have to check it again, as mentioned in the comments.
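The difference between the two conditions can be checked directly. This is a minimal sketch; the sample sentence and keyword are made up for illustration and do not come from the question:

```python
# Hypothetical sample values, chosen only to illustrate the point.
sentence = "Patients often report barriers to healthcare."
desc = "barriers"

# Parsed as ('access') and (desc in sentence): a non-empty string is always
# truthy, so only the right-hand membership test decides the result.
buggy = bool('access' and desc in sentence)

# Both membership tests are written out, and case is ignored.
fixed = 'access' in sentence.lower() and desc in sentence.lower()

print(buggy)  # True, even though "access" is not in the sentence
print(fixed)  # False
```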

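Note that your code will only work as expected if "access" and the keyword each appear at most once in the sentence, because sentence.index() only finds the first occurrence of a value. It would need extra logic to handle values that occur more than once.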
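As a small illustration of that limitation (the word list here is an invented example), list.index() reports only the first position, while a comprehension over enumerate() collects every position:

```python
# Invented example sentence in which "access" appears twice.
words = "access to care improves when access barriers fall".split()

# list.index() stops at the first match.
first = words.index("access")

# enumerate() pairs each word with its position, so all matches can be kept.
all_positions = [i for i, w in enumerate(words) if w == "access"]

print(first)          # 0
print(all_positions)  # [0, 5]
```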

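Edit

Your lines that replace punctuation, e.g. sentence = sentence.replace(".","") if "." in sentence else '', will set the sentence to '' whenever that punctuation mark is not present in the sentence. You can do all the replacements in one line and then check against the list of words rather than against the sentence string. You will also want to check whether the word exists in the split list, not in the string, so that only whole words match.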

'it' in 'bit'
>>> True
'it' in ['bit']
>>> False
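The first point, the conditional replace wiping out the sentence, can be checked in the same way. This sketch reuses the second sentence from the question, which contains neither "." nor ",":

```python
sentence = "I am an avid user of Microsoft Access databases"

# Conditional form from the question: with no "," present, the else branch
# replaces the whole sentence with an empty string.
wiped = sentence.replace(",", "") if "," in sentence else ''

# str.replace() already does nothing when the substring is absent,
# so the calls can simply be chained without any condition.
cleaned = sentence.replace(".", "").replace(",", "")

print(repr(wiped))          # ''
print(cleaned == sentence)  # True
```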

So you could rewrite your code like this:

for sentence in sentences:
    nummatches = 0
    words = sentence.replace(".","").replace(",","").lower().split()
    # moved this outside of the second loop as the sentence doesn't change through the iterations
    # Not changing the sentence variable so can print it in its original form
    if 'access' not in words:
        continue # No need to proceed if access not in the sentence
    for desc in accessdesc:
        if desc not in words:
            continue # Can use continue to go to the next iteration of the loop
        access_position = words.index('access')
        desc_position = words.index(desc)

        if abs(access_position - desc_position) < 5:
            nummatches += 1
            # else statement not required
    print(sentence, nummatches) # moved outside of the second loop so it prints after checking through all the words

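As mentioned, this only works if "access" and each keyword appear at most once in the sentence. If they appear more than once, index() will only find the first occurrence. Take a look at this answer to see whether you can use that solution in your code. Also see this answer for how to remove punctuation from a string.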
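One possible way to combine both suggestions is sketched below. It is not the code from the linked answers: the function name count_near_access and its window parameter are hypothetical, punctuation is stripped with str.translate and string.punctuation from the standard library, and it assumes each keyword should count at most once per sentence, however many times it occurs.

```python
import string

def count_near_access(sentence, keywords, window=4):
    """Count keywords that occur within `window` words of any "access".

    Hypothetical helper: each keyword counts at most once per sentence.
    """
    # Remove all ASCII punctuation, lowercase, and split into whole words.
    table = str.maketrans("", "", string.punctuation)
    words = sentence.translate(table).lower().split()

    # Collect every position of "access", not just the first one.
    access_positions = [i for i, w in enumerate(words) if w == "access"]

    count = 0
    for kw in keywords:
        kw_positions = [i for i, w in enumerate(words) if w == kw]
        # A keyword matches if any of its occurrences is close to any "access".
        if any(abs(a - k) <= window
               for a in access_positions for k in kw_positions):
            count += 1
    return count

accessdesc = ["care", "services", "healthcare", "barriers"]
sentences = [
    "Minority patients often have barriers with their access to healthcare.",
    "I am an avid user of Microsoft Access databases",
    "Rural patients often cite distance as one of the barriers to access healthcare services.",
]

for sentence in sentences:
    print(sentence, count_near_access(sentence, accessdesc))  # counts: 2, 0, 3

# A repeated "access": only the second occurrence is near "care".
print(count_near_access("Access was denied. Later, access to care improved.", ["care"]))  # 1
```

Note that deleting punctuation outright also joins hyphenated words, so a different tokenizer may be preferable if the sentences contain hyphenated keywords.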