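I am working on a program that checks study titles for certain patterns to determine whether a title is relevant. Typically, a title is relevant if the words "access" and "care" appear within four words of each other. There could be phrases like "access to care", "patient access", or "access to diabetes care".
So far I have enumerated and split each string, and picked out the words that contain "access" or "care" along with their index, but I'm struggling to create a binary yes/no variable that says whether the two words are no more than 4 words apart. For example: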
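string = "Ensuring access to care is important."
relevant = 'yes'
string = "Ensuring access to baseball tickets is important, but honestly I don't really care."
relevant = 'no'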
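For reference, here is a minimal sketch of the yes/no variable described above, applied to the two example strings. It assumes that "within four words" means the word positions differ by at most 4, and it tokenizes with a regular expression so that trailing punctuation ("care.") does not prevent a match. The function name `relevant` and the `max_distance` parameter are illustrative, not from the question.

```python
import re


def relevant(title, max_distance=4):
    """Return 'yes' if 'access' and 'care' occur within max_distance word positions."""
    # Lowercase and keep runs of letters/apostrophes, so "care." becomes "care".
    words = re.findall(r"[a-z']+", title.lower())
    access_positions = [i for i, w in enumerate(words) if w == "access"]
    care_positions = [i for i, w in enumerate(words) if w == "care"]
    # Compare every occurrence of one word with every occurrence of the other.
    close = any(abs(a - c) <= max_distance
                for a in access_positions
                for c in care_positions)
    return "yes" if close else "no"


titles = [
    "Ensuring access to care is important.",
    "Ensuring access to baseball tickets is important, but honestly I don't really care.",
]
for t in titles:
    print(relevant(t), "-", t)
```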
Any ideas on how to approach this would be greatly appreciated. Here's what I have so far:
sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower()
sentence = sentence.split()
for i, j in enumerate(sentence):
    if 'access' in j:
        x = 'yes'
    else:
        x = 'no'
    if 'care' in j:
        y = 'yes'
    else:
        y = 'no'
    if x == 'yes' or y == 'yes':
        print(i, j, x, y)
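One way to build on that loop without changing its single-pass shape is to remember the most recent position of each word and measure the gap whenever the other word shows up. The helper name `min_distance` is hypothetical; the threshold of 4 is taken from the question. Because each occurrence is paired with the nearest earlier occurrence of the other word, the smallest gap found this way is the smallest gap over all pairs.

```python
def min_distance(words, first="access", second="care"):
    """Smallest index gap between any `first` and any `second`, or None if either is missing."""
    last_first = last_second = None  # most recent position seen for each word
    best = None
    for i, word in enumerate(words):
        if word == first:
            last_first = i
            if last_second is not None:
                gap = i - last_second
                best = gap if best is None else min(best, gap)
        elif word == second:
            last_second = i
            if last_first is not None:
                gap = i - last_first
                best = gap if best is None else min(best, gap)
    return best


sentence = 'A priority area for this company is access to medical care and how we address it.'
words = sentence.lower().split()
distance = min_distance(words)
relevant = 'yes' if distance is not None and distance <= 4 else 'no'
print(distance, relevant)  # 3 yes
```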
Answer 0 (score: 2)
You can avoid all those loops entirely:
sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower().split()
# if both words are in the list
if 'access' in sentence and 'care' in sentence:
    # take their indexes
    access_position = sentence.index('access')
    care_position = sentence.index('care')
    # check the distance between the indexes
    if abs(access_position - care_position) < 4:
        print("found access and care in less than 4 words")
Result:
found access and care in less than 4 words
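One caveat with `'care' in sentence` after a plain `split()`: punctuation stays attached to the token, so a title ending in "care." never matches. A small sketch of the problem and one possible fix, assuming it is acceptable to strip leading and trailing punctuation (`string.punctuation`) from every token; the example sentence is made up for illustration.

```python
import string

sentence = 'We need better access to care.'
words = sentence.lower().split()
print('care' in words)  # False: the last token is 'care.'

# Strip leading/trailing punctuation from each token before comparing.
cleaned = [w.strip(string.punctuation) for w in words]
print('care' in cleaned)  # True

if 'access' in cleaned and 'care' in cleaned:
    gap = abs(cleaned.index('access') - cleaned.index('care'))
    print(gap)  # 2
```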
Answer 1 (score: 1)
You can record the indexes and then compare them. Modify your code to:
sentence = 'A priority area for this company is access to medical care and how we address it.'
sentence = sentence.lower()
sentence = sentence.split()
access_index = 0
care_index = 0
for i, j in enumerate(sentence):
    if 'access' in j:
        access_index = i
    if 'care' in j:
        care_index = i
# abs() makes the check work whichever word comes first
if abs(access_index - care_index) < 4:
    print("Less than 4 words")
else:
    print("4 or more words")
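Starting both indexes at 0 means a sentence containing neither word still reports "Less than 4 words", because 0 - 0 is 0. A minimal sketch that uses `None` as a "not seen yet" marker instead; the function name `check` and the "Missing word" result are illustrative choices, and the substring tests are kept as in the answer.

```python
def check(sentence):
    words = sentence.lower().split()
    # None means "not seen yet"; 0 would be indistinguishable from a word at position 0.
    access_index = None
    care_index = None
    for i, word in enumerate(words):
        # Substring tests, as in the answer ('care' would also match 'scared').
        if 'access' in word:
            access_index = i
        if 'care' in word:
            care_index = i
    if access_index is None or care_index is None:
        return "Missing word"
    if abs(access_index - care_index) < 4:
        return "Less than 4 words"
    return "4 or more words"


print(check('A priority area for this company is access to medical care and how we address it.'))
print(check('Nothing relevant here'))
```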
Answer 2 (score: 1)
You can do it like this:
# sentence is the lowercased, split word list from the question
access = sentence.index("access")
care = sentence.index("care")
if abs(care - access) <= 4:
    print("Less than or equal to 4")
else:
    print("More than 4")
Of course, adapt the code above to fit your specific case.
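One adaptation most titles will need: `list.index` raises `ValueError` when the word is not in the list, so a title without "access" or "care" would crash the snippet above. A sketch that treats a missing word as "not relevant"; the helper name `within` and its parameters are hypothetical, and like the answer it only compares the first occurrence of each word.

```python
def within(words, first, second, max_distance=4):
    """True if the first occurrences of `first` and `second` are at most max_distance apart."""
    try:
        return abs(words.index(first) - words.index(second)) <= max_distance
    except ValueError:
        # list.index raises ValueError when a word is absent.
        return False


sentence = 'A priority area for this company is access to medical care and how we address it.'.lower().split()
print(within(sentence, 'access', 'care'))     # True
print(within(sentence, 'access', 'tickets'))  # False ('tickets' is absent)
```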
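Answer 3 (score: 1)
If "care" or "access" appears more than once in the sentence, all the answers so far consider only one of the occurrences and can therefore miss a match. Instead, you need to consider every occurrence of each word: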
sentence = "Access to tickets and access to care"
sentence = sentence.lower().split()
access_positions = [i for (i, word) in enumerate(sentence) if word == 'access']
care_positions = [i for (i, word) in enumerate(sentence) if word == 'care']
sentence_is_relevant = any(
    abs(access_i - care_i) <= 4
    for access_i in access_positions
    for care_i in care_positions
)
print("sentence_is_relevant =", sentence_is_relevant)
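Checking every pair is fine for short titles. For longer texts with many occurrences, a two-pointer walk over the two position lists (already sorted, since `enumerate` yields ascending indexes) finds the smallest gap in linear time. This is a standard merge-style technique swapped in here for illustration; `min_gap` is a hypothetical helper name.

```python
def min_gap(xs, ys):
    """Smallest |x - y| for x in xs, y in ys; both lists sorted ascending. None if either is empty."""
    i = j = 0
    best = None
    while i < len(xs) and j < len(ys):
        gap = abs(xs[i] - ys[j])
        if best is None or gap < best:
            best = gap
        # Advance the pointer at the smaller value: pairing it with any later
        # (larger) element of the other list can only increase the gap.
        if xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return best


words = "access to tickets and access to care".lower().split()
access_positions = [i for i, w in enumerate(words) if w == 'access']
care_positions = [i for i, w in enumerate(words) if w == 'care']
gap = min_gap(access_positions, care_positions)
print(gap, gap is not None and gap <= 4)  # 2 True
```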