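I am feeding in a large dataset of strs to be compared against a dict whose values are lists. For example, the str 'phd' is compared with the strs in this dict: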
edu_options = {'Completed College' : [ 'bachelor', 'ba', 'be', 'bs'....],
'Grad School' : ['phd','doctor'...] }
The input strs come from the keys of edu_dict:
edu_dict = {
"A.S":"Attended Vocational/Technical",
"AS":"Attended Vocational/Technical",
"AS,":"Attended Vocational/Technical",
"ASS,":"Attended Vocational/Technical",
"Associate":"Attended Vocational/Technical",
"Associate of Arts (A.A.),":"Attended Vocational/Technical",
"Associate of Arts and Sciences (AAS)":"Attended Vocational/Technical",
"B-Arch":"Completed College",
"B-Tech":"Attended Vocational/Technical",
"B.A. B.S":"Completed College",
"B.A.,":"Completed College",
"B.Arch,":"Completed College",
"B.S":"Completed College",
"B.S.":"Completed College",
"B.S. in Management":"Completed College",
"B.S.,":"Completed College",
"BA":"Completed College",...
*The list is 169 items similar to this*
}
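clean_edu() takes a key from edu_dict and removes punctuation, whitespace, and so on. For example, 'P.H.D.' becomes 'phd'. If 'phd' matches a str in any of these lists, it should return the corresponding key, in this case 'Completed Graduate'. For most of the inputs I have tried, the correct value was returned.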
import re

def clean_edu(edu_entry):
    lower_case_key = edu_entry.lower()  # change the key to lower case
    chars_in = "-.,')("  # punctuation characters to be translated
    chars_out = " " * len(chars_in)  # str.maketrans needs two strings of equal length
    char_change = str.maketrans(chars_in, chars_out)  # map each punctuation char to a space
    clean = lower_case_key.translate(char_change)  # apply the translation
    cleaned_string = re.sub(r'\s\s{0,}', '', clean).strip()  # remove every run of whitespace
    user = ""
    while user == "":
        for edu_level in edu_options:
            for option in edu_options[edu_level]:
                if option in cleaned_string:  # substring test against the cleaned input
                    user = edu_level
                    return user
        user = "No match"
    return user
The problem is that some inputs match correctly while others do not. When I print the unmatched str and the str it was compared against:
print ("Not Detected. Adding to txt" + '\t' + edu_entry + '\t' + cleaned_string + '\t' + option)
Output: " Not Detected. Adding to txt business nursing
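where bs is the input and l is the str it was compared against. There is no value 'l' in the edu_options dict, so I don't understand where it comes from. The problem does not occur with input strings such as 'biology' or 'bs business'.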
A successful run:
Input str: 'P.H.D' Output: 'Completed Graduate School'
Answer 0 (score: 1)
I'm not sure I understand what you want to return when you find a match in a list. Perhaps the key of that list?
In that case, this should work:
>>> edu_options = {'Completed College': ['bachelor', 'ba', 'be', 'bs'], 'Grad School': ['phd', 'doctor']}
>>> cleaned_string = 'phd'
>>> for key, value in edu_options.items():
...     if cleaned_string in value:  # value is the list
...         print(key)  # inside a function, use return
...
Grad School
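Wrapped in a function, the same lookup might look like the sketch below. The name `find_edu_level` is made up for illustration, and the "No match" fallback is borrowed from the question's code; treat both as assumptions rather than part of the original program.

```python
edu_options = {
    'Completed College': ['bachelor', 'ba', 'be', 'bs'],
    'Grad School': ['phd', 'doctor'],
}

def find_edu_level(cleaned_string, options=edu_options):
    """Return the key whose list contains cleaned_string (hypothetical helper)."""
    for key, value in options.items():
        if cleaned_string in value:  # value is a list, so this tests for an exact element
            return key
    return "No match"  # assumed fallback, taken from the question's code

print(find_edu_level('phd'))      # Grad School
print(find_edu_level('bs'))       # Completed College
print(find_edu_level('biology'))  # No match
```

Note that this is an exact-element test, so 'bs business' would not match 'bs' here; whether that is what you want depends on how the inputs are cleaned.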
Edit: I think the error is in your second loop. Look at what happens:
>>> edu_options = {'Completed College': ['bachelor', 'ba', 'be', 'bs'], 'Grad School': ['phd', 'doctor']}
>>> for edu_level in edu_options:
...     for option in edu_level:  # Right here
...         print(option)
...
C
o
m
p
l
e
t
e
d
 
C
o
l
l
e
g
e
G
r
a
d
 
S
c
h
o
o
l
>>>
That is where the 'l' comes from.
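To make the difference concrete, here is a small sketch that collects every `option` the inner loop visits, once iterating over the key itself and once over the list stored under that key. The helper `options_seen` and its `iterate_key` flag are hypothetical names used only for this comparison.

```python
edu_options = {
    'Completed College': ['bachelor', 'ba', 'be', 'bs'],
    'Grad School': ['phd', 'doctor'],
}

def options_seen(options, iterate_key):
    """Collect every value the inner loop variable takes (hypothetical helper)."""
    seen = []
    for edu_level in options:
        # iterate_key=True reproduces the bug: iterating a str yields single characters.
        # iterate_key=False iterates the list stored under that key instead.
        inner = edu_level if iterate_key else options[edu_level]
        for option in inner:
            seen.append(option)
    return seen

buggy = options_seen(edu_options, iterate_key=True)
fixed = options_seen(edu_options, iterate_key=False)

print(buggy[-1])  # l  (the last character of 'Grad School')
print(fixed)      # ['bachelor', 'ba', 'be', 'bs', 'phd', 'doctor']
```

With the buggy loop, `option` ends up holding the last character of the last key once the loops finish, which would explain an 'l' showing up in the debug print even though no list contains it.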