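The U.S. Census Bureau uses a special encoding called "soundex" to locate information about a person. The soundex is an encoding of surnames (last names) based on the way a surname sounds rather than the way it is spelled. Surnames that sound the same but are spelled differently, like SMITH and SMYTH, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even though it may have been recorded under various spellings.
In this lab you will design, code, and document a program that produces the soundex code when given a surname. The user will be prompted for a surname, and the program should output the corresponding code.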
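Basic Soundex Coding Rules
Every soundex coding of a surname consists of a letter and three numbers. The letter used is always the first letter of the surname. The numbers are assigned to the remaining letters of the surname according to the soundex guide shown below. Zeroes are added at the end if necessary to always produce a four-character code. Additional letters are disregarded.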
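The "letter plus three digits" shape can be sketched as a small helper. This is only an illustration: the name pad_code and its two parameters are hypothetical, and it assumes the digits for the remaining letters have already been worked out elsewhere.

```python
def pad_code(first_letter, digits):
    """Build a four-character code from the first letter and a digit string.

    Hypothetical helper: `digits` is assumed to hold the digits already
    assigned to the rest of the surname, in order.
    """
    # Appending "000" guarantees at least four characters; the slice then
    # drops any digits beyond the third.
    return (first_letter.upper() + digits + "000")[:4]


print(pad_code("J", "25"))    # short code, padded with one zero
print(pad_code("G", "3622"))  # long code, extra digit dropped
```

Padding and truncating in one expression avoids a chain of length checks.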
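Soundex Coding Guide
Soundex assigns a number to various consonants. Consonants that sound alike are assigned the same number:
Number  Consonants
1       B, F, P, V
2       C, G, J, K, Q, S, X, Z
3       D, T
4       L
5       M, N
6       R
Soundex disregards the letters A, E, I, O, U, H, W, and Y.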
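One way to hold the guide in a program is a lookup table built from the six groups. The sketch below is an assumption about how one might store it, not part of the assignment; the names SOUNDEX_GROUPS, SOUNDEX_DIGITS and letter_to_digit are made up for illustration.

```python
# The six groups from the coding guide, keyed by digit.
SOUNDEX_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the groups into a letter -> digit lookup table.
SOUNDEX_DIGITS = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}


def letter_to_digit(letter):
    """Return the digit for a consonant, or "" for a disregarded letter."""
    return SOUNDEX_DIGITS.get(letter.upper(), "")


print(letter_to_digit("s"))  # a coded consonant
print(letter_to_digit("A"))  # a disregarded letter gives an empty string
```

Returning an empty string for A, E, I, O, U, H, W and Y lets the caller append the result without a separate membership test.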
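There are 3 additional Soundex coding rules that are followed. A good program design would implement each of these as one or more separate functions.
Rule 1. Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
Gutierrez is coded G362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z).
Rule 2. Names With Letters Side-by-Side That Have the Same Soundex Code Number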
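If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. Examples:
Pfister is coded as P236 (P, F ignored since it is considered the same as P, 2 for the S, 3 for the T, 6 for the R).
Jackson is coded as J250 (J, 2 for the C, K ignored same as C, S ignored same as C, 5 for the N, 0 added).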
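Rules 1 and 2 can both be handled by remembering the digit of the previous letter and skipping a letter whose digit matches it. The following is a minimal sketch under that assumption; encode_adjacent and CODES are illustrative names, the surname is assumed to be non-empty, and H and W get no special treatment here.

```python
# Letter -> digit table from the coding guide.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def encode_adjacent(surname):
    """Encode a non-empty surname using the basic rules plus rules 1 and 2."""
    surname = surname.upper()
    result = surname[0]
    # Start from the first letter's own digit so that "PF" in Pfister
    # counts as two side-by-side letters with the same number.
    prev = CODES.get(surname[0], "")
    for letter in surname[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a disregarded letter resets prev to ""
    return (result + "000")[:4]


for name in ("Gutierrez", "Pfister", "Jackson"):
    print(name, encode_adjacent(name))
```

A doubled letter is just a special case of two neighbours sharing a digit, so one comparison covers both rules.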
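Rule 3. Consonant Separators
3.A. If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:
Tymczak is coded as T522 (T, 5 for the M, 2 for the C, Z ignored (see the "side-by-side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
3.B. If "H" or "W" separates two consonants that have the same soundex code, the consonant to the right is not coded. Example:
Ashcraft is coded A261 (A, 2 for the S, C ignored since it has the same code as S with H in between, 6 for the R, 1 for the F). It is not coded A226.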
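Putting all three rules together, one possible approach is to track the previous digit, let vowels reset it (rule 3.A), and skip H and W entirely so they leave it untouched (rule 3.B). This is a sketch rather than a reference implementation: the names soundex and CODES are illustrative, the surname is assumed to be non-empty, and treating Y like a vowel is an assumption, since the rules above do not say how Y behaves as a separator.

```python
# Letter -> digit table from the coding guide.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(surname):
    """Encode a non-empty surname using the basic rules and rules 1 to 3."""
    surname = surname.upper()
    result = surname[0]
    prev = CODES.get(surname[0], "")
    for letter in surname[1:]:
        if letter in "HW":
            # Rule 3.B: H and W do not separate consonants, so the
            # previous digit is left as it is.
            continue
        digit = CODES.get(letter, "")
        if digit and digit != prev:
            result += digit
        # Rule 3.A: a vowel (and, by assumption, Y) sets prev to "",
        # so the next consonant is coded even if it repeats a digit.
        prev = digit
    return (result + "000")[:4]


for name in ("Gutierrez", "Pfister", "Jackson", "Tymczak", "Ashcraft"):
    print(name, soundex(name))
```

The only difference between the two separator rules is whether the previous digit is reset, which keeps the loop body short.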
Here is my code so far:
# Python 3: input() and print() replace raw_input and the print statement.
surname = input("Please enter surname:").upper()  # upper-case so the lists below match
outstring = ""
outstring = outstring + surname[0]  # the code always starts with the first letter
for i in range(1, len(surname)):
    nextletter = surname[i]
    # Append the digit for each coded consonant; other letters are skipped.
    if nextletter in ['B','F','P','V']:
        outstring = outstring + '1'
    elif nextletter in ['C','G','J','K','Q','S','X','Z']:
        outstring = outstring + '2'
    elif nextletter in ['D','T']:
        outstring = outstring + '3'
    elif nextletter in ['L']:
        outstring = outstring + '4'
    elif nextletter in ['M','N']:
        outstring = outstring + '5'
    elif nextletter in ['R']:
        outstring = outstring + '6'
print(outstring)
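It does enough of what it is asked to do; I'm just not sure how to code the three rules. That's where I need help. So, any help is appreciated.
Answer 0 (score: 1)
I suggest you try the following.
Once you break it down well, it should become much easier to manage.
Answer 1 (score: 0)
This isn't perfect (for instance, it produces the wrong result if the input doesn't start with a letter), and it doesn't implement the rules as independently testable functions, so it won't really serve as an answer to the homework question. But this is how I'd implement it: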
import re

def soundex_prepare(s):
    """Prepare string for Soundex encoding.
    Remove non-alpha characters (and the not-of-interest W/H/Y),
    convert to upper case, and remove all runs of repeated letters."""
    p = re.compile("[^a-gi-vxz]", re.IGNORECASE)
    s = re.sub(p, "", s).upper()
    for c in set(s):
        s = re.sub(c + "{2,}", c, s)
    return s

def soundex_encode(s):
    """Encode a name string using the Soundex algorithm."""
    result = s[0].upper()
    s = soundex_prepare(s[1:])
    letters = 'ABCDEFGIJKLMNOPQRSTUVXZ'
    codes = '.123.12.22455.12623.122'
    d = dict(zip(letters, codes))
    # Start from the first letter's code so that a name like Pfister
    # does not code the F after the P.
    prev_code = d.get(result, "")
    for c in s:
        code = d[c]
        if code != "." and code != prev_code:
            result += code
            if len(result) >= 4: break
        prev_code = code
    return (result + "0000")[:4]
Answer 2 (score: 0)
import itertools  # for groupby, used to collapse repeated characters
import re  # for sub, used to drop and substitute letters

surname = input("Enter surname of the author: ")  # ask the user for the author's surname
while surname != "":  # keep looping until the user enters an empty line
    str_ini = surname[0]  # initial letter of the surname
    mod_str1 = surname[1:]  # surname without its first letter
    mod_str2 = re.sub(r'[aeiouyhwAEIOUYHW]', '', mod_str1)  # drop the disregarded letters
    # substitute the remaining letters with the numbers required by the soundex algorithm
    mod_str21 = re.sub(r'[bfpvBFPV]', '1', mod_str2)
    mod_str22 = re.sub(r'[cgjkqsxzCGJKQSXZ]', '2', mod_str21)
    mod_str23 = re.sub(r'[dtDT]', '3', mod_str22)
    mod_str24 = re.sub(r'[lL]', '4', mod_str23)
    mod_str25 = re.sub(r'[mnMN]', '5', mod_str24)
    mod_str26 = re.sub(r'[rR]', '6', mod_str25)
    mod_str3 = str_ini.upper() + mod_str26  # prepend the surname initial to the digits
    # Note: vowels are removed before runs are collapsed, so consonants with the
    # same number separated by a vowel (rule 3.A) are merged, and the first
    # letter's own number is never compared with the digit that follows it.
    # collapse each run of identical characters into a single character
    mod_str4 = ''.join(char for char, rep in itertools.groupby(mod_str3))
    mod_str5 = mod_str4[:4]  # limit the code to four characters
    # pad with trailing zeros
    if len(mod_str5) == 1:
        print(mod_str5 + "000\n")
    elif len(mod_str5) == 2:
        print(mod_str5 + "00\n")
    elif len(mod_str5) == 3:
        print(mod_str5 + "0\n")
    else:
        print(mod_str5 + "\n")
    print("Press enter to exit")  # an empty line ends the while loop
    surname = input("Enter surname of the author: ")  # ask for the next surname
exit(0)  # exit the program once the loop ends