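I have a list of several hundred members that I'd like to split into first name, middle name and last name, but some members have a prefix (denoted by 'P'). All possible combinations: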
First Middle Last
P First Middle Last
First P Middle Last
P First p Middle Last
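How can I split out the first name (with its P, if present), the middle name (with its P, if present) and the last name in Python? This is what I came up with, but it doesn't really work: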
import csv
inPath = "input.txt"
outPath = "output.txt"
newlist = []
file = open(inPath, 'r')
if file:
    for line in file:
        member = line.split()
        newlist.append(member)
    file.close()
else:
    print("Error Opening File.")
file = open(outPath, 'w')
if file:
    for i in range(len(newlist)):
        print(i, newlist[i][0])  # Should get the First Name with Prefix
        print(i, newlist[i][1])  # Should get the Middle Name with Prefix
        print(i, newlist[i][-1])
    file.close()
else:
    print("Error Opening File.")
What I want is:
Thank you very much for your help.
Answer 0 (score: 2)
How about this complete test script:
import sys

def process(file):
    for line in file:
        arr = line.split()
        if not arr:
            continue
        last = arr.pop()
        n = len(arr)
        if n == 4:
            # P First P Middle
            first, middle = ' '.join(arr[:2]), ' '.join(arr[2:])
        elif n == 3:
            if arr[0] in ('M', 'Shk', 'BS'):
                # P First Middle
                first, middle = ' '.join(arr[:2]), arr[-1]
            else:
                # First P Middle
                first, middle = arr[0], ' '.join(arr[1:])
        elif n == 2:
            first, middle = arr
        else:
            continue
        print('First: %r' % first)
        print('Middle: %r' % middle)
        print('Last: %r' % last)

if __name__ == '__main__':
    process(sys.stdin)
If you run this on Linux, type in the sample lines and then press Ctrl+D to signal the end of input. On Windows, use Ctrl+Z instead of Ctrl+D. Of course, you can also pipe in a file.
The following input file:
First Middle Last
M First Middle Last
First Shk Middle Last
BS First M Middle Last
produces this output:
First: 'First'
Middle: 'Middle'
Last: 'Last'
First: 'M First'
Middle: 'Middle'
Last: 'Last'
First: 'First'
Middle: 'Shk Middle'
Last: 'Last'
First: 'BS First'
Middle: 'M Middle'
Last: 'Last'
Answer 1 (score: 1)
names = [('A', 'John', 'Paul', 'Smith'),
         ('Matthew', 'M', 'Phil', 'Bond'),
         ('A', 'Morris', 'O', 'Reil', 'M', 'Big')]

# Flatten all names into one stream of words
def getItem():
    for name in names:
        for (pos, item) in enumerate(name):
            yield item

itembase = getItem()
for i in enumerate(names):
    # A single-letter word is treated as a prefix and glued to the next word
    element = next(itembase)
    if len(element) == 1: firstName = element + " " + next(itembase)
    else: firstName = element
    element = next(itembase)
    if len(element) == 1: mName = element + " " + next(itembase)
    else: mName = element
    element = next(itembase)
    if len(element) == 1: lastName = element + " " + next(itembase)
    else: lastName = element
    print("First Name: " + firstName)
    print("Middle Name: " + mName)
    print("Last Name: " + lastName)
    print("--")
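This seems to work. Replace the len(element) == 1 condition (I didn't know you only need to check for three specific prefixes, so I made it match any single letter) with a check for your three prefixes.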
**Output**
First Name: A John
Middle Name: Paul
Last Name: Smith
First Name: Matthew
Middle Name: M Phil
Last Name: Bond
First Name: A Morris
Middle Name: O Reil
Last Name: M Big
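The answer suggests testing for the real prefixes instead of any single letter. Here is a minimal Python 3 sketch of the same iterator idea that does that. PREFIXES, take_part and split_name are hypothetical names used only for illustration. The sketch assumes every line holds exactly a first, a middle and a last name.

```python
# Hypothetical prefix set; substitute your real prefixes.
PREFIXES = {"M", "Shk", "BS"}


def take_part(words):
    """Return the next name part, gluing a prefix onto the word that follows it."""
    word = next(words)
    if word in PREFIXES:
        return word + " " + next(words)
    return word


def split_name(full_name):
    """Split one line into (first, middle, last), each with its prefix if any."""
    words = iter(full_name.split())  # a fresh iterator per name
    first = take_part(words)
    middle = take_part(words)
    last = take_part(words)
    return first, middle, last


for line in ["First Middle Last", "M First Middle Last",
             "First Shk Middle Last", "BS First M Middle Last"]:
    print(split_name(line))
```

The original uses one generator over all names. This sketch creates a fresh iterator for each line instead, so a malformed line fails on its own rather than shifting every following name out of step.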
Answer 2 (score: 1)
Here it is in an object-oriented style:
class Name(object):
    def __init__(self, fullname):
        self.full = fullname
        s = self.full.split()
        # A single-character word is a prefix: join it with the word after it,
        # then drop the consumed words before parsing the next part
        self.first = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
        s = s[len(self.first.split()):]
        self.middle = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
        s = s[len(self.middle.split()):]
        self.last = " ".join(s[:2]) if len(s[0]) == 1 else s[0]

names = [
    "First Middle Last",
    "P First Middle Last",
    "First P Middle Last",
    "P First p Middle Last",
]
for fullname in names:
    name = Name(fullname)
    print(name.first, name.middle, name.last)
Answer 3 (score: 1)
If 'M', 'Shk' and 'BS' are never valid first or last names, i.e. you don't care about their exact position, you can filter them out with a one-liner:
first, middle, last = filter(lambda x: x not in ('M','Shk','BS'), yourNameHere.split())
Here yourNameHere is, of course, the string containing the name to parse.
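Caveat: this code assumes you always have a middle name, as in your examples above. If not, you need to take the whole list and count its elements to find out whether there is a middle name.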
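The caveat says to count the elements to find out whether a middle name is present. A minimal sketch of that drops the prefixes and then branches on how many words remain. PREFIXES and split_without_prefixes are illustrative names. Returning None for a missing middle name is an assumption about what you want.

```python
# Hypothetical prefix set, as in the one-liner above.
PREFIXES = ("M", "Shk", "BS")


def split_without_prefixes(full_name):
    """Drop prefixes, then decide by word count whether a middle name exists."""
    words = [w for w in full_name.split() if w not in PREFIXES]
    if len(words) == 3:
        first, middle, last = words
    elif len(words) == 2:
        first, last = words
        middle = None  # no middle name on this line
    else:
        raise ValueError("unexpected name format: %r" % full_name)
    return first, middle, last


print(split_without_prefixes("BS First M Middle Last"))  # ('First', 'Middle', 'Last')
print(split_without_prefixes("M First Last"))            # ('First', None, 'Last')
```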
Edit: if you do care about the prefix position:
first, middle, last = map(
    lambda x: x[1],
    filter(
        # ix is an (index, word) pair produced by enumerate
        lambda ix: ix[0] not in (0, 2) or ix[1] not in ('M', 'Shk', 'BS'),
        enumerate(yourNameHere.split())))
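The position-based filter only drops a prefix found at index 0 or 2. A line such as 'First Shk Middle Last', with its prefix at index 1, therefore still has four words, and the three-way unpacking raises ValueError. To keep the prefixes and handle every position, one alternative is to glue each prefix onto the word after it. This is a sketch, not part of the original answer, and PREFIXES and attach_prefixes are hypothetical names. Like the one-liner, it assumes the prefixes never occur as actual names.

```python
PREFIXES = {"M", "Shk", "BS"}  # hypothetical prefix set


def attach_prefixes(words):
    """Merge each prefix with the word that follows it, wherever it appears."""
    parts = []
    for word in words:
        if parts and parts[-1] in PREFIXES:
            # previous entry is a bare prefix: attach this word to it
            parts[-1] += " " + word
        else:
            parts.append(word)
    return parts


for name in ["First Middle Last", "M First Middle Last",
             "First Shk Middle Last", "BS First M Middle Last"]:
    first, middle, last = attach_prefixes(name.split())
    print(first, "|", middle, "|", last)
```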
Answer 4 (score: 0)
import csv

class CsvWriter(object):
    """
    Wraps csv.writer in a partial file-API compatibility layer
    """
    def __init__(self, fname, mode='w', *args, **kwargs):
        super(CsvWriter, self).__init__()
        # the csv module expects files it writes to be opened with newline=''
        self.f = open(fname, mode, newline='')
        self.writer = csv.writer(self.f, *args, **kwargs)

    def write(self, *args):
        """
        Writes a row of data to the csv file
        Can be called as
        .write() puts a blank row
        .write(2) puts a single cell
        .write([1,2,3]) puts 3 cells
        .write(1,2,3) puts 3 cells
        """
        if len(args) == 1 and hasattr(args[0], '__iter__') and not isinstance(args[0], str):
            # single argument, and it's a sequence (but not a string) - let it be the row data
            rowdata = args[0]
        else:
            rowdata = args
        self.writer.writerow(rowdata)

    def close(self):
        self.writer = None
        self.f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

class NameSplitter(object):
    def __init__(self, pre=None):
        super(NameSplitter, self).__init__()
        # list of accepted prefixes (compared case-insensitively)
        if pre is None:
            self.pre = set(['m', 'shk', 'bs'])
        else:
            self.pre = set([s.lower() for s in pre])
        # is-a-prefix word tester
        self.isPre = lambda x, p=self.pre: x.lower() in p
        jn = lambda *args: ' '.join(args)
        # signature-based dispatch table: (word count, prefix positions) -> splitter
        self.match = {}
        self.match[(3, ())] = lambda w, j=jn: (w[0], w[1], w[2])
        self.match[(4, (0,))] = lambda w, j=jn: (j(w[0], w[1]), w[2], w[3])
        self.match[(4, (1,))] = lambda w, j=jn: (w[0], j(w[1], w[2]), w[3])
        self.match[(5, (0, 2))] = lambda w, j=jn: (j(w[0], w[1]), j(w[2], w[3]), w[4])

    def __call__(self, nameStr):
        words = nameStr.split()
        # build hashable signature
        pres = tuple(n for n, word in enumerate(words) if self.isPre(word))
        sig = (len(words), pres)
        try:
            do = self.match[sig]
            return do(words)
        except KeyError:
            # line doesn't match any known shape
            return None

def process(inf, outf, fn):
    for line in inf:
        res = fn(line)
        if res is not None:
            outf.write(res)

def main():
    infname = "input.txt"
    outfname = "output.csv"
    with open(infname, 'r') as inf:
        with CsvWriter(outfname) as outf:
            process(inf, outf, NameSplitter())

if __name__ == "__main__":
    main()
Answer 5 (score: 0)
The complete script:
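import sys
from functools import reduce

def f(a, b):
    # a: row built so far (from the right), b: next word to the left
    if b in ('M', 'Shk', 'BS'):
        return '%s %s' % (b, a)
    else:
        return '%s,%s' % (b, a)

for line in sys.stdin:
    # split(' ') keeps the trailing newline on the last word, so each row ends with it
    sys.stdout.write(reduce(f, reversed(line.split(' '))))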
Input:
First Middle Last
M First Middle Last
First Shk Middle Last
BS First M Middle Last
CSV output:
First,Middle,Last
M First,Middle,Last
First,Shk Middle,Last
BS First,M Middle,Last
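The reduce call walks the words from right to left. Each word is prepended to the accumulated string, separated by a space when the word is a prefix and by a comma otherwise. Written as an explicit loop, the same fold may be easier to follow. to_csv_row and PREFIXES are illustrative names.

```python
PREFIXES = ("M", "Shk", "BS")


def to_csv_row(line):
    """Fold the words right to left: a prefix joins with a space, anything else with a comma."""
    words = line.split()
    row = words[-1]
    for word in reversed(words[:-1]):
        sep = " " if word in PREFIXES else ","
        row = word + sep + row
    return row


print(to_csv_row("BS First M Middle Last"))  # BS First,M Middle,Last
```

Unlike the script above, this uses split() with no argument. That also discards the trailing newline, so the caller has to add it back when writing rows out.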
Answer 6 (score: -1)
Here is another solution (obtained by modifying the code in the question):
import csv
inPath = "input.txt"
outPath = "output.txt"
newlist = []
file = open(inPath, 'r')
if file:
    for line in file:
        member = line.split()
        newlist.append(member)
    file.close()
else:
    print("Error Opening File.")
file = open(outPath, 'w')
if file:
    for fullName in newlist:
        prefix = ""
        for name in fullName:
            # remember a prefix and print it together with the next word
            if name == "P" or name == "p":
                prefix = name + " "
                continue
            print(prefix + name)
            prefix = ""
        print()
    file.close()
else:
    print("Error Opening File.")
Answer 7 (score: -2)
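I would use regular expressions, which are designed for exactly this purpose. This solution is easy to maintain and understand.
It's worth a try: http://docs.python.org/library/re.html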
import re
from operator import truth

you_string = "BS First M Middle Last"  # example input; replace with the line to parse

# patterns
# First Middle Last
first = re.compile(r"^(\w+) +(\w+) +(\w+)$")
# P First Middle Last
second = re.compile(r"^(M|Shk|BS) +(\w+) +(\w+) +(\w+)$")
# First P Middle Last
third = re.compile(r"^(\w+) +(M|Shk|BS) +(\w+) +(\w+)$")
# P First p Middle Last
forth = re.compile(r"^(M|Shk|BS) +(\w+) +(M|Shk|BS) +(\w+) +(\w+)$")

# Each branch must use the pattern that matched, and join prefix groups to their names
if truth(first.search(you_string)):
    parsed = first.search(you_string)
    print(parsed.group(1), parsed.group(2), parsed.group(3))
elif truth(second.search(you_string)):
    parsed = second.search(you_string)
    print(parsed.group(1) + " " + parsed.group(2), parsed.group(3), parsed.group(4))
elif truth(third.search(you_string)):
    parsed = third.search(you_string)
    print(parsed.group(1), parsed.group(2) + " " + parsed.group(3), parsed.group(4))
elif truth(forth.search(you_string)):
    parsed = forth.search(you_string)
    print(parsed.group(1) + " " + parsed.group(2),
          parsed.group(3) + " " + parsed.group(4), parsed.group(5))
else:
    print("not match at all")
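Because the patterns are compiled once up front, they don't have to be recompiled for every line you match, which helps when processing many lines.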
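The four patterns can also be folded into a single one by making the prefix in front of the first and middle names optional. This is a sketch under the same assumption that M, Shk and BS are the only prefixes. NAME_RE and parse are illustrative names. Named groups make the result self-describing, and match returns None for lines that fit none of the shapes.

```python
import re

# Optional prefix (M, Shk or BS) in front of the first and the middle name.
NAME_RE = re.compile(
    r"^(?P<first>(?:(?:M|Shk|BS) +)?\w+) +"
    r"(?P<middle>(?:(?:M|Shk|BS) +)?\w+) +"
    r"(?P<last>\w+)$"
)


def parse(line):
    """Return (first, middle, last) with prefixes kept, or None if the line doesn't fit."""
    m = NAME_RE.match(line.strip())
    return m.group("first", "middle", "last") if m else None


for line in ["First Middle Last", "M First Middle Last",
             "First Shk Middle Last", "BS First M Middle Last", "Unparseable"]:
    print(parse(line))
```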