Question

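I'm trying to build a system that takes a multi-line string of names as input and outputs those lines as a 2D list of first names and surnames. My problem is that the input can contain both full names (a first name followed by a surname) and first names on their own. That may be confusing, so I've included an example below.

This is in Python 3.6.

I have a list of first names: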

Bob
Steve
Ted
Blake
Harry
Edric
Tommy
Bartholomew

and a list of surnames:

Fischer
Stinson
McCord
Bone
Harvey

Input

"""Bob Fischer Steve Ted Stinson Blake Harry McCord
Edric Bone Tommy Harvey Bartholomew"""

Output

[["Bob Fischer","Steve","Ted Stinson","Blake","Harry McCord"],
["Edric Bone","Tommy Harvey","Bartholomew"]]

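I'm really struggling to tell the space that separates two different people (Steve Ted) from the space between a first name and its surname.

Can anyone help? I'm really stuck...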

Answer 1

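It seems you want to match a first name that is optionally followed by whitespace and a surname.

You can build a single regex pattern from the name lists you have and use re.findall to find all non-overlapping matches: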

import re
first = ['Bob','Steve','Ted','Blake','Harry','Edric','Tommy','Bartholomew']
surnames = ['Fischer','Stinson','McCord','Bone','Harvey']
r = r"\b(?:{})\b(?:\s+(?:{})\b)?".format("|".join(first),"|".join(surnames))
s = """Bob Fischer Steve Ted Stinson Blake Harry McCord
Edric Bone Tommy Harvey Bartholomew"""
print(re.findall(r, s))
# => ['Bob Fischer', 'Steve', 'Ted Stinson', 'Blake', 'Harry McCord', 'Edric Bone', 'Tommy Harvey', 'Bartholomew']

See the Python demo.

The regex generated by this code:

\b(?:Bob|Steve|Ted|Blake|Harry|Edric|Tommy|Bartholomew)\b(?:\s+(?:Fischer|Stinson|McCord|Bone|Harvey)\b)?

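Basically, in \b(?:...)\b(?:\s+(?:...)\b)?, the first part matches a first name from the alternatives as a whole word (because of the \b on both sides of the first (?:...) grouping construct), and then (?:\s+(?:...)\b)? matches 1 or 0 occurrences (because of the ? quantifier) of 1+ whitespace characters (\s+) followed by any of the surnames (again as a whole word, because of the trailing \b).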
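Note that re.findall on the whole string returns a flat list, while the question asks for one inner list per input line. A minimal sketch of one way to get there, assuming each line should be matched independently: split the input with str.splitlines and run the same pattern on each line. The re.escape calls are an addition not in the answer above; they only matter if a name contains regex metacharacters.

```python
import re

first = ['Bob', 'Steve', 'Ted', 'Blake', 'Harry', 'Edric', 'Tommy', 'Bartholomew']
surnames = ['Fischer', 'Stinson', 'McCord', 'Bone', 'Harvey']

# Same pattern as above; re.escape (an assumption, not in the original answer)
# keeps names with characters such as "." from being read as regex syntax.
pattern = re.compile(r"\b(?:{})\b(?:\s+(?:{})\b)?".format(
    "|".join(map(re.escape, first)),
    "|".join(map(re.escape, surnames)),
))

s = """Bob Fischer Steve Ted Stinson Blake Harry McCord
Edric Bone Tommy Harvey Bartholomew"""

# One findall per input line gives one inner list per line.
result = [pattern.findall(line) for line in s.splitlines()]
print(result)
# => [['Bob Fischer', 'Steve', 'Ted Stinson', 'Blake', 'Harry McCord'],
#     ['Edric Bone', 'Tommy Harvey', 'Bartholomew']]
```

Matching line by line also stops \s+ from joining a first name at the end of one line to a surname at the start of the next.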

Answer 2

Try this. Instead of first names and surnames, I used nouns and the categories they belong to.

A = [ 'Beaver' , 'Strawberry']
B = [ 'Animal' , 'Fruit']

input_string = 'Beaver Animal Strawberry Strawberry Fruit'
input_string = input_string.split(' ')

def combinestring( x_string ):
    compiling_string = []

    for i,x in enumerate(x_string):

        if (i+1) < len(x_string):
            if x in A and x_string[i+1] in B:
                compiling_string.append(x + ' ' + x_string[i+1])
            elif x in A:
                compiling_string.append(x)

        elif (i+1) == len(x_string) and x in A:
            compiling_string.append(x)

    return compiling_string



print(combinestring(input_string))
#>>> ['Beaver Animal','Strawberry','Strawberry Fruit']
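The same look-ahead idea can be sketched as a reusable function, under a few assumptions: pair_tokens, heads and tails are hypothetical names introduced here, the word lists are passed in rather than read from globals, and a tail word that has been paired is skipped rather than examined again (this only differs from the code above when a word appears in both lists).

```python
def pair_tokens(tokens, heads, tails):
    """Pair each head token with the tail token that directly follows it, if any."""
    heads, tails = set(heads), set(tails)  # sets give fast membership tests
    pairs = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in heads:
            if i + 1 < len(tokens) and tokens[i + 1] in tails:
                pairs.append(token + ' ' + tokens[i + 1])
                i += 1  # skip the tail token that was just consumed
            else:
                pairs.append(token)
        i += 1
    return pairs


paired = pair_tokens('Beaver Animal Strawberry Strawberry Fruit'.split(),
                     ['Beaver', 'Strawberry'], ['Animal', 'Fruit'])
print(paired)
#>>> ['Beaver Animal', 'Strawberry', 'Strawberry Fruit']
```

For the names in the question, you would call it once per input line with the first-name list as heads and the surname list as tails.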

Answer 3

In [21]: first_names
Out[21]: ['Bob', 'Steve', 'Ted', 'Blake', 'Harry', 'Edric', 'Tommy', 'Bartholomew']

In [22]: surnames
Out[22]: ['Fischer', 'Stinson', 'McCord', 'Bone', 'Harvey']

In [23]: inp = """Bob Fischer Steve Ted Stinson Blake Harry McCord
    ...: Edric Bone Tommy Harvey Bartholomew""".split()

In [24]: out = []
    ...: fullname = None
    ...: for name in inp:
    ...:     if name in first_names:
    ...:         if fullname:
    ...:             out.append(fullname)
    ...:         fullname = name
    ...:     elif name in surnames:
    ...:         fullname += ' ' + name
    ...: out.append(fullname)
    ...:

In [25]: out
Out[25]:
['Bob Fischer',
 'Steve',
 'Ted Stinson',
 'Blake',
 'Harry McCord',
 'Edric Bone',
 'Tommy Harvey',
 'Bartholomew']
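The session above yields a flat list and would raise a TypeError if the input started with a surname, because fullname is still None at that point. A rough sketch of the same accumulate-as-you-go idea wrapped in a function that returns one inner list per line; group_names is a hypothetical name, and silently ignoring a surname that has no preceding first name is an assumption about the desired behaviour.

```python
def group_names(text, first_names, surnames):
    """Return one list of names per input line, attaching surnames to the preceding first name."""
    first_names, surnames = set(first_names), set(surnames)
    rows = []
    for line in text.splitlines():
        row = []
        for word in line.split():
            if word in first_names:
                row.append(word)        # a first name starts a new entry
            elif word in surnames and row:
                row[-1] += ' ' + word   # a surname extends the latest entry
            # anything else, including a surname with no first name before it, is ignored
        rows.append(row)
    return rows


first_names = ['Bob', 'Steve', 'Ted', 'Blake', 'Harry', 'Edric', 'Tommy', 'Bartholomew']
surnames = ['Fischer', 'Stinson', 'McCord', 'Bone', 'Harvey']
inp = """Bob Fischer Steve Ted Stinson Blake Harry McCord
Edric Bone Tommy Harvey Bartholomew"""

rows = group_names(inp, first_names, surnames)
print(rows)
# [['Bob Fischer', 'Steve', 'Ted Stinson', 'Blake', 'Harry McCord'],
#  ['Edric Bone', 'Tommy Harvey', 'Bartholomew']]
```

Like the loop above, this attaches every consecutive surname to the current entry, so "Bob Fischer Stinson" would come out as a single name.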
