Question

I'm trying to search plain text that is laid out like this:

Named H Man, MBA
Personal: 
Address: 
Professional: 
0000 Something St 
Apt 000 
City, ST 12345-6789 
No Business Contact Information. 
Academic: 
2019 Bachelors, Education - AF s

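My goal is to retrieve only the first part of the address from the text, i.e. the "0000 Something St" and "Apt 000" parts. This is complicated by the fact that some entries in the plain text are laid out differently, so I took a more general approach: I look for a line containing the word "Address:" or "Professional:" to mark the line just before the text I want, then find the first following line that contains a comma to mark the end. Once that works, I'll write code to strip everything I don't need from those lines. Most of the program works as intended, except that this one part outputs nothing, and I think that's because for some reason it isn't detecting the word "Address:" or "Professional:" correctly.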

The code I've written so far is below, plus a way of outputting the result, which shouldn't be the problem:

def FindAddress(person):
    global address
    address = "NA"
    addressUncropped = ""
    lineBeforeAddress = 0
    lineAfterAddress = 0
    personLines = person.splitlines()
    wordList = []
    lineIndex = 0
    for line in personLines:  # This sets up the before and after markers to be used later
        wordList = line.split(" ")
        for word in wordList:
            print(word)
            if word == "Address:" or word == "Professional:" and lineBeforeAddress == 0:
                lineBeforeAddress = lineIndex
            if "," in line and lineAfterAddress == 0 and lineIndex >= lineBeforeAddress:
                lineAfterAddress = lineIndex+1
        lineIndex += 1
    for line in personLines[lineBeforeAddress:lineAfterAddress]:  # This uses the before and after markers to get the address
        addressUncropped += line

If you have any other, unrelated advice that might help with this task, I'd like to hear that too. Thanks!

Answer 1

The problem is that this condition is true on the first line:

if "," in line and lineAfterAddress == 0 and lineIndex >= lineBeforeAddress:

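The first line, Named H Man, MBA, contains a comma. At that point lineAfterAddress and lineBeforeAddress are both zero, so lineIndex >= lineBeforeAddress is true and lineAfterAddress is set to 1 before any marker has been found. You need to check that lineBeforeAddress has actually been set, so add the condition lineBeforeAddress > 0.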
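
To make this concrete, the sketch below re-runs the question's marker loop on the sample text and shows that the resulting slice is empty. The names before and after are shortened stand-ins for lineBeforeAddress and lineAfterAddress; everything else follows the code in the question.

```python
lines = [
    "Named H Man, MBA",
    "Personal: ",
    "Address: ",
    "Professional: ",
    "0000 Something St ",
    "Apt 000 ",
    "City, ST 12345-6789 ",
    "No Business Contact Information. ",
    "Academic: ",
    "2019 Bachelors, Education - AF s",
]

before = 0  # stand-in for lineBeforeAddress
after = 0   # stand-in for lineAfterAddress
for index, line in enumerate(lines):
    for word in line.split(" "):
        # Same tests as in the question, including the missing parentheses.
        if word == "Address:" or word == "Professional:" and before == 0:
            before = index
        if "," in line and after == 0 and index >= before:
            after = index + 1

print(before, after)        # 2 1
print(lines[before:after])  # []
```

The end marker (1) is fixed by the comma in the name line, before the start marker (2) is found, so the slice lines[2:1] contains nothing.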

Also, this test shouldn't be inside the for word in wordList loop, because it tests the whole line, not an individual word.

The final loop can be simplified to:

addressUncropped = "".join(personLines[lineBeforeAddress:lineAfterAddress])

Full code (the marker test also gains parentheses, because and binds more tightly than or):

def FindAddress(person):
    global address
    address = "NA"
    addressUncropped = ""
    lineBeforeAddress = 0
    lineAfterAddress = 0
    personLines = person.splitlines()
    wordList = []
    lineIndex = 0
    for line in personLines:  # This sets up the before and after markers to be used later
        wordList = line.split(" ")
        for word in wordList:
            if (word == "Address:" or word == "Professional:") and lineBeforeAddress == 0:
                lineBeforeAddress = lineIndex
        if "," in line and lineAfterAddress == 0 and lineBeforeAddress > 0 and lineIndex >= lineBeforeAddress:
            lineAfterAddress = lineIndex+1
        lineIndex += 1
    addressUncropped = "".join(personLines[lineBeforeAddress:lineAfterAddress])
    return addressUncropped
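
One caveat with the lineBeforeAddress > 0 check: it cannot tell "not set yet" apart from "marker found on line 0". The following is only a sketch of one way around that, assuming a None sentinel is acceptable. The names find_address_lines and MARKERS are hypothetical and not part of the original post.

```python
MARKERS = ("Address:", "Professional:")  # hypothetical constant for the two marker words


def find_address_lines(person):
    """Return the lines from the first marker line through the first
    line at or after it that contains a comma, or [] if either is missing."""
    lines = person.splitlines()
    start = None  # None means "no marker seen yet", so index 0 stays a valid start
    for index, line in enumerate(lines):
        if start is None and any(word in MARKERS for word in line.split(" ")):
            start = index
        if start is not None and "," in line:
            return lines[start:index + 1]
    return []


sample = "\n".join([
    "Named H Man, MBA",
    "Personal: ",
    "Address: ",
    "Professional: ",
    "0000 Something St ",
    "Apt 000 ",
    "City, ST 12345-6789 ",
])
print(find_address_lines(sample))
```

Returning a list instead of a joined string keeps the line boundaries, which may make the later cropping step easier; join the result if a single string is needed.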

Answer 2

I haven't gone through your code, but if you just want to find the indices of the lines that start with "Address:" or "Professional:", you can do this:

[i for i,l in enumerate(person.splitlines()) if l.startswith("Address:") or l.startswith("Professional:")]
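
As a possible follow-up, those indices can be used to pull out the street lines directly. This is only a sketch under two assumptions taken from my reading of the question: the address starts on the line after the last marker line, and it ends just before the first later line that contains a comma. The name extract_street_lines is hypothetical. It also relies on str.startswith accepting a tuple of prefixes.

```python
def extract_street_lines(person):
    """Return the stripped lines between the last marker line and the
    first following line that contains a comma, or [] if not found."""
    lines = person.splitlines()
    marker_indices = [i for i, l in enumerate(lines)
                      if l.startswith(("Address:", "Professional:"))]
    if not marker_indices:
        return []
    start = marker_indices[-1] + 1  # assumption: address begins after the last marker
    for end in range(start, len(lines)):
        if "," in lines[end]:
            return [l.strip() for l in lines[start:end]]
    return []  # no comma line after the markers


sample = "\n".join([
    "Named H Man, MBA",
    "Personal: ",
    "Address: ",
    "Professional: ",
    "0000 Something St ",
    "Apt 000 ",
    "City, ST 12345-6789 ",
    "No Business Contact Information. ",
])
print(extract_street_lines(sample))  # ['0000 Something St', 'Apt 000']
```

Entries that are laid out differently (for example, with text on the same line as the marker) would need extra handling that this sketch does not attempt.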
