Why isn't my program detecting the word "Address:" or "Professional:" here?

Asked: 2019-05-03 21:11:34

Tags: python python-3.x text

I'm trying to search plain text that is laid out like this:

Named H Man, MBA
Personal: 
Address: 
Professional: 
0000 Something St 
Apt 000 
City, ST 12345-6789 
No Business Contact Information. 
Academic: 
2019 Bachelors, Education - AF s

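My goal is to retrieve only the first part of the address from the text, that is, the "0000 Something St" and "Apt 000" lines. This is complicated by the fact that some entries in the plain text are laid out differently, so I took a more general approach: I look for a line containing the word "Address:" or "Professional:" to mark where the part I want begins, then find the first line after it that contains a comma to mark the end. Once that works, I'll write code to strip everything I don't need from those lines. Most of the code behaves as intended, but one part outputs nothing at all, and I think that's because for some reason it isn't detecting the word "Address:" or "Professional:" correctly.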

This is the code I've written so far, plus a way of outputting the result, which shouldn't be the problem:

def FindAddress(person):
    global address
    address = "NA"
    addressUncropped = ""
    lineBeforeAddress = 0
    lineAfterAddress = 0
    personLines = person.splitlines()
    wordList = []
    lineIndex = 0
    for line in personLines:  # This sets up the before and after markers to be used later
        wordList = line.split(" ")
        for word in wordList:
            print(word)
            if word == "Address:" or word == "Professional:" and lineBeforeAddress == 0:
                lineBeforeAddress = lineIndex
            if "," in line and lineAfterAddress == 0 and lineIndex >= lineBeforeAddress:
                lineAfterAddress = lineIndex+1
        lineIndex += 1
    for line in personLines[lineBeforeAddress:lineAfterAddress]:  # This uses the before and after markers to get the address
        addressUncropped += line

If you have any other, unrelated suggestions that might help with this task, I'd like to hear them too. Thanks!

2 Answers:

Answer 0: (score: 2)

The problem is that this condition is true on the first line:

if "," in line and lineAfterAddress == 0 and lineIndex >= lineBeforeAddress:

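The first line, "Named H Man, MBA", contains a comma. At that point lineAfterAddress and lineBeforeAddress are both zero, so lineIndex >= lineBeforeAddress is true and lineAfterAddress is set to 1. When lineBeforeAddress is later set to the index of the "Address:" line, the slice starts after it ends, so it is empty and nothing is output. You need to check that lineBeforeAddress has already been set, so you also need the condition lineBeforeAddress > 0.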

Also, this test shouldn't be inside the for word in wordList loop, because it tests the whole line rather than an individual word.
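To see the failure concretely, the sketch below replays the question's marker logic on the sample record and prints the two markers. The names SAMPLE and original_markers are made up for this illustration, and the trailing spaces in the sample are an assumption based on how the question's listing appears.

```python
# Sample record from the question (trailing spaces are an assumption).
SAMPLE = (
    "Named H Man, MBA\n"
    "Personal: \n"
    "Address: \n"
    "Professional: \n"
    "0000 Something St \n"
    "Apt 000 \n"
    "City, ST 12345-6789 \n"
    "No Business Contact Information. \n"
    "Academic: \n"
    "2019 Bachelors, Education - AF s\n"
)


def original_markers(person):
    """Replay the question's conditions unchanged and return (before, after)."""
    before = 0
    after = 0
    for index, line in enumerate(person.splitlines()):
        for word in line.split(" "):
            if word == "Address:" or word == "Professional:" and before == 0:
                before = index
            # Fires on line 0 because that line has a comma and before is still 0.
            if "," in line and after == 0 and index >= before:
                after = index + 1
    return before, after


before, after = original_markers(SAMPLE)
print(before, after)                         # 2 1
print(SAMPLE.splitlines()[before:after])     # [] because the slice start is past its end
```

The end marker is fixed at 1 before the start marker is ever found, which is why the slice comes back empty.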

The final loop can be simplified to:

addressUncropped = "".join(personLines[lineBeforeAddress:lineAfterAddress])

Full code:

def FindAddress(person):
    global address
    address = "NA"
    addressUncropped = ""
    lineBeforeAddress = 0
    lineAfterAddress = 0
    personLines = person.splitlines()
    wordList = []
    lineIndex = 0
    for line in personLines:  # This sets up the before and after markers to be used later
        wordList = line.split(" ")
        for word in wordList:
            if (word == "Address:" or word == "Professional:") and lineBeforeAddress == 0:
                lineBeforeAddress = lineIndex
        if "," in line and lineAfterAddress == 0 and lineBeforeAddress > 0 and lineIndex >= lineBeforeAddress:
            lineAfterAddress = lineIndex+1
        lineIndex += 1
    addressUncropped = "".join(personLines[lineBeforeAddress:lineAfterAddress])
    return addressUncropped
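The full code above also puts parentheses around the or test. In Python, and binds more tightly than or, so without the parentheses the lineBeforeAddress == 0 guard applies only to the "Professional:" comparison, and a later "Address:" line could overwrite the marker. A minimal illustration, with values chosen purely for the example:

```python
word = "Address:"
lineBeforeAddress = 5  # pretend a marker was already recorded on line 5

# Parsed as: word == "Address:" or (word == "Professional:" and lineBeforeAddress == 0)
unparenthesized = word == "Address:" or word == "Professional:" and lineBeforeAddress == 0

# The guard now applies to both comparisons.
parenthesized = (word == "Address:" or word == "Professional:") and lineBeforeAddress == 0

print(unparenthesized)  # True: the marker would be overwritten
print(parenthesized)    # False: the existing marker is kept
```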

Answer 1: (score: 0)

I haven't gone through your code, but if you just want to find the indices of the lines that start with "Address:" or "Professional:", you can do this:

[i for i,l in enumerate(person.splitlines()) if l.startswith("Address:") or l.startswith("Professional:")]
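As a rough sketch of how those indices could be used to reach the question's goal, the example below takes the lines between the last marker line and the first following line that contains a comma. This is one possible approach rather than either answerer's method. The names SAMPLE, MARKERS, marker_indices and street_lines are hypothetical, and the sketch assumes the marker lines start at column 0 and sit directly before the street lines.

```python
# Sample record from the question (trailing spaces are an assumption).
SAMPLE = (
    "Named H Man, MBA\n"
    "Personal: \n"
    "Address: \n"
    "Professional: \n"
    "0000 Something St \n"
    "Apt 000 \n"
    "City, ST 12345-6789 \n"
    "No Business Contact Information. \n"
    "Academic: \n"
    "2019 Bachelors, Education - AF s\n"
)

# str.startswith accepts a tuple of prefixes, so one call covers both markers.
MARKERS = ("Address:", "Professional:")


def marker_indices(person):
    """Indices of the lines that start with one of the markers."""
    return [i for i, line in enumerate(person.splitlines()) if line.startswith(MARKERS)]


def street_lines(person):
    """Lines after the last marker, up to (not including) the first line with a comma."""
    lines = person.splitlines()
    markers = marker_indices(person)
    if not markers:
        return []
    start = markers[-1] + 1
    for end in range(start, len(lines)):
        if "," in lines[end]:
            return [line.strip() for line in lines[start:end]]
    return []  # no comma line found after the markers


print(marker_indices(SAMPLE))  # [2, 3]
print(street_lines(SAMPLE))    # ['0000 Something St', 'Apt 000']
```

Records that put the address on the same line as the marker, or that have no city line with a comma, would need extra handling.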