I'm trying to search plain text laid out like this:
Named H Man, MBA
Personal:
Address:
Professional:
0000 Something St
Apt 000
City, ST 12345-6789
No Business Contact Information.
Academic:
2019 Bachelors, Education - AF s
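My goal is to retrieve only the first part of the address, i.e. the "0000 Something St" and "Apt 000" lines. This is complicated by the fact that some entries in the text are laid out differently, so I'm using a more general approach: I look for the line containing the word "Address:" or "Professional:", which marks where the part I want begins, and then find the first line after it that contains a comma, which marks where it ends. Once that works, I'll write code to strip everything I don't need from those lines. Most of it works as intended, except that one part outputs nothing, which I think is because for some reason it isn't detecting the words "Address:" or "Professional:" correctly.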
Here is the code I have so far. There is also a function that outputs the result, but that part isn't the problem:
    def FindAddress(person):
        global address
        address = "NA"
        addressUncropped = ""
        lineBeforeAddress = 0
        lineAfterAddress = 0
        personLines = person.splitlines()
        wordList = []
        lineIndex = 0
        for line in personLines: # This sets up the before and after markers to be used later
            wordList = line.split(" ")
            for word in wordList:
                print(word)
                if word == "Address:" or word == "Professional:" and lineBeforeAddress == 0:
                    lineBeforeAddress = lineIndex
                if "," in line and lineAfterAddress == 0 and lineIndex >= lineBeforeAddress:
                    lineAfterAddress = lineIndex+1
            lineIndex += 1
        for line in personLines[lineBeforeAddress:lineAfterAddress]: # This uses the before and after markers to get the address
            addressUncropped += line
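Separately from the empty output, the first if inside the word loop has an operator-precedence trap: Python's "and" binds more tightly than "or". The minimal check below uses two hypothetical helpers, unparenthesized and parenthesized, that only reproduce the two spellings of that condition; they are not part of the original code.

```python
# `and` binds more tightly than `or`, so
#     word == "Address:" or word == "Professional:" and lineBeforeAddress == 0
# is parsed as
#     word == "Address:" or (word == "Professional:" and lineBeforeAddress == 0)

def unparenthesized(word, line_before):
    # Hypothetical helper: the condition exactly as written in the question
    return word == "Address:" or word == "Professional:" and line_before == 0

def parenthesized(word, line_before):
    # Hypothetical helper: the condition with the `or` grouped explicitly
    return (word == "Address:" or word == "Professional:") and line_before == 0

# Once a marker has been recorded (line_before == 2), another "Address:"
# still passes the unparenthesized test and would overwrite the marker.
print(unparenthesized("Address:", 2))  # True
print(parenthesized("Address:", 2))    # False
```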
If you have any other suggestions that might help with this task, even unrelated ones, I'd like to hear them too. Thanks!
Answer 0 (score: 2)
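The problem is that this condition is already true on the first line:
    if "," in line and lineAfterAddress == 0 and lineIndex >= lineBeforeAddress:
The first line, Named H Man, MBA, contains a comma. At that point lineAfterAddress and lineBeforeAddress are both zero, so lineIndex >= lineBeforeAddress is true and lineAfterAddress is set to 1. "Address:" is in fact detected later, which sets lineBeforeAddress to 2 for the sample above, but personLines[2:1] is an empty slice, which is why nothing is output. You need to check whether lineBeforeAddress has been set, so you also need the condition lineBeforeAddress > 0.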
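A simplified trace of the question's loop on the first seven sample lines makes this concrete. It is a sketch: it checks only the "Address:" marker and tests each line's words with "in" rather than looping over them, which does not change the outcome here.

```python
sample = "\n".join([
    "Named H Man, MBA",
    "Personal:",
    "Address:",
    "Professional:",
    "0000 Something St",
    "Apt 000",
    "City, ST 12345-6789",
])

lines = sample.splitlines()
line_before = 0
line_after = 0
for index, line in enumerate(lines):
    if "Address:" in line.split() and line_before == 0:
        line_before = index
    # Original condition: nothing checks that line_before has been set yet,
    # so the comma in "Named H Man, MBA" (index 0) already satisfies it
    if "," in line and line_after == 0 and index >= line_before:
        line_after = index + 1

print(line_before, line_after)        # 2 1
print(lines[line_before:line_after])  # [] because the slice end is before its start
```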
Also, this test should not be inside the for word in wordList loop, because it tests the whole line, not individual words.
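Putting both points together, here is a sketch with a hypothetical scan helper that runs the comma test once per line, outside any loop over words. It also swaps in None instead of 0 as the "not found yet" value; this is a substitution for the answer's lineBeforeAddress > 0 check, which would miss a marker that appears on line 0, since 0 is both a valid index and the "unset" value.

```python
MARKERS = ("Address:", "Professional:")  # the two marker words from the question

def scan(lines):
    """Hypothetical helper: return (marker_index, comma_index).

    marker_index is the first line containing a marker word; comma_index is
    the first line at or after it that contains a comma. Either may be None.
    """
    marker_index = None  # None means "no marker seen yet" (0 is a real index)
    comma_index = None
    for index, line in enumerate(lines):
        # Word-level test: does any word on this line equal a marker?
        if marker_index is None and any(word in MARKERS for word in line.split()):
            marker_index = index
        # Line-level test: once per line, and only after a marker was found
        if marker_index is not None and comma_index is None and "," in line:
            comma_index = index
    return marker_index, comma_index

sample_lines = [
    "Named H Man, MBA",
    "Personal:",
    "Address:",
    "Professional:",
    "0000 Something St",
    "Apt 000",
    "City, ST 12345-6789",
]
marker, comma = scan(sample_lines)
print(marker, comma)                      # 2 6
print(sample_lines[marker:comma + 1])
```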
The final loop can be simplified to:
    addressUncropped = "".join(personLines[lineBeforeAddress:lineAfterAddress])
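str.join with an empty separator produces exactly what the += loop builds, as the short check below shows. Both glue the lines together with nothing in between; passing a separator such as "\n" instead is an optional alternative (not part of the answer) that keeps the line boundaries visible for later cleanup.

```python
lines = ["0000 Something St", "Apt 000", "City, ST 12345-6789"]

# What the original loop does: append each line with no separator
looped = ""
for line in lines:
    looped += line

joined = "".join(lines)
assert looped == joined
print(joined)  # 0000 Something StApt 000City, ST 12345-6789

# Optional alternative: a newline separator preserves the line boundaries
print("\n".join(lines))
```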
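Full code (the "or" in the first condition is now also parenthesized: "and" binds more tightly than "or", so without the parentheses a later "Address:" would overwrite lineBeforeAddress):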
    def FindAddress(person):
        global address
        address = "NA"
        addressUncropped = ""
        lineBeforeAddress = 0
        lineAfterAddress = 0
        personLines = person.splitlines()
        wordList = []
        lineIndex = 0
        for line in personLines: # This sets up the before and after markers to be used later
            wordList = line.split(" ")
            for word in wordList:
                if (word == "Address:" or word == "Professional:") and lineBeforeAddress == 0:
                    lineBeforeAddress = lineIndex
            if "," in line and lineAfterAddress == 0 and lineBeforeAddress > 0 and lineIndex >= lineBeforeAddress:
                lineAfterAddress = lineIndex+1
            lineIndex += 1
        addressUncropped = "".join(personLines[lineBeforeAddress:lineAfterAddress])
        return addressUncropped
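The question also mentions a later cleanup step. If the slice personLines[lineBeforeAddress:lineAfterAddress] is kept as a list instead of being joined, that step can be a simple filter. The helper below, street_lines, is hypothetical and assumes the block starts with marker lines such as "Address:" and ends with a "City, ST ZIP" line, as in the sample; other layouts would need other rules.

```python
def street_lines(block_lines):
    """Hypothetical cleanup: keep only the street lines of an address block.

    Assumes marker lines ("Personal:", "Address:", "Professional:") may
    appear in the block and that its last line is "City, ST ZIP".
    """
    markers = {"Personal:", "Address:", "Professional:"}
    kept = [line.strip() for line in block_lines if line.strip() not in markers]
    # Drop the trailing city/state/ZIP line (the one containing the comma)
    if kept and "," in kept[-1]:
        kept = kept[:-1]
    return kept

block = ["Address:", "Professional:", "0000 Something St", "Apt 000", "City, ST 12345-6789"]
print(street_lines(block))  # ['0000 Something St', 'Apt 000']
```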
Answer 1 (score: 0)
I haven't gone through your code, but if you just want the indices of the lines that start with "Address:" or "Professional:", you can do this:
    [i for i,l in enumerate(person.splitlines()) if l.startswith("Address:") or l.startswith("Professional:")]
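Applied to the sample text, the one-liner returns the indices of both marker lines. The sketch below uses the fact that str.startswith also accepts a tuple of prefixes, so one call covers both markers. Note that startswith only matches at the very beginning of the line, so a line with leading whitespace would need l.lstrip() first.

```python
person = "\n".join([
    "Named H Man, MBA",
    "Personal:",
    "Address:",
    "Professional:",
    "0000 Something St",
    "Apt 000",
    "City, ST 12345-6789",
])

# str.startswith accepts a tuple of prefixes, so one call covers both markers
indices = [i for i, l in enumerate(person.splitlines())
           if l.startswith(("Address:", "Professional:"))]
print(indices)  # [2, 3]

# The first match is where the address block begins (None if there is none)
start = indices[0] if indices else None
print(start)  # 2
```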