I'm new to Python and to programming. I'm trying to pull out the six-digit number that follows a letter character in a string like this:
A12345612341234 asdfa we'a aslkfj4353 alsdfasA345678asA858585943
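So from the string above I want to pull A123456, then loop and pull A345678 and A858585. How can I do that? I'm using PyPDF2 to extract text from a PDF and store it in a variable. I've tried slicing and lists, but I can't figure out how to make it work. I've spent some time searching online and found plenty of examples, but they don't fit my case; most of them rely on whitespace between the values. It seems like it should be really simple. Here is what I have so far: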
#import PyPDF2 and set extracted text as the page_content variable
import PyPDF2
pdf_file = open('5302.pdf','rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
page = read_pdf.getPage(0)
page_content = page.extractText()

#initialize the user_input variable
user_input = ""

#function to get the AFE numbers from the pdf document
def get_afenumbers(Y):
    #initialize the afe and afelist variables
    afe = "A"
    afelist = ""
    x = ""
    #Make a while loop of this after figuring out how to get only 6 digits
    #after the "A" use .isdigit() somehow?
    while True:
        if user_input.upper().startswith("Y") == True:
            #Find the letter A and extract it and its following 6 digits
            if "A" in page_content:
                #right now only getting everything after first A
                afe = page_content[page_content.find("A")+1:]
                #Add AFEs to afelist
                afelist += afe
                #Build a string of AFEs separated by a new line character
                x = x + '\n' + afe
                print(afe)
                break
            else:
                afe = "No AFE numbers found..."
        if user_input.upper().startswith("N") == True:
            print("HAVE A GREAT DAY - GOODBYE!!!")
            break

#Build a while loop for initial question prompt (when Y or N is not True):
while user_input != "Y" and user_input != "N":
    user_input = input('List AFE numbers? Y or N: ').upper()
    if user_input not in ["Y","N"]:
        print('"',user_input,'"','is an invalid input')
get_afenumbers(user_input)
Answer 0 (score: 0)
You can use a regular expression to extract the matches.
Setting your loop aside for a moment, we can set up the text to search like this:
text = '''A12345612341234 asdfa we'a aslkfj4353 alsdfasA345678asA858585943'''
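Now we want to match any uppercase letter ([A-Z]) followed by exactly six digits ([0-9]{6}). In your code it looks like you only need the letter A, so you can replace [A-Z] with just A: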
import re
re.findall('[A-Z][0-9]{6}', text)
This gives:
['A123456', 'A345678', 'A858585']
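If you only care about the letter A, as in the question, the same idea can be wrapped in a small function and looped over. This is only a minimal sketch: the names AFE_PATTERN and find_afe_numbers are made up for illustration, and the hard-coded text stands in for the page_content string you get from PyPDF2.

```python
import re

# Hypothetical names for illustration; the pattern is the one above,
# narrowed from [A-Z] to a literal uppercase A.
AFE_PATTERN = re.compile(r'A[0-9]{6}')


def find_afe_numbers(text):
    """Return every 'A' followed by six digits, in order of appearance."""
    return AFE_PATTERN.findall(text)


# Stand-in for page_content extracted from the PDF.
text = '''A12345612341234 asdfa we'a aslkfj4353 alsdfasA345678asA858585943'''

# Print one AFE number per line.
for afe in find_afe_numbers(text):
    print(afe)
```

Note that the pattern takes only the first six digits after each A, so any extra digits that follow (as in A12345612341234) are ignored. If nothing matches, the function returns an empty list, which you can check before printing a "no AFE numbers found" message.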