Question

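I'm new to Python and want to write a script that extracts some numbers from a batch of files. Here is a representative example of what I'm trying to do: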

File_name_1: Bob-01
File content: 
...(Lots of text)
Tea cups: 3
Tea cups: 4
Tea cups: 6
...(Lots of text)
Completed the first task, proceed to the next task.
...(Lots of text)
Tea cups: 7
Termination

Let's say we also have another file:

File_name_2: Bob-02
File content: 
...(Lots of text)
Tea cups: 2
Tea cups: 7
Tea cups: 3
Tea cups: 8
...(Lots of text)
Completed the first task, proceed to the next task.
...(Lots of text)
Tea cups: 1
Termination.

So far I have written code that extracts the file name (e.g. Bob-01), the number after each Bob (e.g. 01), and the file content, and stores them in a variable named list_of_file:

print(list_of_file)

[[["Bob-01"],
  "01",
  [".......", "Tea cups: 3", "Tea cups: 4", "Tea cups: 6", "....", "Completed the first task, proceed to the next task.", "....", "Tea cups: 7", "Termination"]],
 [["Bob-02"],
  "02",
  [".......", "Tea cups: 2", "Tea cups: 7", "Tea cups: 3", "Tea cups: 8", "....", "Completed the first task, proceed to the next task.", "....", "Tea cups: 1", "Termination"]]]

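What I want to do is extract the number of tea cups that appears after the line "Completed the first task, proceed to the next task." in each file. So I wrote the following code: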

def get_tea_cups(list_of_files):
    list_of_cup = []
    for file in list_of_files:
        for line in file[2]:
            if "Completed the first task" in line:
                for line in file[2]:
                    if "Tea cups:" in line:
                        tea_cups_line = line.split()
                        cup_num = tea_cups_line[2]
                        list_of_cup.append((file[0], file[1], cup_num))
    return list_of_cup

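My reasoning: once I find "Completed the first task" in list_of_file, I hoped to extract the number of tea cups that appears after the string containing "Completed the first task" (e.g. 7 for Bob-01, 1 for Bob-02). However, when I run my code, it seems to extract every tea cup number, which is not what I want.

I think this happens because the if statement is always true, so I end up extracting all of the tea cup numbers.

Is there a way to fix this? I know that if I were extracting from a single file, I could store every tea cup number found in a list and take the last value (by slicing from the end). Can I do something similar when extracting from multiple files?

I tried looking around but haven't found anything useful. If you come across anything related to this question, please post the link below.

Thanks!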

Answer 1

Updated code. Here is what I would do:

.....

for i, line in enumerate(file[2]):
    if "Completed the first task" in line:
        # Scan only the lines that come after the marker.
        for j in range(i + 1, len(file[2])):
            if "Tea cups:" in file[2][j]:
                tea_cups_line = file[2][j].split()
                cup_num = tea_cups_line[2]
                list_of_cup.append((file[0], file[1], cup_num))
                break  # stop at the first count after the marker
return list_of_cup

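It follows the same idea as yours, but my code keeps track of the index into file[2]. Once "Completed the first task" is found, a second loop starts at the next line and runs until it finds "Tea cups:", takes the number, and breaks.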
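
For reference, here is a minimal self-contained sketch of the same idea. It is a variation rather than the exact code above: it uses a flag (`marker_seen`, a name chosen here for illustration) instead of an index loop, and it assumes each entry of the list has the shape [name, number, lines]. The sample data is a simplified stand-in for list_of_file, with the name as a plain string.

```python
def get_tea_cups(list_of_files):
    """Return (name, number, cups) for the first "Tea cups:" line after the marker."""
    list_of_cup = []
    for name, number, lines in list_of_files:
        marker_seen = False
        for line in lines:
            if "Completed the first task" in line:
                marker_seen = True
            elif marker_seen and "Tea cups:" in line:
                # "Tea cups: 7".split() -> ["Tea", "cups:", "7"]
                list_of_cup.append((name, number, int(line.split()[2])))
                break  # only the first count after the marker is wanted
    return list_of_cup


# Simplified stand-in for list_of_file (assumed shape: [name, number, lines]).
sample = [
    ["Bob-01", "01", ["Tea cups: 3", "Tea cups: 4", "Tea cups: 6",
                      "Completed the first task, proceed to the next task.",
                      "Tea cups: 7", "Termination"]],
    ["Bob-02", "02", ["Tea cups: 2", "Tea cups: 7", "Tea cups: 3", "Tea cups: 8",
                      "Completed the first task, proceed to the next task.",
                      "Tea cups: 1", "Termination"]],
]

print(get_tea_cups(sample))
# [('Bob-01', '01', 7), ('Bob-02', '02', 1)]
```

A file that never reaches the marker, or has no count after it, simply contributes nothing to the result.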

Apologies for my English; I hope this helps.

Answer 2

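Yes, there is a way. I suggest reading the file backwards, finding the first occurrence of "Tea cups", then breaking and moving on to the next file. My solution assumes your files fit in memory, and it will probably take a while on large files.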

You can read a file in reverse like this:

for line in reversed(list(open("filename"))):
    print(line.rstrip())

Now you can pick out just the tea cup count you need:

cups = []
for line in reversed(list(open("filename"))):
    if "Tea cups" in line.rstrip():
        cups.append(line.rstrip().split()[2])
        break
print(cups)
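
To apply the backward scan to several files, one possible sketch is shown below. The helper names (`last_cup_after_marker`, `cups_per_file`) and the `paths` argument are hypothetical, not part of the question. As an added safeguard that the snippet above does not have, the scan stops when it reaches the "Completed the first task" line, so a count from before the marker is never returned. Note that this returns the last count after the marker; in the sample files there is only one, so it matches the first.

```python
def last_cup_after_marker(lines, marker="Completed the first task"):
    """Scan lines from the end; return the last tea cup count after the marker."""
    for line in reversed(lines):
        if marker in line:
            return None  # reached the marker without finding a count
        if "Tea cups:" in line:
            return int(line.split()[2])
    return None  # no marker and no count in this file


def cups_per_file(paths):
    """Map each file path to its count (assumes every file fits in memory)."""
    results = {}
    for path in paths:
        with open(path) as handle:
            results[path] = last_cup_after_marker(handle.readlines())
    return results


# In-memory demonstration using the contents of the two sample files.
bob_01 = ["Tea cups: 3", "Tea cups: 4", "Tea cups: 6",
          "Completed the first task, proceed to the next task.",
          "Tea cups: 7", "Termination"]
bob_02 = ["Tea cups: 2", "Tea cups: 7", "Tea cups: 3", "Tea cups: 8",
          "Completed the first task, proceed to the next task.",
          "Tea cups: 1", "Termination."]

print(last_cup_after_marker(bob_01))  # 7
print(last_cup_after_marker(bob_02))  # 1
```

With real files, a call such as cups_per_file(["Bob-01", "Bob-02"]) would return a dictionary keyed by path, assuming those paths exist.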

Python: extracting a number after a specific string appears
