How to extract multiple pieces of text from a text file using a scripting language

Time: 2015-01-14 18:37:21

Tags: python perl batch-file scripting

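This is most likely a simple question, but I haven't been able to understand the code well enough to achieve my goal.

I have a text file with 1000 entries, such as the 3 consecutive entries below. From this text file I want to extract each number.xml and its corresponding current video timing (e.g. 1280x720p 60Hz), and write them out one after another to a text file.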

Report complete.on: E01A040E.xml
EDID raw data:
---  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000  00 FF FF FF FF FF FF 00 4C 2D FC 08 00 00 00 00 
010  29 15 01 03 80 10 09 78 0A EE 91 A3 54 4C 99 26 
020  0F 50 54 BD EE 00 81 C0 01 01 01 01 01 01 01 01 
030  01 01 01 01 01 01 66 21 56 AA 51 00 1E 30 46 8F 
040  33 00 A0 5A 00 00 00 1E 01 1D 00 72 51 D0 1E 20 
050  6E 28 55 00 A0 5A 00 00 00 1E 00 00 00 FD 00 18 
060  4B 0F 44 17 00 0A 20 20 20 20 20 20 00 00 00 FC 
070  00 53 41 4D 53 55 4E 47 0A 20 20 20 20 20 01 6D 
080  02 03 1F F1 47 84 05 03 10 20 22 07 23 09 07 07 
090  83 01 00 00 E2 00 0F 67 03 0C 00 10 00 B8 2D 01 
0A0  1D 80 18 71 1C 16 20 58 2C 25 00 A0 5A 00 00 00 
0B0  9E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 A0 5A 00 
0C0  00 00 18 02 3A 80 18 71 38 2D 40 58 2C 45 00 A0 
0D0  5A 00 00 00 1E 00 00 00 00 00 00 00 00 00 00 00 
0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD 
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
# 
EDID description: E01A0A8A.xml
EDID raw data:
---  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000  00 FF FF FF FF FF FF 00 4C 2D FC 08 00 00 00 00 
010  29 15 01 03 80 10 09 78 0A EE 91 A3 54 4C 99 26 
020  0F 50 54 BD EE 00 81 C0 01 01 01 01 01 01 01 01 
030  01 01 01 01 01 01 66 21 56 AA 51 00 1E 30 46 8F 
040  33 00 A0 5A 00 00 00 1E 01 1D 00 72 51 D0 1E 20 
050  6E 28 55 00 A0 5A 00 00 00 1E 00 00 00 FD 00 18 
060  4B 0F 44 17 00 0A 20 20 20 20 20 20 00 00 00 FC 
070  00 53 41 4D 53 55 4E 47 0A 20 20 20 20 20 01 6D 
080  02 03 1F F1 47 84 05 03 10 20 22 07 23 09 07 07 
090  83 01 00 00 E2 00 0F 67 03 0C 00 10 00 B8 2D 01 
0A0  1D 80 18 71 1C 16 20 58 2C 25 00 A0 5A 00 00 00 
0B0  9E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 A0 5A 00 
0C0  00 00 18 02 3A 80 18 71 38 2D 40 58 2C 45 00 A0 
0D0  5A 00 00 00 1E 00 00 00 00 00 00 00 00 00 00 00 
0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD 
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc
# 
EDID description: E01A0C88.xml
EDID raw data:
---  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
000  00 FF FF FF FF FF FF 00 08 59 42 00 01 00 00 00 
010  01 16 01 03 80 45 27 78 0A D0 DD A9 53 49 9D 23 
020  11 47 4A A3 08 00 81 C0 81 00 81 0F 81 40 81 80 
030  95 00 B3 00 01 01 52 35 80 80 70 38 1F 40 20 20 
040  13 00 C4 8E 21 00 00 1E 46 20 00 A4 51 00 2A 30 
050  50 80 37 00 20 46 21 00 00 1A 00 00 00 FC 00 4E 
060  53 2D 33 32 4C 32 34 30 41 31 33 0A 00 00 00 FD 
070  00 37 4C 1E 50 11 00 0A 20 20 20 20 20 20 01 23 
080  02 03 20 73 48 05 04 03 02 01 06 07 90 26 09 07 
090  07 15 07 50 83 01 00 00 67 03 0C 00 10 00 B8 2D 
0A0  01 1D 00 72 51 D0 1E 20 6E 28 55 00 C4 8E 21 00 
0B0  00 1E 8C 0A D0 8A 20 E0 2D 10 10 3E 96 00 13 8E 
0C0  21 00 00 18 01 1D 80 18 71 1C 16 20 58 2C 25 00 
0D0  C4 8E 21 00 00 9E 00 00 00 00 00 00 00 00 00 00 
0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 53 
+5V: OK
Video information:
- Current video timing: 1280x720p 60Hz
- Incoming video matches CEA-861 VIC 4 and 69 exactly
- HDMI video detected
- Received AVI VIC 4
- Color space: YCbCr 4:4:4 8 bpc

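This is the code I have so far, but it doesn't work for me. It's written in Python, but any other scripting language is fine too; I'm just not used to writing scripts. Many thanks to anyone who lends a hand.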

#!/usr/bin/env python

inFile = open("batch01.txt")
outFile = open("result.txt", "w")

with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "EDID description:":
            copy = True
        elif line.strip() == "- Current video timing":
            copy = True
        elif copy:
            outfile.write(line)

inFile.close()
outFile.close()

4 answers:

Answer 0 (score: 2)

I would use regular expressions to do this:

#!/usr/bin/env python

import re

with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
    for line in infile:
        m = re.search('(EDID description|- Current video timing): (.*)', line)
        if m is not None:
            outfile.write(m.group(2) + '\n') 

This will output:

1280x720p 60Hz
E01A0A8A.xml
1280x720p 60Hz
E01A0C88.xml
1280x720p 60Hz
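The output above starts with a timing and no file name because the first entry's header reads "Report complete.on:" rather than "EDID description:", so E01A040E.xml is never matched. A minimal sketch of the same regex idea with that label added to the alternation, assuming those three labels are the only ones you need. SAMPLE is an abbreviated copy of the question's data and extract_values is a hypothetical helper name:

```python
import re

# Abbreviated copy of the question's data (hex dump lines omitted).
SAMPLE = """Report complete.on: E01A040E.xml
EDID raw data:
+5V: OK
- Current video timing: 1280x720p 60Hz
# 
EDID description: E01A0A8A.xml
EDID raw data:
- Current video timing: 1280x720p 60Hz
"""

# Same idea as the answer, with "Report complete.on" added to the alternation.
# The dot is escaped so it matches a literal "."; (?:...) does not capture,
# so the value after the label ends up in group 1.
PATTERN = re.compile(
    r'(?:Report complete\.on|EDID description|- Current video timing): (.*)'
)


def extract_values(lines):
    """Return the text after each recognised label, in file order."""
    values = []
    for line in lines:
        m = PATTERN.search(line)
        if m is not None:
            values.append(m.group(1).strip())
    return values


for value in extract_values(SAMPLE.splitlines()):
    print(value)
```

To process the real file, pass the open file object instead: with open('batch01.txt') as infile: values = extract_values(infile).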

Answer 1 (score: 1)

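It looks like you actually need to check whether a given line starts with certain substrings, rather than doing an exact comparison (which is what the == operator gives you). Instead, your for loop should use the startswith method to inspect the beginning of each line: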

for line in infile:
    if line.strip().startswith("EDID description:"):
        copy = True
    elif line.strip().startswith("Report complete.on:"):  # Based on your data, it seems like you need to check for these as well - maybe not?
        copy = True
    elif line.strip().startswith("- Current video timing"):
        copy = True
    else:
        copy = False
    if copy:
        outfile.write(line)

But the loop can be simplified considerably:

prefixes = [
 "EDID description:", "Report complete.on:", "- Current video timing"
]
for line in infile:
    for prefix in prefixes:
        if line.strip().startswith(prefix):
            outfile.write(line)
            break

This eliminates the multi-branch if/elif structure as well as the boolean copy flag.

With your sample input data, I got this in the result file:

Report complete.on: E01A040E.xml
- Current video timing: 1280x720p 60Hz
EDID description: E01A0A8A.xml
- Current video timing: 1280x720p 60Hz
EDID description: E01A0C88.xml
- Current video timing: 1280x720p 60Hz

Again, I'm not sure whether you want the "Report complete.on" lines, but it looks like you do.
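The inner "for prefix in prefixes" loop can be dropped as well: str.startswith accepts a tuple of prefixes and returns True if any of them matches. A sketch of that variant; matching_lines and filter_file are hypothetical helper names, and unlike the loop above it writes each line stripped of surrounding whitespace:

```python
# str.startswith also accepts a tuple and returns True if any prefix matches.
PREFIXES = ("EDID description:", "Report complete.on:", "- Current video timing")


def matching_lines(lines):
    """Yield each line (stripped) that begins with one of PREFIXES."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(PREFIXES):
            yield stripped


def filter_file(in_path, out_path):
    """Write the matching lines of in_path to out_path, one per line."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in matching_lines(infile):
            outfile.write(line + "\n")
```

Usage with the question's file names: filter_file("batch01.txt", "result.txt").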

Answer 2 (score: 1)

Without actually running it, I see three problems that you need to fix.

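  1. The equality conditions in the if/elif control structure will never be true.
  2. Even if the conditions in #1 could be true, the final elif would never be reached for a matching line, because once any preceding expression evaluates to true, the rest of the if/elif chain short-circuits and the remaining conditions are not checked.
  3. You need to "reset" copy = False before the next iteration of the for loop; otherwise, once copy is first set to True, it stays True.
  4. Suggested fixes: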

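    1. Use something like '"EDID description:" in line' to determine whether the line contains the string you are looking for. (If you use 'line.strip().find("EDID description:")' instead, compare the result with -1: find() returns -1 when the substring is absent and 0 when the line starts with it, so the bare call is not a usable condition.) Again, you need to determine whether the substring is IN the line, not whether it is equal to the line.
    2. You need to move the copy operation out of the if/elif structure. That is, instead of making it part of the same if/elif structure, create a separate 'if copy:' structure after the current one, so that the line is written when it is found.
    3. After writing the line, set copy = False so that it is properly initialized for the next iteration of the for loop. Otherwise, every line after the first match will be printed.
    4. Something like this (I haven't actually tested it...):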

      #!/usr/bin/env python
      
      # the with statement closes both files automatically; no separate open()/close() needed
      with open('batch01.txt') as infile, open('result.txt', 'w') as outfile:
          copy = False
          for line in infile:
              # use "in" to see if the line CONTAINS the string you are looking for
              # (a bare line.find(...) returns -1, which is truthy, when the string is absent)
              if "EDID description:" in line:
                  copy = True
              elif "- Current video timing" in line:
                  copy = True
      
              # make this a separate if
              if copy:
                  outfile.write(line)
      
              # reset this to False so it can be evaluated and set properly in the next iteration
              copy = False
      

      Hope this helps.
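A quick demonstration of why a bare find() call does not work as an if condition: str.find returns the index of the match (0 when the line starts with the substring) or -1 when the substring is absent, so its truthiness is the opposite of what you want for lines that begin with the label. The helper names below are hypothetical; the in operator is the usual membership test:

```python
def contains_via_find(line, needle):
    # The pattern to avoid: str.find returns the index of the first match,
    # or -1 when the substring is absent. Used directly as a condition, -1 is
    # truthy and 0 (a match at the very start of the line) is falsy.
    return bool(line.find(needle))


def contains(line, needle):
    # Correct membership test; equivalent to line.find(needle) != -1.
    return needle in line


header = "EDID description: E01A0A8A.xml"
other = "+5V: OK"

print(contains_via_find(header, "EDID description:"))  # False: wrong answer
print(contains_via_find(other, "EDID description:"))   # True: wrong answer
print(contains(header, "EDID description:"))           # True
print(contains(other, "EDID description:"))            # False
```

So a condition such as if "EDID description:" in line: (or line.find(...) != -1) behaves as intended.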

Answer 3 (score: 1)

Below is a batch file (.bat) solution, which I think is simpler...

EDIT: program modified as requested in the comments

@echo off
(for /F "tokens=3-6" %%a in ('findstr /L ".xml Current" batch01.txt') do (
   if "%%b" equ "" (
      set /P "=%%a - " < NUL
   ) else (
      if "%%c" equ "" (
         echo No VSYNC detected
      ) else (
         echo Current video timing: %%c %%d
      )
   )
)) > result.txt

Output:

E01A040E.xml - Current video timing: 1280x720p 60Hz
E01A0A8A.xml - Current video timing: 1280x720p 60Hz
E01A0C88.xml - Current video timing: 1280x720p 60Hz
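For comparison, the same one-line-per-entry pairing can be sketched in Python, in case you would rather stay with the language you started in. This assumes each entry's file name appears on a "Report complete.on:" or "EDID description:" line before its "- Current video timing:" line; pair_entries and write_pairs are hypothetical names, and unlike the batch version it does not special-case a missing timing ("No VSYNC detected"):

```python
import re

# A file-name line (either header label) and a timing line.
NAME_RE = re.compile(r'(?:Report complete\.on|EDID description):\s*(\S+\.xml)')
TIMING_RE = re.compile(r'- Current video timing:\s*(.*)')


def pair_entries(lines):
    """Return 'name - Current video timing: value' strings, one per entry."""
    results = []
    name = None
    for line in lines:
        m = NAME_RE.search(line)
        if m:
            name = m.group(1)  # remember the file name until its timing shows up
            continue
        m = TIMING_RE.search(line)
        if m and name is not None:
            results.append(f"{name} - Current video timing: {m.group(1).strip()}")
            name = None  # each name is paired with the first timing after it
    return results


def write_pairs(in_path, out_path):
    """Write one paired line per entry from in_path to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for entry in pair_entries(infile):
            outfile.write(entry + "\n")
```

Usage with the question's file names: write_pairs("batch01.txt", "result.txt"). Keeping the last seen file name in a variable avoids relying on the two lines being adjacent, since the hex dump sits between them.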