Question

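I have some large text files and want to extract certain values from them. The values I need appear in two places in every file: before and after a specific line of text. I only want the values that come after that text. I wrote the script below.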

#!/usr/bin/env python
import sys, re, os, glob

path = "./"
files = os.listdir(path)
for finding in glob.glob('*.txt'):
    file = os.path.join(path, finding)
    text = open(file, "r")
    CH = []   

    for line in text:
        if re.match("(.*)(XX)(.*)", line):
            CH.append(line)
    print CH

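But (as expected) the script prints all of the XX values. How can I edit this script to get the desired output? Here is part of one of the large text files.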

  ..................
  ..................  
  XX    1   -0.01910     
  XX    2    1.34832     
  XX    3   -2.36329     
  XX    4   -5.94807     
  XX    5    6.34862
  XX    6    core     
  Texts which I want to specify like (Normal)..........
  XX    1   -0.61910     
  XX    2    2.34832     
  XX    3   -0.06329     
  XX    4   -0.34807     
  XX    5    0.36862
  XX    6    [coreed   
  ..................
  ..................

The desired output is shown below: the XX values that come after the text 'Normal', sorted in descending order.

  XX     2.34832   
  XX     0.36862     
  XX    -0.06329     
  XX    -0.34807     
  XX    -0.61910

Many thanks in advance.

Answer 1

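First, I am confused by the regular expression you wrote, "(.*)(XX)(.*)". Am I right that you want the third field from every line that starts with (whitespace, then) XX? Or, more precisely, only from those lines that come after "Texts which I want to specify"?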

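The simplest way I can think of is to carry a boolean that records whether you have already seen the special line "Texts which I want to specify like (Normal)..........". For example: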

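#!/usr/bin/env python
import re, os, glob

path = "./"
for finding in glob.glob('*.txt'):
    file = os.path.join(path, finding)
    text = open(file, "r")
    CH = []
    doPayAttention = False  # becomes True once the marker line has been seen

    for line in text:
        # re.search rather than re.match: the marker line starts with whitespace
        if re.search("Texts which I want to specify", line):
            doPayAttention = True
            continue
        if not doPayAttention:
            continue
        # capture the third field only when it is a number (skips "core" etc.)
        mm = re.match(r"^\s*XX\s+\S+\s+(-?\d+(?:\.\d+)?)\s*$", line)
        if mm is not None:
            CH.append(mm.group(1))
    text.close()
    # sort numerically, largest first (the keyword is reverse, not reversed)
    CH = sorted(CH, key=float, reverse=True)
    for _ch in CH:
        print('XX  %s' % _ch)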

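Also, depending on how much you trust the format of your files, using string.split() should give you more readable code without any regular expressions. Finally, it should be pointed out that this is a particularly simple AWK program (input.txt below is a placeholder for your file name):

awk 'found && $1 == "XX" && $3 ~ /^-?[0-9.]+$/ {print $1, $3} /Texts which I want to specify/ {found = 1}' input.txt | sort -k2,2 -n -r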
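
For completeness, here is a rough sketch of the split-based variant mentioned above. The function name values_after_marker and the inline sample are made up for illustration, and it assumes every data line has exactly three whitespace-separated fields ("XX", an index, a value). Entries that are not numbers, such as "core", are skipped.

```python
def values_after_marker(lines, marker="Texts which I want to specify"):
    """Return the third field of each XX line after `marker`, largest first.

    Hypothetical helper for illustration only; not part of the original script.
    """
    values = []
    seen_marker = False
    for line in lines:
        if marker in line:
            seen_marker = True
            continue
        if not seen_marker:
            continue
        fields = line.split()
        # Assumed layout: "XX", an index, and a value.
        if len(fields) != 3 or fields[0] != "XX":
            continue
        try:
            values.append(float(fields[2]))
        except ValueError:
            # Skip non-numeric entries such as "core" or "[coreed".
            continue
    return sorted(values, reverse=True)


# Shortened copy of the sample from the question.
sample = """\
  XX    1   -0.01910
  XX    6    core
  Texts which I want to specify like (Normal)..........
  XX    1   -0.61910
  XX    2    2.34832
  XX    3   -0.06329
  XX    4   -0.34807
  XX    5    0.36862
  XX    6    [coreed
""".splitlines()

for value in values_after_marker(sample):
    print("XX    %8.5f" % value)
```

To run it on a real file you would pass the open file object instead of the sample list, since iterating over a file yields its lines. Note that converting to float and formatting again may not reproduce the original spelling of a value exactly; keep the strings and sort with key=float if that matters.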

Finding some values after a specified text in Python
