How to extract a specific section from a text file in Python 3

Asked: 2017-05-04 06:48:17

Tags: python-3.x

This is my Python file:

path = '/my/file/list.txt'
with open(path,'rt') as file:
    print("step 1")
    collected_lines = []
    started = False
    for line in file:   
        for n in range(1, 10):
            if line.startswith('PLAY NO.{}'.format(n)):
                started = True
                print("started at line {}".format(line[0]))
                continue
            if started:
                collected_lines.append(line)        
            if started and line == 'PLAY NO.{}'.format(n+1):
                print("end at line {}".format(line[0]))
                break           
            print(collected_lines.append(line))

That is my code. Output:

None
None
None
None
None
None

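I want to collect the lines from PLAY NO. 2 up to PLAY NO. 3, but I get None instead. Any suggestions are welcome. I am using Python 3.5.

Sorry, this is my first time asking a question on this site. My file looks like this:

TextFile.txt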

Hello and Welcome This is the list of plays being performed here
              PLAY NO. 1
 1. adknjkd
 2. skdi
 3. ljdij

              PLAY NO. 2
 1. hsnfhkjdnckj
 2. sjndkjhnd  and so on

2 answers:

Answer 0 (score: 0)

path = 'list.txt'
collected_lines = []
with open(path,'rt') as file:
    print("step 1")
    started = False
    lineNo  = 0
    for line in file:   
        lineNo += 1
        for n in range(1, 10):
            # print('PLAY NO. {}'.format(n))
            if started and line.lstrip().startswith('PLAY NO. {}'.format(n)):
                print("### end     at line {}".format(lineNo))
                started = False
                break           
            if line.lstrip().startswith('PLAY NO. {}'.format(n)):
                started = True
                print("### started at line {}".format(lineNo))
                break
        if started:
            collected_lines.append(line)

print("collected_lines: \n\n", *collected_lines)

gives:

step 1
### started at line 2
### end     at line 7
collected_lines: 

               PLAY NO. 1
  1. adknjkd
  2. skdi
  3. ljdij

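Notes on the problems that were fixed:

  1. Used .lstrip() so that .startswith() works as expected.
  2. Added a space between NO. and {} in startswith('PLAY NO. {}'.format(n)) so that the if condition can find the line.
  3. Reordered the if statements so that the start line is not also detected as the end line.
  4. Added started = False to the loop to stop collecting lines.
  5. The leading whitespace alone was already enough to keep the code from finding the line. Fixing only that would not have solved the problem, because the space was also missing from the format string; both had to be fixed for the code to work as expected. And so on ... see the notes above.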
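
The points above can be checked in isolation. The following is a minimal sketch rather than part of the answer's code; the sample line is copied from the question's file and the variable names are only illustrative. It also shows where the None values in the question's output come from: list.append() changes the list in place and returns None.

```python
# Heading line as it appears in the question's file (leading spaces included)
line = "              PLAY NO. 1\n"

# Without lstrip() the leading spaces make startswith() fail
print(line.startswith('PLAY NO. 1'))            # False
print(line.lstrip().startswith('PLAY NO. 1'))   # True

# The question's pattern has no space after 'NO.', so it never matches
print(line.lstrip().startswith('PLAY NO.{}'.format(1)))   # False

# list.append() returns None, which is what print(collected_lines.append(line)) displayed
collected_lines = []
result = collected_lines.append(line)
print(result)            # None
print(collected_lines)   # the line was still appended
```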

Answer 1 (score: 0)

If you want a dict with the play number as the key and a list of that play's lines as the value, you can use a defaultdict.
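
As a quick illustration of why defaultdict fits here, the sketch below groups a few lines by play number. The pairs list is hypothetical sample data made up from the question's file, not something the answer itself defines.

```python
from collections import defaultdict

# Hypothetical sample data: (play number, line text) pairs
pairs = [(1, '1. adknjkd'), (1, '2. skdi'), (2, '1. hsnfhkjdnckj')]

plays = defaultdict(list)
for play_no, text_line in pairs:
    # A missing key is created with an empty list on first access,
    # so no "if play_no not in plays" check is needed
    plays[play_no].append(text_line)

print(dict(plays))
```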

Define the text

text = """Hello and Welcome This is the list of plays being performed here
              PLAY NO. 1
 1. adknjkd
 2. skdi
 3. ljdij

              PLAY NO. 2
 1. hsnfhkjdnckj
 2. sjndkjhnd  and so on"""

Define the regex

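import re
from collections import defaultdict
from io import StringIO

# Raw string, with the dot after NO escaped so it only matches a literal '.'
regex = re.compile(r'^\s*PLAY NO\. (\d+)$')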
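
To see what this pattern accepts, here is a small check against a few sample lines. It is only a sketch: the name heading and the sample list are illustrative, and the pattern is written as a raw string with the dot escaped.

```python
import re

# Heading pattern: optional leading whitespace, 'PLAY NO. ', then the play number
heading = re.compile(r'^\s*PLAY NO\. (\d+)$')

samples = [
    "              PLAY NO. 1\n",   # heading line, captures '1'
    " 1. adknjkd\n",                # item line, no match
    "PLAY NO. 12\n",                # multi-digit play numbers also match
]

for sample in samples:
    # findall() returns a list of captured groups, empty when nothing matches
    print(repr(sample), heading.findall(sample))
```

Because findall() returns an empty list for non-heading lines, indexing it with [0] raises IndexError, which is the signal the parsing loop relies on.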

Parse the lines

label = None  # no play to start with
recorded_lines = defaultdict(list)

# In the real code, replace 'StringIO(text)' with the open file object
for line_no, line in enumerate(StringIO(text)):
    try:
        play_no = int(regex.findall(line)[0])
        # If the regex does not match, findall() returns an empty list and [0] raises IndexError
        # The code underneath is only executed when a new play starts
        if label is not None:  # if there is no play underway, there can be no ending
            print('PLAY NO. %i ended at line number %i' % (label, line_no-1))
        label = play_no
        print('PLAY NO. %i started at line number %i' % (play_no, line_no))
    except IndexError:
        # no new play started
        if label is not None and line.strip():
            # store the line under the current play, together with its line number
            recorded_lines[label].append((line_no, line.strip()))
    print(line_no, line)
print(recorded_lines)

yields

defaultdict(list,
            {1: [(2, '1. adknjkd'), (3, '2. skdi'), (4, '3. ljdij')],
             2: [(7, '1. hsnfhkjdnckj'), (8, '2. sjndkjhnd  and so on')]})

Output on stdout:

0 Hello and Welcome This is the list of plays being performed here

PLAY NO. 1 started at line number 1
1               PLAY NO. 1

2  1. adknjkd

3  2. skdi

4  3. ljdij

5 

PLAY NO. 1 ended at line number 5
PLAY NO. 2 started at line number 6
6               PLAY NO. 2

7  1. hsnfhkjdnckj

8  2. sjndkjhnd  and so on
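
For the original goal of collecting a single play (for example everything from PLAY NO. 2 up to the next heading), a sketch along the same lines might look like this. The names extract_play, HEADING and sample_text are hypothetical and not taken from either answer, and the code assumes every heading has the form "PLAY NO. <number>" on a line of its own.

```python
import re
from io import StringIO

# Assumed heading format: optional whitespace, 'PLAY NO. ', digits, optional whitespace
HEADING = re.compile(r'^\s*PLAY NO\. (\d+)\s*$')

def extract_play(lines, wanted):
    """Return the non-empty lines that belong to play number `wanted`."""
    collected = []
    current = None  # number of the play currently being read
    for line in lines:
        match = HEADING.match(line)
        if match:
            if current == wanted:
                break  # the next heading ends the wanted play
            current = int(match.group(1))
        elif current == wanted and line.strip():
            collected.append(line.strip())
    return collected

sample_text = """Hello and Welcome This is the list of plays being performed here
              PLAY NO. 1
 1. adknjkd
 2. skdi
 3. ljdij

              PLAY NO. 2
 1. hsnfhkjdnckj
 2. sjndkjhnd  and so on
"""

# With a real file, pass the open file object instead of StringIO(sample_text)
print(extract_play(StringIO(sample_text), 2))
```

Stopping at the next heading means the rest of the file is never read, which may matter for large files; a play number that does not occur simply gives an empty list.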