How do I read a file using Python?

Time: 2016-02-17 19:02:04

Tags: python regex python-2.7

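I am reading the .txt file above, in which I have to identify the three-letter words (ARW, CZC, DUN, etc.). After that I have to read the test case IDs, such as VR-GREQ ..., up to the next language. But I am having trouble reading this .txt file. Here is my code: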

with open(output_filename) as parser_file:
    for language in parser_file:
        language = language.strip()
        if re.search('[A-Z]', language):
            lines = parser_file.readlines()

I need to take this code further; can anyone help me?

My new code:

import re

output_filename = r"C:\Usersktop\TEST\Language\Output.txt"

def write():
    # looks for exactly three uppercase letters from beginning to end
    rx = r'^([A-Z]{3})$'

    # define a dictionary for the languages
    languages = {}

    # define a temporary list
    tmp = list()
    for line in open(output_filename, 'r'):
        m = re.search(rx, line, re.MULTILINE)
        if m is not None:
            if len(tmp) > 0:
                languages[current] = tmp
            tmp = list()
            current = m.group(1) # current holds the actual language tag
        else:
            line = line.strip() # drop the trailing newline before testing for emptiness
            if len(line) > 0:
                tmp.append(line)

    # after the loop
    if len(tmp) > 0:
        languages[current] = tmp
    print(languages)

2 Answers:

Answer 0: (score: 1)

If you need to find strings of length 3, use [A-Z]{3}. You can then split the whole list into arrays based on where these 3-character "words" occur.
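As a rough illustration of the [A-Z]{3} suggestion, the sketch below pulls the three-letter tags out of a string. The sample text is an assumption: the real file is not shown, so the layout (one tag per line, followed by its IDs) is inferred from the question, and the name tag_pattern is made up for this example. Anchoring the pattern with ^ and $ under re.MULTILINE matters here, because an unanchored [A-Z]{3} would also match the "GRE" inside "VR-GREQ".

```python
import re

# Hypothetical sample; the layout is assumed from the question,
# the IDs are borrowed from the output shown in the second answer.
sample = (
    "ARW\nVR-GREQ-299684_6j\nVR-GREQ-299606_3\n"
    "CZC\nVR-GREQ-299640_1\n"
    "DUN\nFB_71125_1\n"
)

# ^ and $ with re.MULTILINE tie the match to a complete line, so only
# lines consisting of exactly three uppercase letters are returned.
tag_pattern = re.compile(r'^[A-Z]{3}$', re.MULTILINE)

tags = tag_pattern.findall(sample)
print(tags)
```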

Edit: in reply to your comment...

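headers=re.split('[A-Z]{3}\n',YOUR_STRING) splits the string at the "header" of each section. Note that the list it returns holds the text between the headers rather than the headers themselves; re.findall with the same pattern returns the headers. Then you can take NEW_STRING=YOUR_STRING[YOUR_STRING.find(headers[0]):YOUR_STRING.find(headers[1])] to cut out the text between two of those positions.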

Combined with a loop and a few other tools, this should help you reach your goal.
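One possible way to combine re.split with a loop is sketched below. It is a variation on the idea above, not the answerer's exact code: the pattern gets a capturing group, which makes re.split keep the matched tags in its result so each tag can be paired with the text that follows it. The sample string and the names parts and sections are assumptions made for the example.

```python
import re

# Hypothetical sample; the real file contents are not shown in the question.
sample = (
    "ARW\nVR-GREQ-299684_6j\nVR-GREQ-299606_3\n"
    "CZC\nVR-GREQ-299640_1\n"
    "DUN\nFB_71125_1\n"
)

# Because of the capturing group, the result alternates between
# section text and tags: ['', 'ARW', body, 'CZC', body, 'DUN', body]
parts = re.split(r'^([A-Z]{3})\n', sample, flags=re.MULTILINE)

sections = {}
# parts[0] is whatever precedes the first tag; tags sit at the odd indexes
# and each tag's body is the element right after it.
for tag, body in zip(parts[1::2], parts[2::2]):
    sections[tag] = body.split()

print(sections)
```

Splitting each body on whitespace works here only because the IDs contain no spaces; if they could, body.splitlines() would be the safer choice.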

Answer 1: (score: 0)

With the help of your comments, the problem has become clearer. Consider the following code:

import re

string = your_string_as_above
rx = r'^([A-Z]{3})$'

# define a dictionary for the languages
languages = {}

# define a temporary list
tmp = list()
for line in re.split(r'\n', string):
    m = re.search(rx, line, re.MULTILINE)
    if m is not None:
        if len(tmp) > 0:
            languages[current] = tmp
        tmp = list()
        current = m.group(1) # current holds the actual language tag
    else:
        if len(line) > 0:
            tmp.append(line.strip())

# after the loop
if len(tmp) > 0:
    languages[current] = tmp
print(languages)

""" prints out a dictionary with the language as key
{'FRC': ['VR-GREQ-299659_18', 'VR-GREQ-299659_19', 'VR-GREQ-299659_28', 'VR-GREQ-299659_31', 'VR-GREQ-299659_32'], 'CZC': ['VR-GREQ-299684_6k', 'VR-GREQ-299606_6', 'VR-GREQ-299606_8', 'VR-GREQ-299640_1', 'VR-GREQ-299640_5', 'VR-GREQ-299640_6', 'VR-GREQ-299640_7'], 'DUN': ['FB_71125_1'], 'ARW': ['VR-GREQ-299684_6j', 'VR-GREQ-299684_6k', 'VR-GREQ-299606_3', 'VR-GREQ-299606_4', 'VR-GREQ-299606_5', 'VR-GREQ-299606_7', 'VR-GREQ-299606_9', 'VR-GREQ-299607_4', 'VR-GREQ-299608_1', 'VR-GREQ-299563_10']}
"""

See a demo on ideone.com
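Since the original question was about reading from a file rather than from a string, here is a minimal sketch of the same grouping logic applied to a file object, iterating over it line by line. It is an adaptation under stated assumptions, not the answerer's code: parse_languages and TAG are hypothetical names, the temporary file merely stands in for the real Output.txt, and the behaviour differs slightly in that lines appearing before the first tag are ignored and a tag with no IDs is stored with an empty list.

```python
import os
import re
import tempfile

# A line counts as a language tag if it is exactly three uppercase letters.
TAG = re.compile(r'^[A-Z]{3}$')

def parse_languages(lines):
    """Group the lines that follow each three-letter tag under that tag."""
    languages = {}
    current = None  # no tag seen yet
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if TAG.match(line):
            current = line
            languages.setdefault(current, [])
        elif current is not None:
            languages[current].append(line)
    return languages

# Demo: write a small stand-in file (hypothetical contents), then parse it.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write("ARW\nVR-GREQ-299684_6j\n\nCZC\nVR-GREQ-299640_1\nDUN\nFB_71125_1\n")
    path = handle.name

with open(path) as parser_file:
    result = parse_languages(parser_file)
os.remove(path)

print(result)
```

Because parse_languages accepts any iterable of lines, it can also be called with your_string.splitlines() when the text is already in memory.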