Question

This sounds trivial, but it has been painful. I wrote code that parses the lines of a .txt file: one line matches my re.match, and the next line does not.

cat file.txt
00.00.00 :  Blabla
x

In this case I check whether the line starts with the letter "x".

import re

def parser():
    path = "file.txt"
    with open(path, 'r+') as file:
        msg = {}
        list = []
        start = 0
        lines = file.readlines()
        for i in range(0, len(lines)):
            line = lines[i]

            if re.match('MY RULES', line) is not None:
                field['date'] = line[:8]
                msg['msg'] = line[start + 2:]
                print(msg)
                if line.startswith('x'):
                    msg['msg'] += line

            list.append(msg)

    print(chat)

Output (2 lines):

{'date': '0.0.00', 'msg': 'BlaBla'}
{'msg': 'x'}

The problem is that when a line starts with "x", I can't append it to msg['msg'] of the previous dict.

The expected output is:

 {'date': '0.0.00', 'msg': 'BlaBlax'}

I tried a variant that modifies the last chat entry that was added:

else:
    list[len(list) - 1]['msg'] += line

But I get the error: IndexError: list index out of range
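For reference, a minimal sketch of what that IndexError means, using a hypothetical entries list in place of list: the error indicates the list was still empty when that line ran. With an empty list, len(entries) - 1 is -1, and indexing an empty list with -1 fails. Checking that the list is non-empty first avoids it.

```python
entries = []  # hypothetical stand-in for the list variable in the question

# With an empty list, len(entries) - 1 is -1, and indexing raises IndexError.
try:
    entries[len(entries) - 1]
except IndexError as exc:
    error_message = str(exc)
print(error_message)  # list index out of range

# Checking that the list is non-empty first avoids the error;
# entries[-1] is the idiomatic way to reach the last element.
line = "x\n"
if entries:
    entries[-1]['msg'] += line
else:
    entries.append({'msg': line})
print(entries)  # [{'msg': 'x\n'}]
```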

I also tried peeking at the next line with next(infile), but then only every other line is output.
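The every-other-line symptom is what happens when next() is called on the same iterator that the for loop is consuming: each call takes a line away from the loop. A minimal sketch, with io.StringIO standing in for the file and hypothetical contents:

```python
import io

# io.StringIO stands in for the open file; the contents are hypothetical.
infile = io.StringIO("line 1\nline 2\nline 3\nline 4\n")

seen = []
skipped = []
for line in infile:
    seen.append(line.strip())
    # next() advances the same iterator, so the for loop never sees this line.
    lookahead = next(infile, None)
    if lookahead is not None:
        skipped.append(lookahead.strip())

print(seen)     # ['line 1', 'line 3']
print(skipped)  # ['line 2', 'line 4']
```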

How would you coax the nested loop into appending to the existing dictionary entry?

Cheers

Answer 1

First, don't use list as a variable name: it is the name of a built-in type, and assigning to it shadows the built-in.
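A minimal sketch of what shadowing does: once list is bound to a value, calling list(...) in that scope no longer reaches the built-in type. Renaming the variable (for example to chat) avoids the problem.

```python
import builtins

list = []  # hides the built-in `list` for the rest of this module
try:
    list("abc")
except TypeError as exc:
    shadow_error = str(exc)

print(shadow_error)  # 'list' object is not callable

# The built-in is still reachable through the builtins module,
# but choosing a different variable name avoids the issue entirely.
print(builtins.list("abc"))  # ['a', 'b', 'c']
```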

Second, if I understand correctly, you want to append to the last result.

Here:

if re.match('MY RULES', line) is not None:
    field['date'] = line[:8]
    msg['msg'] = line[start + 2:]
    print(msg)
    if line.startswith('x'):
        msg['msg'] += line

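You are parsing the same line, so on the next iteration msg['msg'] = line[start + 2:] overwrites the key msg in the dict msg and wipes out the previous value. So this code

field['date'] = line[:8]
msg['msg'] = line[start + 2:]
print(msg)

is always executed, even for the plain x line in the input file, and it clears the previous value stored under the key msg.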
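To make the overwriting concrete, here is a minimal sketch with hypothetical two-message input: if one dict is reused and appended on every iteration, each assignment to its 'msg' key replaces the earlier value, and because the list holds references to that same dict, every entry shows the last value. Building a new dict per message keeps the entries independent.

```python
lines = ["00.00.00 :  Blabla\n", "00.00.01 :  Hello\n"]  # hypothetical input

# Reusing one dict: every assignment overwrites the previous value,
# and the list ends up holding two references to the same object.
shared = {}
reused = []
for line in lines:
    shared['msg'] = line[12:].strip()
    reused.append(shared)
print(reused)  # [{'msg': 'Hello'}, {'msg': 'Hello'}]

# Creating a new dict per message keeps each entry independent.
fresh = []
for line in lines:
    fresh.append({'msg': line[12:].strip()})
print(fresh)  # [{'msg': 'Blabla'}, {'msg': 'Hello'}]
```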

To make it work you need a different approach, and I'd also suggest storing the intermediate values somewhere other than a local variable.

Full example code with the fix:

def parser():
    path = "file.txt"
    with open(path, 'r') as file:
        chat = []
        msg = None
        for line in file:
            # Stand-in for: if re.match('MY RULES', line) is not None:
            if True:
                if line.startswith('x'):
                    # Continuation line: extend the message started earlier.
                    msg['msg'] += line
                else:
                    # New message: build a fresh dict so earlier entries
                    # in chat are not overwritten.
                    msg = {'date': line[:8], 'msg': line[12:]}
                    chat.append(msg)

        print(chat)

parser()

Result:

[{'date': '00.00.00', 'msg': 'Blabla\nx'}]

This assumes that the condition if re.match('MY RULES', line) is not None: is True for every line of the file:

00.00.00 :  Blabla
x
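The code above uses if True: in place of the real condition. As a sketch of what a concrete version could look like, the following assumes a hypothetical header pattern HEADER (a date such as 00.00.00, a colon, then the text) standing in for 'MY RULES', and keeps the current message as the last element of the result list rather than in a separate local variable.

```python
import re

# Hypothetical stand-in for 'MY RULES': a date such as 00.00.00,
# a colon, then the message text.
HEADER = re.compile(r'(\d{2}\.\d{2}\.\d{2})\s*:\s*(.*)')


def parse_chat(lines):
    """Group each header line and its continuation lines into one dict."""
    chat = []
    for line in lines:
        line = line.rstrip('\n')
        match = HEADER.match(line)
        if match:
            # A header line starts a new message with its own dict.
            chat.append({'date': match.group(1), 'msg': match.group(2)})
        elif chat:
            # Any other line is appended to the most recent message.
            chat[-1]['msg'] += line
        # Continuation lines that appear before any header are ignored.
    return chat


print(parse_chat(["00.00.00 :  Blabla\n", "x\n"]))
# [{'date': '00.00.00', 'msg': 'Blablax'}]
```

Because an open file object yields its lines when iterated, parse_chat(f) should work on the file directly as well.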

Answer 2

How about this?

path = "file.txt"
with open(path, 'r') as f:
    msg = dict()
    for line in f:
        if line[0].isdigit():
            # Split only on the first ':' so colons inside the message survive.
            date, text = line.split(':', 1)
            date = date.strip()
            msg[date] = ' '.join(text.split())
        else:
            msg[date] += ' ' + ' '.join(line.split())

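We go through the file line by line. If the first character of a line is a digit, we assume the line starts with a date and add it to the dictionary; otherwise, we append the text to the last dictionary entry we created. Calling str.split() with no argument means any kind of whitespace is handled.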
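A short illustration of the str.split() behavior this relies on, with a hypothetical input string: without an argument it splits on runs of any whitespace and drops leading and trailing whitespace, whereas split(' ') splits on each single space and keeps empty strings.

```python
raw = "  Blabla \t  and\tmore   text \n"

# With no argument, str.split() splits on runs of any whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
words = raw.split()
print(words)  # ['Blabla', 'and', 'more', 'text']

# Joining with one space normalizes the spacing.
normalized = ' '.join(words)
print(normalized)  # Blabla and more text

# split(' ') only splits on single spaces and keeps empty strings.
print("a  b".split(' '))  # ['a', '', 'b']
```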

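You could certainly replace the if statement in the for loop with a regex. The problem I usually see with implementations like yours is that as soon as the input changes slightly (for example, more whitespace than expected), the solution produces wrong results. Basic Python string operations are very powerful ;)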
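If you do use a regex for the date check, writing \s* around the colon keeps it tolerant of extra whitespace. A minimal sketch; the DATE_LINE pattern and the sample lines are hypothetical:

```python
import re

# Hypothetical pattern; \s* around the colon tolerates any amount of whitespace.
DATE_LINE = re.compile(r'\s*(\d{2}\.\d{2}\.\d{2})\s*:\s*(.*?)\s*$')

samples = ["00.00.00 : Blabla\n", "  00.00.00:Blabla  \n", "00.00.00 :\tBlabla\n", "x\n"]
results = []
for line in samples:
    match = DATE_LINE.match(line)
    # groups() gives (date, message); non-date lines yield None
    results.append(match.groups() if match else None)

print(results)
```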

Update

This should produce the correct output:

*file.txt*
00.00.00 : Blabla
x
00.00.00 : Blabla2
x2


path = "file.txt"
with open(path, 'r') as f:
    lst = list()
    for line in f:
        if line[0].isdigit():
            # Split only on the first ':' so colons inside the message survive.
            date, text = line.split(':', 1)
            date = date.strip()
            msg = {date: ' '.join(text.split())}
            lst.append(msg)
        else:
            msg[date] += ' ' + ' '.join(line.split())

print(lst)
# [{'00.00.00': 'Blabla x'}, {'00.00.00': 'Blabla2 x2'}]

I had missed the part where you want to store each pair in its own dict and append it to a list.

Appending to a dictionary in a loop - strange behavior
