Question

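I'm trying to remove my side of the conversation from a chat log file and analyze only the other person's data. When I load the file into Python like this: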

with open(chatFile) as f:
    chatLog = f.read().splitlines()

the data loads like this (the real file is much longer than this example):

'My Name',
'08:39 Chat data....!',
"Other person's name",
'08:39 Chat Data....',
'08:40 Chat data...',
'08:40 Chat data...?',

I want it to look like this:

"Other person's name",
'08:39 Chat Data....',
'08:40 Chat data...',
'08:40 Chat data...?',

I was thinking of using an if statement with a regular expression:

name = 'My Name'
for x in chatLog:
    if x == name:
        "delete all data below until you get to reach the other 
         person's name"

I can't get this to work. Any ideas?

Answer 1

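I think you've misunderstood what a "regular expression" is... It doesn't mean you can write instructions in plain English and the Python interpreter will understand them. Either that, or you're writing pseudocode, which makes it impossible to debug.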

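If you don't know the other people's names in advance, we can assume a name line doesn't start with a digit. Assuming every non-name line starts with a digit, as in your example: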

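name = 'My Name'
skipLines = False
results = []
for x in chatLog:
    if x == name:
        # Start of one of your blocks: skip until another name appears
        skipLines = True
    elif x and not x[0].isdigit():
        # Another speaker's name line (the `x and` guard skips blank lines,
        # which would otherwise raise IndexError on x[0])
        skipLines = False

    if not skipLines:
        results.append(x)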
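To try the flag approach on the sample data without a file, here is a minimal sketch that wraps the same loop in a function. The name `drop_speaker` and the in-memory `chat_log` list are hypothetical stand-ins for your real log; it keeps the same assumption that message lines, and only message lines, start with a digit.

```python
def drop_speaker(lines, name):
    """Return `lines` with every block spoken by `name` removed.

    Assumption: a speaker line never starts with a digit, and every
    message line starts with a timestamp digit.
    """
    skipping = False
    kept = []
    for line in lines:
        if line == name:
            skipping = True          # start of one of our blocks
        elif line and not line[0].isdigit():
            skipping = False         # some other speaker's name line
        if not skipping:
            kept.append(line)
    return kept


# Hypothetical sample copied from the question
chat_log = [
    'My Name',
    '08:39 Chat data....!',
    "Other person's name",
    '08:39 Chat Data....',
    '08:40 Chat data...',
    '08:40 Chat data...?',
]

for line in drop_speaker(chat_log, 'My Name'):
    print(line)
```

Blank lines simply inherit the current skip state, so they are dropped inside your blocks and kept inside everyone else's.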

Answer 2

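name = 'My Name'
others = []
on = True
for line in chatLog:
    # A non-empty line that doesn't start with a digit is a speaker's name;
    # keep copying lines only while that speaker isn't you
    if line and not line[0].isdigit():
        on = line != name
    if on:
        others.append(line)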
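Since the goal is to strip your side out of the chat log file itself, here is a minimal sketch that applies the same toggle while streaming the file and writes the remaining lines to a new file. The `filter_chat_file` name, the output path, and the choice to write a separate file rather than overwrite the original are assumptions for illustration; the demo uses a temporary directory so it runs without your real log.

```python
import os
import tempfile


def filter_chat_file(src_path, dst_path, name):
    """Copy src_path to dst_path, leaving out every block spoken by `name`."""
    on = True
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            text = line.rstrip('\n')
            # Same rule as above: a non-empty, non-digit line is a speaker name
            if text and not text[0].isdigit():
                on = text != name
            if on:
                dst.write(line)


# Demo in a throwaway directory with a tiny hypothetical log
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'chat.txt')
    dst = os.path.join(tmp, 'others.txt')
    with open(src, 'w', encoding='utf-8') as f:
        f.write("My Name\n08:39 Hi!\nOther person's name\n08:39 Hello\n")
    filter_chat_file(src, dst, 'My Name')
    with open(dst, encoding='utf-8') as f:
        result = f.read()

print(result)
```

Writing to a separate file keeps the original log intact in case the digit-based assumption turns out not to hold for some lines.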

Answer 3

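You can use re.sub to remove all of your messages, passing an empty string as the replacement (its second argument).

Assuming every chat message is on its own line starting with a timestamp, and that no one's name starts with a digit, the pattern r'(?m)^' + re.escape(yourname) + r'\n(?:\d.*\n?)*' should match each line containing just your name together with the message lines that follow it. Each match can then be replaced with an empty string.

import re

yourname = 'My Name'
# Your name alone on a line, followed by every line that starts with a digit
pattern = r'(?m)^' + re.escape(yourname) + r'\n(?:\d.*\n?)*'

with open(chatFile) as f:
    chatlog = f.read()

others_messages = re.sub(pattern, '', chatlog)
print(others_messages)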

This works for removing any single user's messages from a chat log with any number of participants.
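To check the multi-user claim without a file, the sketch below runs the same kind of pattern on an in-memory string with three speakers. The helper name `remove_user` and the sample text are hypothetical; it keeps the answer's assumptions (each message line starts with a timestamp digit, no name starts with a digit) and uses `re.MULTILINE` so the name only matches at the start of a line.

```python
import re


def remove_user(chatlog, name):
    """Delete every block spoken by `name` from a newline-separated chat log."""
    # The name alone on its line, then any run of lines that begin with a digit;
    # the final \n? lets a block at the very end of the file match too
    pattern = r'^' + re.escape(name) + r'\n(?:\d.*\n?)*'
    return re.sub(pattern, '', chatlog, flags=re.MULTILINE)


# Hypothetical three-person log; the last line has no trailing newline
log = (
    "Alice\n"
    "08:39 Hi all\n"
    "Bob\n"
    "08:40 Hey\n"
    "08:40 How are you?\n"
    "Carol\n"
    "08:41 Good\n"
    "Bob\n"
    "08:42 Great"
)

print(remove_user(log, 'Bob'))
```

Because the pattern requires a newline right after the name, a different speaker such as "Bobby" is not affected.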

Python - Conditionally deleting lines from a chat log file
