Question

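I'm trying to write a program that counts the number of tweets a user has posted, reading them from a text file. The only problem is that I need to exclude any line containing "DM" or "RT".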

file = open('stream.txt', 'r')
fileread = file.readlines()
tweets = [string.split() for string in fileread]

How can I change the code so that it excludes the lines containing "DM" or "RT"?

Thanks for any help :D

Answer 1

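Be sure to close the file after opening it. The best way to do that is with open(...), which closes the file automatically when the block exits.

The solution to your question is to add a condition to the list comprehension: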

with open('stream.txt', 'r') as file:
    fileread = file.readlines()

tweets = [string.split() for string in fileread
          if "DM" not in string and "RT" not in string]

If you want to exclude several strings, you can use any to keep the condition short:

with open('stream.txt', 'r') as file:
    fileread = file.readlines()

exclude = ["DM", "RT"]
tweets = [string.split() for string in fileread
          if not any(word in string for word in exclude)]
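
One caveat with the substring test: "RT" in string is also true for a line containing a word such as "START". If only the standalone words should trigger the exclusion, a minimal sketch could compare whole words instead. It assumes words are separated by whitespace; the function name filter_lines and the sample lines are hypothetical and not part of the question.

```python
def filter_lines(lines, exclude=("DM", "RT")):
    """Split each line into words and keep only lines with no excluded word."""
    excluded = set(exclude)
    kept = []
    for line in lines:
        words = line.split()
        # isdisjoint is True when none of the line's words is in the excluded set
        if excluded.isdisjoint(words):
            kept.append(words)
    return kept


# Made-up sample data standing in for the contents of stream.txt
sample = ["RT hello world\n", "START of a tweet\n", "DM secret\n", "plain tweet\n"]
tweets = filter_lines(sample)
```

Note that a word with attached punctuation, such as "RT:", would not match here, so whether this is preferable to the substring test depends on how the file is actually formatted.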

Answer 2

Filter out the lines containing 'DM' and 'RT' when you declare fileread:

fileread = [line for line in file.readlines() if 'DM' not in line and 'RT' not in line]

Answer 3

You can simply iterate over each line in the file:

tweets = list()
with open('stream.txt', 'r') as f:
    for line in f:
        if "DM" not in line and "RT" not in line:
            tweets.append(line.split())
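
Since the stated goal is a count, the same line-by-line iteration can produce the number directly without building a list. This is only a sketch: the function name count_tweets is hypothetical, and io.StringIO with made-up contents stands in for the real stream.txt.

```python
import io


def count_tweets(lines, exclude=("DM", "RT")):
    """Count the lines that contain none of the excluded substrings."""
    return sum(1 for line in lines
               if not any(word in line for word in exclude))


# io.StringIO behaves like an open text file; the contents are invented
fake_file = io.StringIO(
    "hello world\nRT someone said hi\nDM private\nanother tweet\n"
)
total = count_tweets(fake_file)
```

With a real file, the call would presumably be count_tweets(f) inside a with open('stream.txt', 'r') as f: block.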

Answer 4

Here is a concise solution (since you seem to appreciate list comprehensions ;-)

with open('stream.txt', 'r') as file:
    fileread = file.readlines()

# Keep only the lines whose first two characters are neither "DM" nor "RT"
goodlines = [line for line in fileread if line[:2] != "DM" and line[:2] != "RT"]
tweets = [string.split() for string in goodlines]

goodlines acts as a filter: it keeps a line of fileread only if its first two characters differ from both 'DM' and 'RT' (if I understood your question correctly).
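
The prefix comparison can also be written with str.startswith, which accepts a tuple of prefixes. The sketch below uses a hypothetical function name, drop_prefixed, and made-up sample lines. Unlike the substring test, it keeps a line that merely mentions "DM" or "RT" somewhere after the start.

```python
def drop_prefixed(lines, prefixes=("DM", "RT")):
    """Keep (and split) the lines that do not start with any given prefix."""
    # str.startswith accepts a tuple and is True if any of the prefixes matches
    return [line.split() for line in lines if not line.startswith(prefixes)]


# Invented sample lines; the third one contains "DM" but does not start with it
sample = ["RT @user hi\n", "DMV opens at nine\n", "I got a DM today\n"]
tweets = drop_prefixed(sample)
```

The prefixes argument has to be a tuple (or a single string); passing a list raises a TypeError.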

How to read lines from a text file in Python while excluding lines that contain specific words

4 answers: