Question

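In Python, I have just read a line from a text file, and I'd like to know how to write code that ignores comments, i.e. lines with a hash (#) at the beginning.

I think it should be something like this: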

for 
   if line !contain #
      then ...process line
   else end for loop

But I'm new to Python and I don't know the syntax.

Answer 1

You can use startswith().

For example:

for line in open("file"):
    li=line.strip()
    if not li.startswith("#"):
        print(line.rstrip())

Answer 2

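I recommend that you don't ignore the whole line when you see a # character; just ignore the rest of the line. You can do that easily with a string method called partition: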

with open("filename") as f:
    for line in f:
        line = line.partition('#')[0]
        line = line.rstrip()
        # ... do something with line ...

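partition returns a tuple: everything before the partition string, the partition string itself, and everything after the partition string. So, by indexing with [0], we take just the part before the partition string.

EDIT: If you are using a version of Python that doesn't have partition(), here is code you could use instead: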
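As a small illustration of that tuple (the sample strings are made up for this sketch), note what happens when the separator is missing: the whole string comes back as the first element, so indexing with [0] is safe either way.

```python
# str.partition always returns a 3-tuple: (before, separator, after).
line = "host02  # this host is active\n"
before, sep, after = line.partition("#")

# When the separator does not occur, the whole string is returned first
# and the other two elements are empty strings.
no_comment = "host03\n"
parts = no_comment.partition("#")

print(repr(before), repr(sep), repr(after))
print(parts)
```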

with open("filename") as f:
    for line in f:
        line = line.split('#', 1)[0]
        line = line.rstrip()
        # ... do something with line ...

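This splits the string on the '#' character, then keeps everything before the split. The 1 argument makes the .split() method stop after one split; since we are only grabbing the 0th substring (by indexing with [0]), you would get the same answer without the 1 argument, but this might be a little faster. (Simplified from my original code thanks to a comment from @gnr. My original code was messier for no good reason; thanks, @gnr.)

You could also write your own version of partition(). Here is one called part():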
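To make the effect of the 1 argument concrete, here is a short sketch with an invented sample line that contains two '#' characters. The first element is the same in both cases; only the rest of the list differs.

```python
line = "value = 1  # first comment # second comment"

# With maxsplit=1 the string is cut once, at the first '#'.
with_limit = line.split("#", 1)

# Without it, every '#' produces a cut, which is wasted work
# when only element [0] is needed.
without_limit = line.split("#")

print(with_limit)
print(without_limit)
```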

def part(s, s_part):
    i0 = s.find(s_part)
    if i0 == -1:
        # separator not found: behave like str.partition
        return (s, '', '')
    i1 = i0 + len(s_part)
    return (s[:i0], s[i0:i1], s[i1:])

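@dalle pointed out that '#' can appear inside a string. It's not easy to handle this case correctly, so I just ignored it, but I should have said something.

If your input file has simple enough rules for quoted strings, this isn't hard. It would be hard if you accepted any legal Python quoted string, because there are single-quoted strings, double-quoted strings, multiline quotes with a backslash escaping the end-of-line, triple-quoted strings (using either single or double quotes), and even raw strings! The only feasible way to handle all of that correctly would be a complicated state machine.

But if we limit ourselves to just a simple quoted string, we can handle it with a simple state machine. We can even allow a backslash-quoted double quote inside the string.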

c_backslash = '\\'
c_dquote = '"'
c_comment = '#'


def chop_comment(line):
    # a little state machine with two state variables:
    in_quote = False  # whether we are in a quoted string right now
    backslash_escape = False  # true if we just saw a backslash

    for i, ch in enumerate(line):
        if not in_quote and ch == c_comment:
            # not in a quote, saw a '#', it's a comment.  Chop it and return!
            return line[:i]
        elif backslash_escape:
            # we must have just seen a backslash; reset that flag and continue
            backslash_escape = False
        elif in_quote and ch == c_backslash:
            # we are in a quote and we see a backslash; escape next char
            backslash_escape = True
        elif ch == c_dquote:
            in_quote = not in_quote

    return line

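I didn't really want to get this complicated in a question tagged "beginner", but this state machine is reasonably simple, and I hope it will be interesting.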
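Here is a quick check of how the state machine behaves on a few inputs. The function is repeated in compact form only so that the snippet runs by itself, and the sample lines are invented for illustration.

```python
def chop_comment(line):
    """Same logic as the state machine above, repeated so this runs alone."""
    in_quote = False
    backslash_escape = False
    for i, ch in enumerate(line):
        if not in_quote and ch == '#':
            return line[:i]
        elif backslash_escape:
            backslash_escape = False
        elif in_quote and ch == '\\':
            backslash_escape = True
        elif ch == '"':
            in_quote = not in_quote
    return line


samples = [
    'x = 1  # plain comment',
    'msg = "not # a comment"  # real comment',
    'msg = "escaped \\" quote # still inside"  # real comment',
    'no comment here',
]
results = [chop_comment(s) for s in samples]
for r in results:
    print(repr(r))
```

Only the '#' that sits outside the double quotes ends the line; trailing whitespace is left in place, so callers may still want to call rstrip().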

Answer 3

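I'm coming to this late, but the problem of handling shell-style (or Python-style) # comments is a very common one.

I use the following code almost every time I read a text file. The problem is that it doesn't handle quoted or escaped comment characters properly. But it works for simple cases and is easy.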

for line in whatever:
    line = line.split('#',1)[0].strip()
    if not line:
        continue
    # process line

A more robust solution is to use shlex:

import shlex
for line in instream:
    lex = shlex.shlex(line)
    lex.whitespace = '' # if you want to strip newlines, use '\n'
    line = ''.join(list(lex))
    if not line:
        continue
    # process decommented line

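This shlex approach not only handles quotes and escapes properly, it also adds a lot of cool functionality (like the ability to have files source other files if you want). I haven't tested it for speed on large files, but it is fast enough for small inputs.

The common case where you are also splitting each input line into fields (on whitespace) is even simpler: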

import shlex
for line in instream:
    fields = shlex.split(line, comments=True)
    if not fields:
        continue
    # process list of fields
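To see the quoting behaviour in action, here is a small sketch with invented input lines. A '#' inside single or double quotes stays part of the field, while an unquoted '#' starts a comment, and a comment-only line yields an empty list.

```python
import shlex

lines = [
    'host01 "label with # inside"  # trailing comment',
    '# a comment-only line',
    "host02 'single # quoted' extra",
]

# comments=True tells shlex.split to treat an unquoted '#' as a comment start.
parsed = [shlex.split(line, comments=True) for line in lines]
for fields in parsed:
    print(fields)
```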

Answer 4

This is the shortest possible form:

for line in open(filename):
  if line.startswith('#'):
    continue
  # PROCESS LINE HERE

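The startswith() method on a string returns True if the string you call it on starts with the string you passed in.

While this is okay in some circumstances, like shell scripts, it has two problems. First, it doesn't say explicitly how to open the file. The default mode for opening a file is 'r', which reads the file in text mode; 'r' and 'rt' are equivalent, so writing 'rt' does not change behavior but makes the intent explicit. The distinction between text and binary mode ('rb') is largely irrelevant on UNIX-like operating systems, but it matters on Windows (and on pre-OS X Macs), where line endings are translated in text mode.

The second problem is the open file handle. The open() function returns a file object, and it's considered good practice to close files when you're done with them. To do that, call the close() method on the object. Now, Python will probably do this for you, eventually: Python objects are reference-counted, and when an object's reference count goes to zero it gets freed, and at some point after an object is freed Python will call its destructor (a special method called __del__). Note that I said probably: Python has a bad habit of not actually calling the destructor on objects whose reference count drops to zero shortly before the program finishes. I guess it's in a hurry!

For short-lived programs like shell scripts, and particularly for file objects, this doesn't matter. Your operating system will automatically clean up any file handles left open when the program finishes. But if you open the file, read the contents, and then start a long computation without explicitly closing the file handle first, Python is likely to leave the file handle open during your computation. And that's bad practice.

This version will work in any 2.x version of Python, and fixes both of the problems I discussed above: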
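As a quick check of the mode strings, assuming Python 3: 'r' and 'rt' both open the file in text mode and return str, while 'rb' returns bytes. The temporary file below exists only for this sketch.

```python
import os
import tempfile

# Create a throwaway file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# comment\nvalue\n")
    path = tmp.name

try:
    with open(path, "r") as f:    # default: text mode
        default_mode = f.read()
    with open(path, "rt") as f:   # explicit text mode, same behavior
        text_mode = f.read()
    with open(path, "rb") as f:   # binary mode: raw bytes, no newline translation
        binary_mode = f.read()
finally:
    os.remove(path)

print(type(default_mode).__name__, type(text_mode).__name__, type(binary_mode).__name__)
```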

f = open(filename, 'rt')
for line in f:
  if line.startswith('#'):
    continue
  # PROCESS LINE HERE
f.close()

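This is the best general form for older versions of Python.

As suggested by Steve, using the "with" statement is now considered best practice. If you're using 2.6 or above, you should write it this way: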

with open(filename, 'rt') as f:
  for line in f:
    if line.startswith('#'):
      continue
    # PROCESS LINE HERE

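The "with" statement will clean up the file handle for you.

In your question you said "lines that start with #", so that's what I've shown you here. If you want to filter out lines that start with optional whitespace and then a '#', you should strip the whitespace before looking for the '#'. In that case, you should change this: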
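A minimal way to see that cleanup happen is to inspect the file object's closed attribute after the block ends. The temporary file and its contents are invented for this sketch.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# comment\nvalue\n")
    path = tmp.name

try:
    with open(path, "rt") as f:
        kept = [line for line in f if not line.startswith("#")]
    # Leaving the with block closed the file, even though close() was never called.
    closed_after_with = f.closed
finally:
    os.remove(path)

print(kept, closed_after_with)
```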

    if line.startswith('#'):

to this:

    if line.lstrip().startswith('#'):

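In Python, strings are immutable, so this doesn't change the value of line. The lstrip() method returns a copy of the string with all of its leading whitespace removed.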
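A short sketch of that point, using a made-up indented comment line: the test sees the stripped copy, while line itself keeps its leading whitespace.

```python
line = "   # indented comment\n"

stripped = line.lstrip()                      # a new string; line is untouched
is_comment = line.lstrip().startswith("#")    # the check used above

print(repr(line))
print(repr(stripped))
print(is_comment)
```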

Answer 5

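I've found recently that a generator function does a great job of this. I've used similar functions to skip comment lines, blank lines, etc.

I define my function as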

def skip_comments(file):
    for line in file:
        if not line.strip().startswith('#'):
            yield line

That way, I can just do

f = open('testfile')
for line in skip_comments(f):
    print(line)

This is reusable across all my code, and I can add any additional handling, logging, etc. that I need.
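As one possible extension along those lines, here is a sketch of a variant that also drops blank lines. The name skip_comments_and_blanks is hypothetical, and io.StringIO stands in for a real file so the snippet runs by itself.

```python
import io


def skip_comments_and_blanks(lines):
    """Yield only lines that are neither blank nor whole-line comments."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            yield line


# io.StringIO behaves like an open text file for iteration purposes.
source = io.StringIO("# header\n\nalpha\n   # indented comment\nbeta\n")
kept = [line.rstrip("\n") for line in skip_comments_and_blanks(source)]
print(kept)
```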

Answer 6

A more compact version of the filtering expression can also look like this:

for line in (l for l in open(filename) if not l.startswith('#')):
    # do something with line

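(l for ... ) is called a "generator expression", which acts here as a wrapping iterator that filters out all unwanted lines from the file while iterating over it. Don't confuse it with the same thing in square brackets, [l for ... ], which is a "list comprehension" that first reads all the lines from the file into memory and only then starts iterating over them.

Sometimes you might want it to be less of a one-liner and more readable: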
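The difference can be observed directly. In this sketch, numbered_lines is a hypothetical stand-in for a file that records each line as it is actually read.

```python
consumed = []


def numbered_lines():
    """Stand-in for a file: records each line at the moment it is read."""
    for text in ["# skip me\n", "keep 1\n", "keep 2\n"]:
        consumed.append(text)
        yield text


# Generator expression: nothing is read until we ask for an item.
lazy = (l for l in numbered_lines() if not l.startswith('#'))
consumed_before = list(consumed)
first = next(lazy)
consumed_after_first = list(consumed)

# List comprehension: the whole source is read immediately.
eager = [l for l in numbered_lines() if not l.startswith('#')]

print(consumed_before, first, consumed_after_first, eager)
```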

lines = open(filename)
lines = (l for l in lines if ... )
# more filters and mappings you might want
for line in lines:
    # do something with line

All of the filters are executed on the fly in a single pass over the file.
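A concrete version of that chain might look like the sketch below. The particular filters (cutting inline comments, stripping whitespace, dropping empty results) are only an example, and a list of strings stands in for the open file.

```python
raw = ["# comment\n", "\n", "  alpha  # inline\n", "beta\n"]

lines = iter(raw)                               # stand-in for open(filename)
lines = (l.split('#', 1)[0] for l in lines)     # cut off anything after '#'
lines = (l.strip() for l in lines)              # trim surrounding whitespace
lines = (l for l in lines if l)                 # drop lines that are now empty

# Each item flows through all three stages only when it is requested here.
result = list(lines)
print(result)
```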

Answer 7

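I know this is an old thread, but here is a generator function that I use for my own purposes. It strips comments no matter where they appear in the line, and also strips leading/trailing whitespace and blank lines. The following source text: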

# Comment line 1
# Comment line 2

# host01  # This host commented out.
host02  # This host not commented out.
host03
  host04  # Oops! Included leading whitespace in error!

will yield:

host02
host03
host04

Here is the documented code, which includes a demo:

def strip_comments(item, *, token='#'):
    """Generator. Strips comments and whitespace from input lines.

    This generator strips comments, leading/trailing whitespace, and
    blank lines from its input.

    Arguments:
        item (obj):  Object to strip comments from.
        token (str, optional):  Comment delimiter.  Defaults to ``#``.

    Yields:
        str:  Next non-blank line from ``item`` with comments and
            leading/trailing whitespace removed.

    """

    for line in item:
        s = line.split(token, 1)[0].strip()
        if s != '':
            yield s


if __name__ == '__main__':
    HOSTS = """# Comment line 1
    # Comment line 2

    # host01  # This host commented out.
    host02  # This host not commented out.
    host03
      host04  # Oops! Included leading whitespace in error!""".split('\n')


    hosts = strip_comments(HOSTS)
    print('\n'.join(h for h in hosts))

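The normal use case is to strip the comments from a file (e.g. a hosts file, as in the example above). In that case, the tail end of the code above would be modified to:

if __name__ == '__main__':
    with open('hosts.txt', 'r') as f:
        hosts = strip_comments(f)

        # The generator is lazy, so iterate while the file is still open.
        for host in hosts:
            print('\'%s\'' % host)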

Answer 8

Use the regular expression re.compile("^(?:\s+)*#|(?:\s+)") to skip the new lines and comments.
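As a hedged sketch of the regex idea, the snippet below uses a simpler pattern of my own, `^\s*(?:#|$)`, rather than the one quoted above. It matches blank lines and whole-line comments only; trailing comments on a data line are left alone. The sample lines are invented.

```python
import re

# Assumption: a line is skipped if it is empty/whitespace-only or if its
# first non-whitespace character is '#'.
SKIP = re.compile(r'^\s*(?:#|$)')

lines = [
    "# comment\n",
    "\n",
    "   # indented\n",
    "  host04\n",
    "host02  # trailing\n",
]
kept = [line for line in lines if not SKIP.match(line)]
print(kept)
```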

Answer 9

I tend to use

for line  in lines:
    if '#' not in line:
        #do something

This will ignore the whole line, although the answer that uses rpartition has my upvote, since it can keep any information that comes before the #.

Answer 10

A good approach is to strip comments in a way that works for both inline comments and whole-line comments:

def clear_coments(f):
    new_text = ''
    for line in f.readlines():
        if "#" in line: line = line.split("#", 1)[0].rstrip() + "\n"  # keep the line break

        new_text += line

    return new_text

Python: How to ignore #comment lines when reading in a file

10 answers:
