Based on lines

Time: 2017-11-07 19:15:40

Tags: python

Code:

with open("filename.txt", "r") as f:  # I'm not sure about reading it as r because I would be removing lines.

    lines = f.readlines()  # stores each line in the txt into 'lines'.
    invalid_line_count = 0

    for line in lines:  # this iterates through each line of the txt file.
        if line is invalid:  # pseudocode for the validity check

            # something which removes the invalid lines.
            invalid_line_count += 1

    print("There were " + str(invalid_line_count) + " amount of invalid lines.")

I have a text file like this:

1,2,3,0,0
2,3,0,1,0
0,0,0,1,2
1,0,3,0,0
3,2,1,0,0

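A valid line consists of 5 comma-separated values. For a line to be valid, it must contain a 1, a 2, a 3 and two 0s. It doesn't matter what position these numbers are in.

An example of a valid line is 1,2,3,0,0

An example of an invalid line is 1,0,3,0,0, because it does not contain a 2 and has three 0s instead of two.

I want to be able to iterate through the text file and remove the invalid lines, perhaps also with a message saying "There were x invalid lines."

Or perhaps, as suggested:

As you read each line from the original file, test it for validity. If it passes, write it to a new file. When you're done, rename the original file to something else, then rename the new file to the original name.

I thought the csv module might help, so I read the documentation, but it didn't help me.

Any ideas?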

4 answers:

Answer 0: (score: 2)

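  1. You cannot delete lines from a file. Instead, you have to rewrite the file, including only the valid lines. Either close the file after reading all the data and reopen it in "w" mode, or write to a new file as you process the lines (which uses less memory in the short term).
  2. Your main problem in detecting line validity seems to be handling the input. You want to convert the input text into a list of values; this is a skill you should pick up as you learn your tools. What you need here are split to divide the line and int to convert the values. For example:

    line_vals = line.split(',')

  3. Now iterate through line_vals and convert each element to an integer with int.
  4. Validity: you need to count how many times each value occurs in this list. You should be able to count things by value; if not, back up to your previous lessons and review basic logic and data flow. If you want an advanced method, use collections.Counter, a convenient dictionary type that accumulates counts from any sequence.
  5. Does that get you moving? If you're still lost, I suggest working with a local tutor.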
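The counting step described above could be sketched with collections.Counter roughly as follows. The helper name is_valid_line and the constant EXPECTED are made up for illustration, and treating blank or non-numeric fields as invalid is an assumption, not something the question specifies.

```python
from collections import Counter

# Expected counts: one each of 1, 2, 3 and two 0s (taken from the question).
EXPECTED = Counter({0: 2, 1: 1, 2: 1, 3: 1})


def is_valid_line(line):
    """Return True if `line` holds exactly 1, 2, 3 and two 0s (hypothetical helper)."""
    try:
        # split divides the line on commas, int converts each piece.
        values = [int(v) for v in line.strip().split(",")]
    except ValueError:
        # Assumption: a blank or non-numeric field makes the line invalid.
        return False
    # Two Counters are equal when every value occurs the same number of times.
    return Counter(values) == EXPECTED


print(is_valid_line("1,2,3,0,0"))  # True
print(is_valid_line("1,0,3,0,0"))  # False
```

Comparing whole Counters checks the number of values and their counts in one step, so no separate length test is needed.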

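Answer 1: (score: 1)

This is largely a language-agnostic problem. What you do is open another file for writing. As you read each line from the original file, test it for validity. If it passes, write it to the new file. When you're done, rename the original file to something else, then rename the new file to the original name.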
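In Python, that approach might look something like the sketch below. The function name filter_file and the ".new" and ".bak" suffixes are hypothetical, and the validity test is passed in as a callable because this answer does not prescribe one.

```python
import os


def filter_file(path, is_valid):
    """Rewrite `path` keeping only lines accepted by `is_valid`.

    Returns the number of rejected lines. The original file is kept
    under `path + ".bak"` (a hypothetical backup name).
    """
    new_path = path + ".new"   # hypothetical name for the new file
    backup_path = path + ".bak"
    invalid = 0
    with open(path, "r") as src, open(new_path, "w") as dst:
        for line in src:
            if is_valid(line):
                dst.write(line)  # `line` still carries its newline
            else:
                invalid += 1
    # Rename the original out of the way, then move the new file in.
    os.replace(path, backup_path)
    os.replace(new_path, path)
    return invalid
```

Because the file is read one line at a time, only a single line is held in memory, and the original data stays available in the backup if something goes wrong.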

Answer 2: (score: 1)

One possible correct approach:

with open('filename.txt', 'r+') as f:   # opening file in read/write mode
    inv_lines_cnt = 0
    valid_list = [0, 0, 1, 2, 3]        # sorted list of valid values
    lines = f.read().splitlines()
    f.seek(0)
    f.truncate(0)                       # truncating the initial file

    for l in lines:
        if sorted(map(int, l.split(','))) == valid_list:
            f.write(l+'\n')
        else:
            inv_lines_cnt += 1

print("There were {} amount of invalid lines.".format(inv_lines_cnt))

Output:

There were 2 amount of invalid lines.

Final contents of filename.txt:

1,2,3,0,0
2,3,0,1,0
3,2,1,0,0

Answer 3: (score: 0)

  

For a line to be valid, each line must have a 1, 2, 3 and two 0s. It doesn't matter what position these numbers are in.

CHUNK_SIZE = 65536


def _is_valid(line):
    """Check if a line is valid.

    A line is valid if it is of length 5 and contains '1', '2', '3',
    in any order, as well as '0', twice.

    :param list line: The line to check.
    :return: True if the line is valid, else False.
    :rtype: bool
    """
    if len(line) != 5:
        # If there's not exactly five elements in the line, return false
        return False
    if all(x in line for x in {"1", "2", "3"}) and line.count("0") == 2:
        # Builtin `all` checks if a condition (in this case `x in line`)
        # applies to all elements of a certain iterator.
        # `list.count` returns the amount of times a specific
        # element appears in it. If "0" appears exactly twice in the line
        # and the `all` call returns True, the line is valid.
        return True
    # If the previous block doesn't execute, the line isn't valid.
    return False


def get_valid_lines(path):
    """Get the valid lines from a file.

    The valid lines will be written to `path`.

    :param str path: The path to the file.
    :return: None
    :rtype: None
    """
    invalid_lines = 0
    contents = []
    valid_lines = []
    with open(path, "r") as f:
        # Open the `path` parameter in reading mode.
        while True:
            chunk = f.read(CHUNK_SIZE)
            # Read up to `CHUNK_SIZE` (65536) characters from the file.
            if not chunk:
                # At the end of the file, `read` returns an empty string.
                break
            contents.append(chunk)
            # If the chunk is not empty, add it to the contents.
    contents = "".join(contents).splitlines()
    # The file was read in chunks of up to 65536 characters. We join them
    # using `str.join`, then use `str.splitlines` to get each individual
    # line without a trailing empty entry.
    for line in contents:
        # `_is_valid` expects a list of values, so split on commas first.
        if not _is_valid(line=line.split(",")):
            invalid_lines += 1
        else:
            valid_lines.append(line)
    print("Found {} invalid lines".format(invalid_lines))
    with open(path, "w") as f:
        for line in valid_lines:
            f.write(line)
            f.write("\n")

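I split this into two functions: one checks whether a line is valid according to your rules, and the other manipulates the file. If you want the valid lines returned instead, just remove the second with statement and replace it with return valid_lines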
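As a rough illustration of the returning variant, here is a compact standalone sketch. The name read_valid_lines is hypothetical, it reads line by line instead of in chunks, and it swaps the validity test for a sorted comparison of the comma-separated fields rather than reusing _is_valid.

```python
def read_valid_lines(path):
    """Return (valid_lines, invalid_count) for the file at `path`.

    Hypothetical helper: nothing is written back to the file.
    """
    required = ["0", "0", "1", "2", "3"]  # sorted fields of a valid line
    valid_lines = []
    invalid_count = 0
    with open(path, "r") as f:
        for raw in f:
            line = raw.rstrip("\n")
            # A line is valid when its sorted fields match `required`.
            if sorted(line.split(",")) == required:
                valid_lines.append(line)
            else:
                invalid_count += 1
    return valid_lines, invalid_count
```

The caller can then decide whether to print the count, write the lines elsewhere, or process them further.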