Code:
with open("filename.txt", 'r') as f: #I'm not sure about reading it as r because I would be removing lines.
    lines = f.readlines() #stores each line in the txt into 'lines'.
invalid_line_count = 0
for line in lines: #this iterates through each line of the txt file.
    if line is invalid: #placeholder: this is the test I don't know how to write.
        # something which removes the invalid lines.
        invalid_line_count += 1
print("There were " + str(invalid_line_count) + " amount of invalid lines.")
I have a text file like this:
1,2,3,0,0
2,3,0,1,0
0,0,0,1,2
1,0,3,0,0
3,2,1,0,0
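A valid line consists of five comma-separated values. To be valid, a line must contain a 1, a 2, a 3 and two 0s. The positions of the numbers don't matter.
An example of a valid line is 1,2,3,0,0
An example of an invalid line is 1,0,3,0,0, because it does not contain a 2 and has three 0s instead of two.
I'd like to be able to iterate through the text file and remove the invalid lines, perhaps with a message saying "There were x invalid lines."
Or perhaps, as suggested:
As you read each line from the original file, test it for validity. If it passes, write it to a new file. When you're done, rename the original file to something else, then rename the new file to the original's name.
I thought the csv module might help, so I read the documentation, but it didn't help me.
Any ideas?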
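Answer 0 (score: 2)
Your main problem in detecting line validity seems to be handling the input. You want to turn the input text into a list of values; this is a skill you should pick up as you learn your tools. What you need here is split to divide the line and int to convert the values. For example:
line_vals = line.split(',')
Now iterate over line_vals and use int to convert each item to an integer.
To test validity, count how many times each value occurs in that list. For this, look at collections.Counter, a convenient dictionary type that accumulates counts from any sequence. Does that get you moving? If you're still lost, I recommend spending some time with a local tutor.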
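The steps described above (split, convert with int, count with collections.Counter) can be sketched as follows. This is a minimal sketch rather than the answerer's own code: the names `EXPECTED` and `is_valid_line` are made up for illustration, and treating blank or non-numeric fields as invalid is an assumption.

```python
from collections import Counter

# Required counts: one each of 1, 2 and 3, plus two 0s.
EXPECTED = Counter({0: 2, 1: 1, 2: 1, 3: 1})


def is_valid_line(line):
    """Return True if `line` holds exactly the values 0, 0, 1, 2, 3 in any order."""
    try:
        values = [int(field) for field in line.split(",")]
    except ValueError:
        # Assumption: a blank or non-numeric field makes the line invalid.
        return False
    # Comparing counters ignores the order of the values.
    return Counter(values) == EXPECTED


print(is_valid_line("1,2,3,0,0"))  # True
print(is_valid_line("1,0,3,0,0"))  # False
```

Because `int` tolerates surrounding whitespace, a trailing newline on the line does not need to be stripped first.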
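Answer 1 (score: 1)
This is a mostly language-independent problem. What you would do is open another file for writing. As you read each line from the original file, test it for validity. If it passes, write it to the new file. When you're done, rename the original file to something else, then rename the new file to the original's name.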
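In Python, that procedure might look roughly like the sketch below. The function names and the ".new" and ".bak" suffixes are assumptions chosen for illustration, and the validity test is a simple string comparison that would reject fields containing spaces.

```python
import os


def is_valid(line):
    """Hypothetical test: the fields must be 0, 0, 1, 2, 3 in any order."""
    return sorted(line.strip().split(",")) == ["0", "0", "1", "2", "3"]


def filter_file(path):
    """Rewrite `path` with only its valid lines; return the number dropped."""
    new_path = path + ".new"     # assumed name for the filtered copy
    backup_path = path + ".bak"  # assumed name for the renamed original
    invalid_count = 0
    with open(path, "r") as src, open(new_path, "w") as dst:
        for line in src:
            if is_valid(line):
                # Normalise the line ending in case the last line lacks one.
                dst.write(line.rstrip("\n") + "\n")
            else:
                invalid_count += 1
    # Rename the original out of the way, then move the new file into place.
    os.replace(path, backup_path)
    os.replace(new_path, path)
    return invalid_count
```

Calling `filter_file("filename.txt")` on the sample file from the question should drop the two invalid lines and leave the untouched original behind as filename.txt.bak.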
Answer 2 (score: 1)
One possible approach:
with open('filename.txt', 'r+') as f: # opening file in read/write mode
    inv_lines_cnt = 0
    valid_list = [0, 0, 1, 2, 3] # sorted list of valid values
    lines = f.read().splitlines()
    f.seek(0)
    f.truncate(0) # truncating the initial file
    for l in lines:
        if sorted(map(int, l.split(','))) == valid_list:
            f.write(l+'\n')
        else:
            inv_lines_cnt += 1
print("There were {} amount of invalid lines.".format(inv_lines_cnt))
Output:
There were 2 amount of invalid lines.
Final filename.txt contents:
1,2,3,0,0
2,3,0,1,0
3,2,1,0,0
Answer 3 (score: 0)
To be valid, each line must contain a 1, a 2, a 3 and two 0s. The positions of the numbers don't matter.
CHUNK_SIZE = 65536
def _is_valid(line):
    """Check if a line is valid.
    A line is valid if it is of length 5 and contains '1', '2', '3',
    in any order, as well as '0', twice.
    :param list line: The line to check.
    :return: True if the line is valid, else False.
    :rtype: bool
    """
    if len(line) != 5:
        # If there's not exactly five elements in the line, return false
        return False
    if all(x in line for x in {"1", "2", "3"}) and line.count("0") == 2:
        # Builtin `all` checks if a condition (in this case `x in line`)
        # applies to all elements of a certain iterator.
        # `list.count` returns the amount of times a specific
        # element appears in it. If "0" appears exactly twice in the line
        # and the `all` call returns True, the line is valid.
        return True
    # If the previous block doesn't execute, the line isn't valid.
    return False
def get_valid_lines(path):
    """Get the valid lines from a file.
    The valid lines will be written back to `path`.
    :param str path: The path to the file.
    :return: None
    :rtype: None
    """
    invalid_lines = 0
    contents = []
    valid_lines = []
    with open(path, "r") as f:
        # Open the `path` parameter in reading mode.
        while True:
            chunk = f.read(CHUNK_SIZE)
            # Read up to `CHUNK_SIZE` (65536) characters from the file.
            if not chunk:
                # At the end of the file, `read` returns an empty string.
                break
            contents.append(chunk)
            # If the chunk is not empty, add it to the contents.
    contents = "".join(contents).splitlines()
    # `contents` holds chunks of up to 65536 characters. We join them
    # using `str.join`, then use `str.splitlines` to get each individual
    # line without a spurious empty entry after the final newline.
    for line in contents:
        # `_is_valid` expects a list of fields, so split on commas first.
        if not _is_valid(line=line.split(",")):
            invalid_lines += 1
        else:
            valid_lines.append(line)
    print("Found {} invalid lines".format(invalid_lines))
    with open(path, "w") as f:
        for line in valid_lines:
            f.write(line)
            f.write("\n")
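I split this into two functions: one that checks whether a line is valid according to your rules, and one that manipulates the file. If you want the valid lines returned instead, just remove the second with statement and replace it with return valid_lines.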
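A variant that returns the valid lines instead of rewriting the file could look like the sketch below. It is self-contained for illustration: `is_valid` mirrors the rule in `_is_valid`, `read_valid_lines` is a made-up name, and reading line by line instead of in chunks is a simplification swapped in here.

```python
def is_valid(fields):
    """Five fields: one each of "1", "2", "3", and "0" exactly twice."""
    return (
        len(fields) == 5
        and all(x in fields for x in ("1", "2", "3"))
        and fields.count("0") == 2
    )


def read_valid_lines(path):
    """Return the valid lines of `path` as a list, leaving the file untouched."""
    valid_lines = []
    with open(path, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            # Split into fields before checking, since the rule is per field.
            if is_valid(line.split(",")):
                valid_lines.append(line)
    return valid_lines
```

The caller can then decide what to do with the result, for example print it or write it to a different file.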