Question

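I have a C++ program that generates many data files, each containing three columns. In each of these data files there may be some anomalous entries where the third column is -nan. How can I write a script that opens each data file, finds all the rows whose third column is nan, and deletes those rows? Is it possible to write a script in bash or python that does this? For example: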

100   0.1    15.8334
100   0.2    16.7895
100   0.3     -nan
100   0.4    15.8543
100   0.5      -nan

In this file, I want lines 3 and 5 removed, so that my file looks like this:

100   0.1    15.8334
100   0.2    16.7895
100   0.4    15.8543

Answer 1

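Something like this (in bash):

# Replace data*.txt with a glob or list that matches your own files.
for file in data*.txt; do
  grep -v -- -nan "$file" > "$file.$$" && mv "$file.$$" "$file"
done

That said, if possible it would be better to fix this at the source, in the code that generates the files.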

Answer 2

sed -i -e '/-nan/d' datafile.txt

To operate on multiple files, you can either replace "datafile.txt" with a glob that matches all of your files, or use a for loop:

for file in data1.txt data2.txt data3.txt; do
    sed -i -e '/-nan/d' "$file"
done

Or the find command:

find . -name "data*.txt" -exec sed -i -e '/-nan/d' {} +

Answer 3

Here is the basic mechanism:

with open('yourfile.txt') as fin, open('yourfile_output.txt', 'w') as fout:
    for line in fin:
        try:
            c1, c2, c3 = line.split()
            if c3 != '-nan':
                fout.write(line)
        except ValueError as e:
            pass # Handle cases where number of cols != 3

Then put it in a function, use glob.iglob to retrieve the matching filenames, and loop over them...
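A minimal sketch of that idea follows. It assumes the data files match the pattern `data*.txt` and that each filtered copy is written next to its input with a `.clean` suffix; the pattern, the suffix, and the function names `strip_nan_rows` and `clean_all` are illustrative and do not come from the question. Unlike the snippet above, rows that do not have exactly three columns are copied through unchanged rather than dropped.

```python
import glob


def strip_nan_rows(src_path, dst_path):
    """Copy src_path to dst_path, skipping rows whose third column is '-nan'.

    Returns the number of rows that were dropped.
    """
    dropped = 0
    with open(src_path) as fin, open(dst_path, 'w') as fout:
        for line in fin:
            cols = line.split()
            if len(cols) == 3 and cols[2] == '-nan':
                dropped += 1
                continue
            fout.write(line)
    return dropped


def clean_all(pattern='data*.txt', suffix='.clean'):
    """Filter every file matching pattern (pattern and suffix are assumptions)."""
    results = {}
    for path in glob.iglob(pattern):
        results[path] = strip_nan_rows(path, path + suffix)
    return results


if __name__ == '__main__':
    for path, dropped in clean_all().items():
        print(f'{path}: dropped {dropped} row(s)')
```

Writing to a separate output file keeps the originals intact, so the result can be inspected before anything is overwritten.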

Another possible option, just for completeness:

from math import isnan

with open('yourfile.txt') as fin, open('yourfile_output.txt', 'w') as fout:
    for line in fin:
        try:
            c1, c2, c3 = map(float, line.split())
            if not isnan(c3):
                fout.write(line)
        except ValueError:
            pass # Skip lines that don't have exactly 3 numeric columns
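Since the question asks for the rows to be deleted from the files themselves, here is a hedged sketch of an in-place variant of the isnan approach. The function name `drop_nan_rows_in_place` is hypothetical. It writes to a temporary file in the same directory and then swaps it over the original, and it keeps any row whose third column is missing or not numeric.

```python
import os
import shutil
import tempfile
from math import isnan


def drop_nan_rows_in_place(path):
    """Rewrite path without rows whose third column parses as NaN."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that os.replace
    # does not have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fout, open(path) as fin:
            for line in fin:
                cols = line.split()
                try:
                    is_bad = len(cols) == 3 and isnan(float(cols[2]))
                except ValueError:
                    is_bad = False  # third column is not numeric; keep the row
                if not is_bad:
                    fout.write(line)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap completed; remove the temp file.
        os.remove(tmp_path)
        raise
```

Python's `float` accepts the strings `nan` and `-nan`, so this catches both spellings, whereas the string comparison in the first snippet only matches `-nan`.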

Script for editing many text files
