Can anyone provide a sed or awk way to remove the last two columns of a CSV file?

Time: 2014-07-09 21:50:03

Tags: awk sed

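Edit: Hi everyone, thanks for your replies. My question is not how to fix the sample.csv I provided here. I have more than 100 similar files and I would like to process them quickly and efficiently. I solved the problem with Python, but I would prefer sed because I know sed can modify files directly. I don't want to run a similar command hundreds of times...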

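I have files that were generated daily over about 4 months. Each file contains 9 columns, and now I want to remove the last two columns from all of them.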

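I planned to use sed with -i to delete the last two columns, so that I could modify all the files directly without writing new ones. Unfortunately, I couldn't find a way, so I wrote a Python script to do all the work. Here is my code: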

def remove_last_two_columns(input_dir, output_dir, file_name):
    # Copy file_name from input_dir to output_dir, keeping only the
    # first 7 of the 9 comma-separated columns on each line.
    with open(input_dir + file_name, "r") as inputs, \
            open(output_dir + file_name, "w") as writer:
        for line in inputs:
            parts = line.strip().split(",")
            writer.write(",".join(parts[:7]) + "\n")

remove_last_two_columns("/home/haifzhan/input/", "/home/haifzhan/output/", "sample.csv")

Input:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N,2014-07-01 00:37:22,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N,2014-07-01 00:25:55,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13,N,N
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24,N,N
C1,C2,2014-06-30 18:41:27,N,N,N,N,2014-07-01 00:52:22,N



My output:
C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N

Can anyone provide a sed / awk way to achieve this? I would like to use sed / awk in my future work. Thanks in advance.

3 Answers:

Answer 0: (score: 3)

Awk solution

awk 'BEGIN{FS=OFS=","}NF=(NF-2)' file
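The one-liner works because the assignment NF=(NF-2) doubles as the pattern: awk rebuilds the record with OFS when NF changes and prints it when the resulting value is non-zero. A line with exactly two fields would therefore be dropped silently, and a line with fewer than two would set NF to a negative value, which gawk rejects as an error. A slightly more defensive sketch follows, wrapped in a shell function whose name (drop_last_two) is hypothetical. It assumes an awk that rebuilds the record when NF is decreased, as gawk and mawk do, and simple CSV data without quoted commas.

```shell
# Hypothetical helper: drop the last two comma-separated fields of each line.
# Lines with two or fewer fields are passed through unchanged.
drop_last_two() {
    awk 'BEGIN { FS = OFS = "," } NF > 2 { NF -= 2 } 1' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf 'a,b,c,d\nx,y\n' | drop_last_two
```

Passing short lines through unchanged is a design choice; whether they should instead be dropped or reported depends on the data.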

Answer 1: (score: 2)

cut is definitely the simplest tool for this:

cut -d, -f8,9 --complement input

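Note that the cut shipped with OS X is outdated (it does not support --complement), so it is best to install a recent GNU version: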

brew install coreutils
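If installing GNU coreutils is not an option, the same result can be had without --complement by selecting the fields to keep rather than the ones to drop. A minimal sketch, assuming every line has exactly 9 columns as stated in the question; the function name keep_first_seven is hypothetical.

```shell
# Hypothetical helper: keep fields 1 through 7 of comma-separated input.
# Uses only the -d and -f options, which both GNU and BSD cut provide.
keep_first_seven() {
    cut -d, -f1-7 "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf '1,2,3,4,5,6,7,8,9\n' | keep_first_seven
```

Unlike the --complement form, this hard-codes the number of columns to keep, so it only fits files whose width is known in advance.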

Answer 2: (score: 2)

This statement removes the last two columns, where sample.csv is the name of the input file.

sed 's/,[^,]*,[^,]*$//' sample.csv

My result is:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N

In your example you removed the last 3 columns; you can do that by modifying the original statement as follows:

sed 's/,[^,]*,[^,]*,[^,]*$//' sample.csv
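Since the question asks about modifying 100+ files directly, the two-column substitution can be combined with sed's -i option in a loop. This is a minimal sketch rather than a definitive script: it assumes all files sit in one directory and end in .csv, and the function name strip_last_two_columns is hypothetical.

```shell
# Hypothetical helper: remove the last two comma-separated fields from every
# .csv file in a directory, editing the files in place.
strip_last_two_columns() {
    dir=$1
    for f in "$dir"/*.csv; do
        # Skip the literal pattern when the directory has no .csv files.
        [ -f "$f" ] || continue
        # -i.bak edits in place and keeps the original as FILE.bak;
        # the attached-suffix form is accepted by both GNU and BSD sed.
        sed -i.bak 's/,[^,]*,[^,]*$//' "$f"
    done
}

# Usage (the path is a placeholder):
# strip_last_two_columns /path/to/csv/dir
```

Running the function a second time would strip two more columns, so the .bak copies are worth keeping until the results have been checked.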