Adding a double quote to the first line of a csv file from the command line

Date: 2019-05-18 10:12:45

Tags: regex bash shell command-line

I have this csv file, and I noticed that the opening quote was not added during the export. In fact, if I type the following on Ubuntu:

head -n 1 file.csv

I get the following output:

801","40116","Hazelnut MT -L","Thursday Promo","Large","","5.9000","","801","1.0000","","3.6500","2.2500",".0000","default","","","","","Chatime","02/06/2014","09125a9cfffd4143a00e73e3b62f15f2","CB01","",".0000","5.9000","6.9000",".0000",".0000",".0000",".0000",".0000",".0000","0","","0","0","0","","","","","","","","","Modern Milk Tea","","","0","","","1","0","","","","","","","","0","Hau Chan","","","","","","","","","","0","","","","","","","-1","","","","","","","","","","","","0","00000000420714AA","2014-06-02","1900-01-01","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

Is there some kind of command that can help me add the missing opening quote?

1 Answer:

Answer 0: (score: 4)

This should work in every POSIX shell:

printf \" | cat - file.csv > repaired-file.csv

If you are happy with the result, you can overwrite the original file:

mv repaired-file.csv file.csv
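Before overwriting a large original, it may be worth checking the repaired copy automatically. The following is a minimal sketch, not part of the original answer: `prepend_quote` and the `.repaired` suffix are hypothetical names, and `head -c` is assumed to be available (it is common but not strictly POSIX). The function only replaces the original if the copy starts with a double quote and is exactly one byte longer.

```shell
#!/bin/sh
# Hypothetical helper: prepend a double quote via a second copy and
# replace the original only after two sanity checks pass.
prepend_quote() {
    src=$1
    tmp="$src.repaired"   # assumed temp name, placed next to the original

    printf '"' | cat - "$src" > "$tmp" || return 1

    # Check 1: the repaired copy starts with a double quote.
    first=$(head -c 1 "$tmp")
    # Check 2: the repaired copy is exactly one byte longer.
    old_size=$(wc -c < "$src")
    new_size=$(wc -c < "$tmp")

    if [ "$first" = '"' ] && [ "$((new_size))" -eq "$((old_size + 1))" ]; then
        mv "$tmp" "$src"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as `prepend_quote file.csv`. Note that it still needs enough free disk space for a full second copy.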

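Since the file is 70GB in size, you may want to avoid creating a second file, but that is harder than it looks. Of course, there are things like sed's in-place option (-i) and the sponge utility from moreutils, but they do not work in place the way you would expect. Both sed -i and sponge either use a temporary file or hold the whole file in memory (which is no longer feasible for 70GB). A lot of research on truly in-place editing can be found in this blog post. The conclusion: no standard tool supports truly in-place editing. However, the following perl one-liner should work (already adapted to your needs).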

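perl <<'EOF'
  use strict;
  use warnings;
  use Tie::File;
  # Tie the file to an array: each element corresponds to one line of the file.
  tie my @a, 'Tie::File', 'path/to/your/file' or die "Cannot tie file: $!";
  # Prepend the missing quote to the first line; the change is written to the file itself.
  $a[0] = '"' . $a[0];
  untie @a;
EOF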
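The claim above that sed -i is not a true in-place edit can be checked by comparing inode numbers before and after the edit. This is a small sketch under assumptions: it expects GNU sed (BSD/macOS sed needs `-i ''` instead), and `inode_of` and `sed_inplace_check` are hypothetical helper names. Try it on a small test file, not on the 70GB original.

```shell
#!/bin/sh
# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# Run `sed -i` on a file and report whether the underlying file was
# kept or replaced. Assumes GNU sed.
sed_inplace_check() {
    before=$(inode_of "$1")
    sed -i '1s/^/"/' "$1" || return 1
    after=$(inode_of "$1")
    if [ "$before" = "$after" ]; then
        echo "same inode"
    else
        echo "new inode"
    fi
}
```

With GNU sed this should report a new inode, because sed writes a temporary file and renames it over the original.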

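Benchmark

Out of interest, I ran the commands discussed here and measured their run times.

I generated a 9.3 GiB input file f using seq 1000000000 > f. Before timing each command, I always regenerated f and cleared the system caches with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches. My system has enough memory to hold the entire file, but I monitored the memory usage manually: all commands used only a few KB of memory.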

  • printf \" | cat - f > f2; mv f2 f    1m 05s
  • perl … # script from above    1m 32s
  • sed -i '1s/^/"/' f    25m 57s (at 100% CPU the whole time)
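The procedure behind these numbers can be sketched at a smaller scale as follows. This is an illustration only, not the script that produced the timings above: `bench_cat` is a hypothetical name, the line count is a parameter, timing uses whole seconds from `date +%s`, and the cache drop is left as a comment because it needs Linux and root.

```shell
#!/bin/sh
# Scaled-down sketch of one benchmark run for the cat approach.
# $1 = number of lines to generate, $2 = path of the test file.
bench_cat() {
    n=$1
    f=$2
    seq "$n" > "$f"      # regenerate the input before each run
    sync                 # flush pending writes to disk
    # On Linux, with root, the page cache could be dropped here as in the text:
    #   echo 3 | sudo tee /proc/sys/vm/drop_caches
    start=$(date +%s)
    printf '"' | cat - "$f" > "$f.2" && mv "$f.2" "$f"
    end=$(date +%s)
    echo "cat: $((end - start))s"
}
```

For example, `bench_cat 1000000 /tmp/f` times the cat approach on a file of one million lines.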

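I was surprised that the cat command was faster than the perl script. However, it makes sense, because the perl script performs a lot of seeks (which can be seen with strace), whereas cat just copies.

Summary: If there is enough disk space left, use the cat command. If the file is larger than the free disk space remaining on the system, use the perl script.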
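The rule in the summary can be turned into a quick check. The sketch below is an assumption-laden illustration: `pick_method` and `pick_method_for_file` are hypothetical names, and free space is read from the "Available" column of `df -Pk` for the directory that holds the file.

```shell
#!/bin/sh
# Pure decision: print "cat" if a second copy fits on disk, else "perl".
# Both arguments are sizes in KiB.
pick_method() {
    size_kb=$1
    avail_kb=$2
    if [ "$avail_kb" -gt "$size_kb" ]; then
        echo cat
    else
        echo perl
    fi
}

# Look up the file size and the free space of its directory, then decide.
pick_method_for_file() {
    file=$1
    dir=$(dirname "$file")
    bytes=$(wc -c < "$file")
    file_kb=$(( (bytes + 1023) / 1024 ))   # round up to whole KiB
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    pick_method "$file_kb" "$free_kb"
}
```

Running `pick_method_for_file file.csv` prints which of the two approaches fits the available space.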