I have this CSV file, and I noticed that the opening quote was not written during the export. In fact, if I run the following in Ubuntu:
head -n 1 file.csv
I get the following output:
801","40116","Hazelnut MT -L","Thursday Promo","Large","","5.9000","","801","1.0000","","3.6500","2.2500",".0000","default","","","","","Chatime","02/06/2014","09125a9cfffd4143a00e73e3b62f15f2","CB01","",".0000","5.9000","6.9000",".0000",".0000",".0000",".0000",".0000",".0000","0","","0","0","0","","","","","","","","","Modern Milk Tea","","","0","","","1","0","","","","","","","","0","Hau Chan","","","","","","","","","","0","","","","","","","-1","","","","","","","","","","","","0","00000000420714AA","2014-06-02","1900-01-01","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
Is there some command that can help me add the missing opening quote?
Answer (score: 4)
This should work in every POSIX shell:
printf \" | cat - file.csv > repaired-file.csv
If you are happy with the result, you can overwrite the original file:
mv repaired-file.csv file.csv
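If you would rather do the same thing from a script, here is a minimal Python sketch of the copy-then-rename approach. It assumes the temporary copy can live in the same directory as the original, so the final os.replace is a rename on the same file system rather than a second copy. The function name prepend_via_copy is hypothetical. Like the cat pipeline, it streams the file in chunks, so memory use stays small, but the disk has to hold two copies until the rename.

```python
import os
import shutil
import tempfile


def prepend_via_copy(path: str, prefix: bytes = b'"', chunk_size: int = 1 << 20) -> None:
    """Write prefix + the original contents to a temporary file, then swap it in.

    Memory use is bounded by chunk_size, but the disk briefly holds two copies.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the original, so os.replace is a rename, not a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            dst.write(prefix)                         # the missing opening quote
            shutil.copyfileobj(src, dst, chunk_size)  # stream the rest, chunk by chunk
        shutil.copymode(path, tmp_path)               # mkstemp creates mode 0600; keep the original mode
        os.replace(tmp_path, path)                    # the equivalent of the mv step
    except BaseException:
        # Clean up the partial copy if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Usage: prepend_via_copy("file.csv"). Like printf | cat followed by mv, this needs free space for a complete second copy of the file.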
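Since the file is 70 GB, you may want to avoid creating a second file, but that is harder than it looks. There is, of course, sed's in-place option (-i) and the sponge utility from moreutils, but neither works in place the way you might expect: sed -i and sponge both use a temporary file or keep the whole file in memory (which is no longer feasible at 70 GB). An extensive investigation of true in-place editing can be found in this blog post. Its conclusion: no standard tool supports true in-place editing. However, the following short perl script should work (already adapted to your needs):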
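perl <<'EOF'
use strict;
use warnings;
use Tie::File;
# Tie each line of the file to an array element; the file is not loaded into memory.
my @a;
tie @a, 'Tie::File', 'path/to/your/file' or die "Cannot tie file: $!";
# Prepend the missing quote to the first line; Tie::File rewrites the rest of the file in place.
$a[0] = '"' . $a[0];
untie @a;
EOF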
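The idea behind true in-place editing can also be sketched in Python. This is not a port of Tie::File, just the general technique: grow the file by the length of the prefix, shift the existing bytes toward the end one chunk at a time, starting with the last chunk so nothing is overwritten before it has been read, and finally write the prefix at offset 0. The function name prepend_in_place and the 1 MiB default chunk size are illustrative choices. The whole file still gets rewritten, and an interruption halfway through leaves it corrupted.

```python
import os


def prepend_in_place(path: str, prefix: bytes = b'"', chunk_size: int = 1 << 20) -> None:
    """Insert prefix at the start of the file without creating a second copy.

    Extra memory: one chunk. Extra disk space: len(prefix) bytes.
    Not crash-safe: an interruption midway leaves the file corrupted.
    """
    shift = len(prefix)
    if shift == 0:
        return
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        end = size
        # Walk backwards from the end of the file, like memmove with an
        # overlapping destination that lies after the source.
        while end > 0:
            start = max(0, end - chunk_size)
            f.seek(start)
            chunk = f.read(end - start)
            f.seek(start + shift)  # writing past the old end grows the file
            f.write(chunk)
            end = start
        # Every original byte has moved right by `shift`; fill the gap.
        f.seek(0)
        f.write(prefix)
```

The backward order is what makes this safe: each chunk's destination only overlaps bytes that have already been read. Each iteration also seeks twice, which gives you an idea of why an in-place rewrite can be slower than a plain sequential copy.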
Out of curiosity, I ran the commands discussed here and measured their run times.
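I generated a 9.3 GiB input file f using seq 1000000000 > f. Before timing each command, I always regenerated f and cleared the system caches with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches. My system has enough memory to hold the entire file, but I monitored memory usage manually: all commands used only a few KB of memory.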
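As a sanity check on the input size, the length of seq's output can be computed directly: every number from 1 to N contributes its digits plus one newline. A small Python sketch (the name seq_output_size is made up for illustration):

```python
def seq_output_size(n: int) -> int:
    """Bytes written by `seq n`: the numbers 1..n, one per line."""
    total = 0
    digits = 1
    low = 1  # smallest number with `digits` digits
    while low <= n:
        high = min(n, low * 10 - 1)
        # Each number in [low, high] contributes its digits plus a newline.
        total += (high - low + 1) * (digits + 1)
        digits += 1
        low *= 10
    return total


size = seq_output_size(1_000_000_000)
print(size)                    # 9888888899
print(round(size / 2**30, 2))  # 9.21
```

That is about 9.21 GiB. It is consistent with the 9.3 GiB above if, as seems likely but is an assumption, that figure was read from ls -h, which rounds sizes up.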
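Command                             Time
printf \" | cat - f > f2; mv f2 f   1m 05s
perl … # script from above          1m 32s
sed -i '1s/^/"/' f                  25m 57s (100% CPU the whole time)

I was surprised that the cat command was faster than the perl script. It makes sense, though: the perl script performs many seeks (as strace shows), whereas cat just copies.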
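Summary: If you still have enough free disk space, use the cat command. If the file is larger than the free disk space left on your system, use the perl script.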
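This rule of thumb can be turned into a small decision helper. Here is a minimal Python sketch that assumes the new copy would be written to the same file system as the original. choose_prepend_strategy and its margin parameter are hypothetical names, and the margin is an extra safety buffer that the advice above does not mention.

```python
import os
import shutil


def choose_prepend_strategy(path: str, margin: int = 0) -> str:
    """Return "copy" if a second copy fits on disk, otherwise "in-place".

    "copy" corresponds to the printf | cat approach, "in-place" to the
    perl/Tie::File approach. `margin` is an optional extra buffer in bytes.
    """
    size = os.path.getsize(path)
    directory = os.path.dirname(os.path.abspath(path))
    free = shutil.disk_usage(directory).free
    # The copy needs room for the whole file plus the one-byte prefix.
    return "copy" if free >= size + 1 + margin else "in-place"
```

A margin of a few GB is a reasonable choice in practice, because other processes may write to the same disk while a 70 GB copy is in progress.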