Question

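I have a relatively large csv/text data file (33mb) in which I need to do a global search and replace of the delimiting character. (The reason is that there doesn't seem to be a way to get SQLServer to escape/handle double quotes in the data during a table export, but that's another story...)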

I successfully did a Textmate search and replace on a smaller file, but it chokes on this larger file.

It seems like command-line grep may be the answer, but I can't quite grasp the syntax, a la:

grep -rl OLDSTRING . | xargs perl -pi~ -e 's/OLDSTRING/NEWSTRING/'

So in my case, I'm searching for the '^' (caret) character and replacing it with '"' (double quote).

grep -rl " grep_test.txt | xargs perl -pi~ -e 's/"/^'

This doesn't work. I assume it has to do with escaping the double quote, but I'm pretty lost. Can anyone help?

(I guess if anyone knows how to get SQLServer2005 to handle double quotes in a text column during export to csv, that would really solve the core issue.)

Answer 1

Your perl substitution seems to be wrong. Try:

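grep -rl '\^' . | xargs perl -pi~ -e 's/\^/"/g'

Explanation:

grep : command to find matches
-r : search recursively
-l : print only the names of files that contain a match
'\^' : the pattern to search for; the caret is escaped because it is a regex metacharacter (start anchor), and the single quotes keep the shell from touching the backslash
. : search in the current working directory
xargs : passes the file names printed by grep to perl as arguments
perl : used here to do the in-place replacement
-i~ : edit the files in place and keep a backup of each with the extension ~
-p : loop over the input and print each line after the replacement
-e : the one-line program to run
s/\^/"/g : replace every caret with a double quote; the caret is escaped for the same reason, and /g replaces all occurrences on each line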

Answer 2

sed -i.bak 's/\^/"/g' mylargefile.csv

Update: you can also use Perl, as rein suggested:

perl -i.bak -pe 's/\^/"/g' mylargefile.csv
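Both commands keep the original as a .bak file, which makes it easy to check the result before throwing the backup away. A minimal sketch of that check, assuming a small stand-in file named sample.csv (hypothetical) instead of the real data:

```shell
#!/bin/sh
# Sketch: in-place edit with a backup, then a before/after check.
# sample.csv is a hypothetical stand-in for the real file.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' '^a^,^b^' '^c^,^d^' > sample.csv

# Edit in place; the untouched original is kept as sample.csv.bak.
sed -i.bak 's/\^/"/g' sample.csv

# Count the lines that contain a caret, before and after.
# grep -c exits non-zero when the count is 0, hence the "|| true".
before=$(grep -c '\^' sample.csv.bak || true)
after=$(grep -c '\^' sample.csv || true)
echo "lines with ^ before: $before, after: $after"

# Once the result looks right, the backup can be removed:
# rm sample.csv.bak
```

Writing the suffix directly after -i (no space) is the form that both GNU sed and BSD sed accept.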

On large files, however, sed may run somewhat faster than Perl, as my results on a file of about 6 million lines show:

$ tail -4 file
this is a line with ^
this is a line with ^
this is a line with ^

$ wc -l<file
6136650

$ time sed 's/\^/"/g' file  >/dev/null

real    0m14.210s
user    0m12.986s
sys     0m0.323s
$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.993s
user    0m22.608s
sys     0m0.630s
$ time sed 's/\^/"/g' file  >/dev/null

real    0m13.598s
user    0m12.680s
sys     0m0.362s

$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.690s
user    0m22.502s
sys     0m0.393s
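The comparison above can be repeated on a smaller scale. The following is a rough sketch, not the commands used to build the original file: the line count, the temporary file, and the use of bash's time keyword are all assumptions, and absolute numbers will differ by machine and by sed/perl version.

```shell
#!/usr/bin/env bash
# Sketch: build a synthetic file and time sed vs. perl on it.
# The line count and temp file are arbitrary choices for illustration.

lines=100000
file=$(mktemp)

# yes repeats the line forever; head stops after $lines lines.
yes 'this is a line with ^' | head -n "$lines" > "$file"

# Strip the padding some wc implementations add around the number.
count=$(wc -l < "$file" | tr -d ' ')
echo "lines in test file: $count"

# Output is discarded so that mostly the substitution work is measured.
time sed 's/\^/"/g' "$file" > /dev/null
time perl -pe 's/\^/"/g' "$file" > /dev/null

rm -f "$file"
```

Running each command more than once, as in the output above, helps separate the cost of the substitution from the cost of reading the file from disk the first time.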

How can I efficiently search and replace in a large text file?
