I have a large file (~9 GB) in which every line has this format:
12345,6789,Jim Bob
The output I want is:
12345,6789,Jim,Bob
How can I do this with awk? It seems like the fastest way to handle this, and I'm new to using the terminal for this kind of work. Thanks!
Answer 0: (score: 2)
Use awk with a regex to replace the first space with a comma:
$ awk '{sub(/ /,",")}1' file
12345,6789,Jim,Bob
Or use awk with a regex to replace the space in the third field ($3) with a comma:
$ awk 'BEGIN{FS=OFS=","}{sub(/ /,",",$3)}1' file
12345,6789,Jim,Bob
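To see how the two awk commands above behave on names with more than one space, here is a small sketch. The three-word name is a hypothetical sample that does not appear in the question; it only illustrates that sub() replaces the first match and nothing more.

```shell
# Sample input: the question's line plus a hypothetical three-word name.
sample=$(printf '%s\n' '12345,6789,Jim Bob' '12345,6789,Mary Ann Lee')

# sub() replaces only the first match, so later spaces are left alone.
first_space=$(printf '%s\n' "$sample" | awk '{sub(/ /,",")}1')

# Same substitution, restricted to the third comma-separated field.
third_field=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS=","}{sub(/ /,",",$3)}1')

printf '%s\n' "$first_space"
printf '%s\n' "$third_field"
```

On this sample both commands give the same result: the second line becomes 12345,6789,Mary,Ann Lee, with the second space untouched. If every space in the name should become a comma, gsub() in place of sub() would be the usual choice.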
Answer 1: (score: 2)
With awk, this does it:
awk '$1=$1' OFS=, file
Answer 2: (score: 2)
Given the size of the input file, I think sed would be much faster for what you need:
sed -E 's/ ([^ ]+)$/,\1/' file > file.modified
Or, for in-place editing:
sed -i.bak -E 's/ ([^ ]+)$/,\1/' file
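The sed expression is anchored at the end of the line, so it targets the last space, whereas the awk sub() call from the earlier answer targets the first one. A minimal sketch of the difference, assuming a hypothetical three-word name that is not part of the question's data:

```shell
# Sample input: the question's line plus a hypothetical three-word name.
sample=$(printf '%s\n' '12345,6789,Jim Bob' '12345,6789,Mary Ann Lee')

# The trailing $ anchors the match, so only the last space on each line changes.
sed_last=$(printf '%s\n' "$sample" | sed -E 's/ ([^ ]+)$/,\1/')

# For comparison: awk's sub() changes the first space instead.
awk_first=$(printf '%s\n' "$sample" | awk '{sub(/ /,",")}1')

printf 'sed:\n%s\nawk:\n%s\n' "$sed_last" "$awk_first"
```

For two-word names the outputs are identical. For the three-word sample, sed yields 12345,6789,Mary Ann,Lee while awk yields 12345,6789,Mary,Ann Lee, so which command is appropriate depends on how such names should be split.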
Benchmark with a 36 MB file, dummy.txt:
$ time awk 'BEGIN{FS=OFS=","}{sub(/ /,",",$3)}1' dummy.txt >/dev/null
real 0m3.357s
user 0m3.337s
sys 0m0.016s
$ time awk '{sub(/ /,",")}1' dummy.txt >/dev/null
real 0m3.182s
user 0m3.166s
sys 0m0.014s
$ time awk '$1=$1' OFS=, dummy.txt >/dev/null
real 0m3.150s
user 0m3.130s
sys 0m0.018s
$ time sed -E 's/ ([^ ]+)$/,\1/' dummy.txt >/dev/null
real 0m1.646s
user 0m1.633s
sys 0m0.013s
sed is about 2x faster than awk here! For a 9 GB file, the difference could be even more pronounced.
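Answer 3: (score: 0)
You can use tr, if that suits you:
tr -s ' ' ',' < file.txt > tr.txt
where file.txt is your input file and tr.txt is the output file.
If you only want to use awk, you can take the space as the field separator and have awk print ',' between the two columns: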
awk -F' ' '{print $1","$2}' file.txt
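Both commands in this answer assume exactly one space per line and no empty fields. The sketch below shows what happens when that does not hold; the second sample line, with an empty second field and a three-word name, is hypothetical and not taken from the question.

```shell
# Sample input: the question's line plus a hypothetical line with an
# empty second field and a three-word name.
sample=$(printf '%s\n' '12345,6789,Jim Bob' '12345,,Mary Ann Lee')

# tr turns every space into a comma, then -s squeezes runs of commas,
# including commas that were already adjacent in the input.
tr_out=$(printf '%s\n' "$sample" | tr -s ' ' ',')

# awk prints only the first two space-separated fields; anything after
# the second one is dropped.
awk_out=$(printf '%s\n' "$sample" | awk -F' ' '{print $1","$2}')

printf 'tr:\n%s\nawk:\n%s\n' "$tr_out" "$awk_out"
```

On the hypothetical line, tr produces 12345,Mary,Ann,Lee (the empty field disappears) and awk produces 12345,,Mary,Ann (the last word is lost). On the question's two-word format both give 12345,6789,Jim,Bob.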
Benchmark on a 283 MB file.
Using tr:
time tr -s ' ' ',' < file.txt >tr.txt
real 0m10.976s
user 0m1.042s
sys 0m0.966s
Using awk:
time awk -F' ' '{print $1","$2}' file.txt > /dev/null
real 0m14.141s
user 0m13.909s
sys 0m0.199s
Using @codeforester's approach:
time sed -E 's/ ([^ ]+)$/,\1/' file.txt >/dev/null
real 0m42.183s
user 0m41.659s
sys 0m0.435s
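In this benchmark, tr ran faster than both sed and awk (about 1 s of user CPU time, versus about 14 s for awk and 42 s for sed).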
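Since the timings in the answers come from different files and machines, one way to compare the commands fairly is to run them all on the same input and first confirm they produce the same output. The following is a rough sketch under stated assumptions: the generated file, its size (the lines variable), and the variable names are all hypothetical, and the data follows the question's two-word format.

```shell
# Build a small hypothetical test file; raise lines for a realistic timing run.
lines=1000
tmp=$(mktemp)
awk -v n="$lines" 'BEGIN { for (i = 1; i <= n; i++) print i "," (i * 2) ",Jim Bob" }' > "$tmp"

# Checksum each command's output so the results can be compared directly.
sum_awk_sub=$(awk '{sub(/ /,",")}1' "$tmp" | cksum)
sum_awk_fs=$(awk 'BEGIN{FS=OFS=","}{sub(/ /,",",$3)}1' "$tmp" | cksum)
sum_awk_ofs=$(awk '$1=$1' OFS=, "$tmp" | cksum)
sum_sed=$(sed -E 's/ ([^ ]+)$/,\1/' "$tmp" | cksum)
sum_tr=$(tr -s ' ' ',' < "$tmp" | cksum)

rm -f "$tmp"

printf '%s\n' "$sum_awk_sub" "$sum_awk_fs" "$sum_awk_ofs" "$sum_sed" "$sum_tr"
```

If all five checksums match, each command can then be prefixed with time and redirected to the same destination (for example /dev/null) on a larger copy of the file, so that differences in output handling do not skew the comparison.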