sed statement to change/modify CSV separators and delimiters

Date: 2013-07-24 05:17:51

Tags: bash shell csv sed

I have some CSV files containing comma-separated values, and some column values can contain characters such as ,.<>!/\;&

I am trying to convert the CSV into a comma-separated CSV in which every value is enclosed in double quotes.

Sample data:

DateCreated,DateModified,SKU,Name,Category,Description,Url,OriginalUrl,Image,Image50,Image100,Image120,Image200,Image300,Image400,Price,Brand,ModelNumber
2012-10-19 10:52:50,2013-06-11 02:07:16,34,Austral Foldaway 45 Rotary Clothesline,Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers,"Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",https://track.commissionfactory.com.au/p/10604/1718695,http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/,http://content.commissionfactory.com.au/Products/7228/1718695.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@50x50.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@100x100.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@120x120.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@200x200.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@300x300.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@400x400.jpg,309.9000 AUD,Austral,FA45GR

The output I want to achieve is:

"DateCreated","DateModified","SKU","Name","Category","Description","Url","OriginalUrl","Image","Image50","Image100","Image120","Image200","Image300","Image400","Price","Brand","ModelNumber"
"2012-10-19 10:52:50","2013-06-11 02:07:16","34","Austral Foldaway 45 Rotary Clothesline","Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers","Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;","https://track.commissionfactory.com.au/p/10604/1718695","http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/","http://content.commissionfactory.com.au/Products/7228/1718695.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@50x50.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@100x100.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@120x120.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@200x200.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@300x300.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@400x400.jpg","309.9000 AUD","Austral","FA45GR"

Any help is much appreciated.

3 answers:

Answer 0 (score: 3)

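First, let's try a simple (and "not good enough") solution that just wraps every field in double quotes (including fields that already have double quotes! That's not what you want):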

sed -r 's/([^,]*)/"\1"/g'

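Fine: the first part matches a run of characters containing no comma, the second part puts double quotes around it, and the trailing 'g' applies the substitution to every match on the line, not just the first.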

This will turn

abc,345, some words ,"some text","text,with,commas"

into

"abc","345"," some words ",""some text"",""text","with","commas""

A few things to note:

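  • It correctly wraps " some words ", which has spaces inside it, but the quotes also enclose the leading and trailing spaces. I think that's OK, but if not, it can be fixed

  • If a field is already quoted, it gets quoted again, which is bad. Needs fixing

  • If a field is already quoted and its text contains commas (which should not be treated as field separators), the field is split at those commas and each piece is quoted separately. This also needs fixing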

So we want to match one of two different regexes: either a quoted string, or a field containing no commas (and no quotes):

sed -r 's/([^,"]*|"[^"]*")/"\1"/g'

The result is now

"abc","345"," some words ",""some text"",""text,with,commas""

As you can see, the text that was originally quoted now has doubled quotes. We have to remove them with a second sed command:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' | sed 's/""/"/g'

The result is

"abc","345"," some words ","some text","text,with,commas"

YAY!
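One thing worth checking before using this on real data: the second pass, sed 's/""/"/g', replaces every "" on the line, so an empty field (which the first pass turns into "") is collapsed to a lone ", and the "[^"]*" pattern does not recognise the "" escape that CSV uses for a literal quote inside a quoted field. If you want to stay in the shell without a Perl module, a minimal sketch is a small bash field parser that quotes only the fields that are not already quoted. It assumes bash (for [[ =~ ]] and BASH_REMATCH), assumes no field contains an embedded newline, and uses a hypothetical function name, quote_all_fields. It will also be much slower than a real CSV parser on large files.

```shell
#!/usr/bin/env bash
# quote_all_fields (hypothetical helper name): reads CSV lines on stdin and
# prints them with every field wrapped in double quotes. Fields that are
# already quoted are passed through unchanged. Assumes no field contains
# an embedded newline.
quote_all_fields() {
  local line rest field sep out
  # Group 1: a quoted field ("" is an escaped quote inside it) or an
  # unquoted field with no comma or quote. Group 3: a comma, or end of line.
  local re='^("([^"]|"")*"|[^,"]*)(,|$)'
  while IFS= read -r line || [[ -n $line ]]; do
    out='' rest=$line
    while :; do
      if ! [[ $rest =~ $re ]]; then
        printf 'cannot parse: %s\n' "$line" >&2
        return 1
      fi
      field=${BASH_REMATCH[1]}
      sep=${BASH_REMATCH[3]}
      # Quote only fields that are not quoted already.
      [[ $field == \"* ]] || field="\"$field\""
      out+=$field$sep
      # Drop the consumed field and separator; stop after the last field.
      rest=${rest:${#BASH_REMATCH[0]}}
      [[ -n $sep ]] || break
    done
    printf '%s\n' "$out"
  done
}

printf '%s\n' 'abc,345, some words ,"some text","text,with,commas"' 'a,,b' | quote_all_fields
# prints:
# "abc","345"," some words ","some text","text,with,commas"
# "a","","b"
```

Because each field is matched as a whole (quoted or unquoted) before deciding whether to add quotes, no clean-up pass is needed, so empty fields and "" escapes survive unchanged.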

Answer 1 (score: 0)

It sounds like you want every line in the file to begin and end with a double quote. If so, this should work:

sed -i.bak 's/^\(.*\)$/"\1"/' filename
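Note that this command quotes each line as a whole rather than quoting each field. The quick check below applies the same substitution to stdin (instead of editing a file in place with -i.bak) using a hypothetical wrapper name, wrap_line:

```shell
#!/usr/bin/env bash
# wrap_line (hypothetical name): Answer 1's substitution, reading stdin
# instead of editing a file in place.
wrap_line() {
  sed 's/^\(.*\)$/"\1"/'
}

printf '%s\n' 'abc,345,some text' | wrap_line
# prints "abc,345,some text" : one quoted value, not three quoted fields
```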

Answer 2 (score: 0)

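Try this solution. It is better than my previous one, because now I use a parser that correctly handles commas inside fields. The Text::CSV_XS module is required: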

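#!/usr/bin/env perl

use strict;
use warnings;
use Text::CSV_XS;

die qq|Usage: perl $0 <csv-file>\n| unless @ARGV == 1;

my $file = shift;
open my $fh, '<', $file or die qq|ERROR: Could not open input file "$file": $!\n|;

# always_quote: wrap every output field in double quotes
# binary:       allow embedded newlines and non-ASCII bytes inside fields
# eol:          terminate each printed row with a newline
my $csv = Text::CSV_XS->new( {
        always_quote => 1,
        binary       => 1,
        eol          => "\n",
} ) or die "" . Text::CSV_XS->error_diag();

while ( my $row = $csv->getline( $fh ) ) {
        $csv->print( *STDOUT, $row );
}
# getline also returns undef on a parse error; report it unless we hit EOF
$csv->eof or $csv->error_diag();
close $fh;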