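I have a dataset of weather data in an odd format, and I suspect this is a job for sed. Entries are separated either by a run of spaces or by 2 spaces together with an identifier. I am trying to write a shell script that takes this data and converts it to a CSV file. I used awk to replace the spaces with commas, but then I realized the number of spaces varies, because someone decided to do something tricky. As an example, here is a subset: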
USH00011084 1897 734 3 1292 3 1972 3 1786 3 2084 3 2761 3 2753 3 2547 3 2406 3 1878 3 -9999 -9999
USH00011084 1900 -9999 -9999 1337a 3 1936 3 2378 3 2589 3 2770 3 2872 3 2700 3 2320 3 1486 3 1100 3
USH00011084 1926 -9999 1245 1251a 1781 2240 2654 2712 2763c 2770 2110 1256a 1421
USH00011084 1927 1209 1821 1651 2183 2467 2707 2730 2594a 2579 2081 1907 871f 3
USH00011084 1928 800b 1135 1614 1711 2218 2596 2829 2817 -9999 -9999 -9999 -9999
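I think I could use sed to put a single comma where there are fewer than 5 spaces and two commas where there are exactly 5; however, I haven't figured it out yet. Any suggestions would be appreciated.
Answer 0 (score: 4)
I would say...
sed -e 's/     /,,/g' -e 's/ \+/,/g' file
Or, a bit cleaner, using extended regular expressions (-r in GNU sed; -E works in both GNU and BSD sed):
sed -re 's/ {5}/,,/g' -e 's/ +/,/g' file
Both produce: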
USH00011084,1897,734,3,1292,3,1972,3,1786,3,2084,3,2761,3,2753,3,2547,3,2406,3,1878,3,-9999,-9999,
USH00011084,1900,-9999,-9999,,1337a,3,1936,3,2378,3,2589,3,2770,3,2872,3,2700,3,2320,3,1486,3,1100,3
USH00011084,1926,-9999,,1245,,1251a,1781,,2240,,2654,,2712,,2763c,2770,,2110,,1256a,1421,
USH00011084,1927,1209,,1821,,1651,,2183,,2467,,2707,,2730,,2594a,2579,,2081,,1907,,,871f,3
USH00011084,1928,800b,1135,,1614,,1711,,2218,,2596,,2829,,2817,-9999,-9999,-9999,-9999,
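To check the two substitutions on a line where the spacing is easy to see, here is a minimal sketch. The sample line and the helper name to_csv are made up for illustration and are not part of the answer; the expressions use POSIX basic syntax (\{5\} and \{1,\}) instead of the GNU-specific \+, on the assumption that the script may also need to run with a non-GNU sed.

```shell
# Hypothetical helper: reads lines on stdin, writes CSV on stdout.
to_csv() {
  # 1st expression: a run of exactly five spaces becomes two commas,
  #                 which keeps an empty column for the missing flag.
  # 2nd expression: any run of spaces still left becomes one comma.
  sed -e 's/ \{5\}/,,/g' -e 's/ \{1,\}/,/g'
}

# Made-up sample: five spaces after 1245, two spaces before the flag 3.
printf 'ID 1926 1245     1251a  3\n' | to_csv
# prints: ID,1926,1245,,1251a,3
```

For a whole file this would be used as to_csv < file > file.csv, where both file names are placeholders.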
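The logic is the one you described:
,, replaces every run of 5 spaces.
, replaces one or more spaces (the ones still present after the first substitution).
Answer 1 (score: 1)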
If you don't want to depend on the number of spaces in the input file, you can use this awk command:
awk -v OFS=, '{$1=$1} 1' file
Output:
USH00011084,1897,734,3,1292,3,1972,3,1786,3,2084,3,2761,3,2753,3,2547,3,2406,3,1878,3,-9999,-9999
USH00011084,1900,-9999,-9999,1337a,3,1936,3,2378,3,2589,3,2770,3,2872,3,2700,3,2320,3,1486,3,1100,3
USH00011084,1926,-9999,1245,1251a,1781,2240,2654,2712,2763c,2770,2110,1256a,1421
USH00011084,1927,1209,1821,1651,2183,2467,2707,2730,2594a,2579,2081,1907,871f,3
USH00011084,1928,800b,1135,1614,1711,2218,2596,2829,2817,-9999,-9999,-9999,-9999
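Breakdown of the awk command:
-v OFS=, # use a comma as the output field separator
{$1=$1} # assigning to a field forces awk to rebuild the record using OFS
1 # always-true pattern with no action, so awk prints each rebuilt record
No -F option is needed: awk's default field separator already splits on any run of blanks and ignores leading and trailing ones.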
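As a quick check of what the rebuild does, here is a small sketch that runs the same awk program on a made-up line. The sample line and the helper name collapse are illustrative assumptions, not part of the answer.

```shell
# Hypothetical helper wrapping the awk command from the answer.
collapse() {
  # Default field splitting treats any run of blanks as one separator;
  # $1=$1 makes awk rejoin the fields with OFS, and 1 prints the result.
  awk -v OFS=, '{$1=$1} 1'
}

# Made-up sample: five spaces after 1245, two spaces before the flag 3.
printf 'ID 1926 1245     1251a  3\n' | collapse
# prints: ID,1926,1245,1251a,3
```

Note that the five-space gap produces no empty column here, so rows with and without identifiers end up with different field counts, as the output above shows. If the columns have to line up, the sed approach that emits ,, for 5 spaces is probably the better fit.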