I'm trying to get the output below. I would search the internet for it, but I don't know the exact keywords to search for, so I'm posting it here.
I have a CSV file, data.csv. So far I have tried the following (my MWE):
cat data.csv|sed 's/\n.*//g'
The contents of data.csv are shown below:
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,
line 5 text
10,1,6,"<J>
line 6 text"
10,1,7,"line 7 text"
10,1,8,"
line 8 text"
10,1,9,"line 9 text"
I want the following output:
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
Answer 0 (score: 1)
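In addition to Cyrus's answer, to make sure 'line 5 text' is enclosed in double quotes, you can add two more expressions: one that replaces ', ' with ',"' and one that appends '"' to lines that do not end with '"', e.g.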
sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file
The first expression is exactly the same as in Cyrus's answer. This gives the output you requested:
$ sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
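For readers new to sed, here is the same command split into one commented expression per step and run on a few sample lines. It is a sketch assuming GNU sed; fix_csv is a hypothetical function name. The sample also assumes the line 10,1,5, ends with a trailing space, which is what the s/, /,"/ expression relies on: without that space, line 5 would come out as 10,1,5,line 5 text" with only a closing quote.

```shell
#!/bin/sh
# Sketch: the three sed expressions from this answer, with comments.
# fix_csv is a hypothetical name; GNU sed is assumed.
fix_csv() {
  # 1) Line not of the form ..."...": append the next line (N), then delete
  #    the newline plus any spaces that start the appended line.
  # 2) Replace the first ", " with ,"  (opens the quote missing on line 5).
  # 3) Line still not ending in ": append one (closes that quote).
  sed -e '/".*"$/!{N;s/\n *//}' \
      -e 's/, /,"/' \
      -e '/"$/!{s/$/"/}' \
      "$1"
}

# Sample lines; assumes "10,1,5, " carries a trailing space (see above).
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' '10,1,4,"line 4 text"' '10,1,5, ' 'line 5 text' \
  '10,1,6,"<J>' 'line 6 text"' '10,1,8,"' 'line 8 text"' > "$input"
fix_csv "$input"
```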
Answer 1 (score: 1)
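With GNU awk for multi-char RS, RT and gensub(), you can describe each record as a series of 4 comma-separated fields ending in a newline, then remove the newlines (and any whitespace around them) inside each record: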
$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
and to also make sure the last field is wrapped in quotes:
$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
Note that this works no matter how many lines your 4th field is split across:
$ cat file
10,1,1,"line 1 text"
10,1,2,
foo
line
2
text
bar
10,1,3,"line 3 text"
$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"fooline2textbar"
10,1,3,"line 3 text"
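The gawk commands above need GNU awk (multi-char RS, RT, gensub() and \s). If only a POSIX awk is available, a similar result can be sketched by treating every physical line with at least 4 comma-separated fields as the start of a new record and appending all other lines to it. This is a swapped-in heuristic, not the RS-based technique used above. Like the gawk version, it assumes only the 4th field spans lines and that it contains no commas; join_records is a hypothetical name.

```shell
#!/bin/sh
# Sketch (POSIX awk, no gawk extensions). join_records is a hypothetical name.
# Heuristic: a line with at least 4 comma-separated fields (NF >= 4 with -F,)
# starts a new record; any other line continues the current record.
# Assumes only the 4th field spans lines and that it contains no commas.
join_records() {
  awk -F, '
    function emit(rec) {
      # Wrap the last field in quotes if it contains no quotes or commas,
      # mirroring gensub(/,([^",]*)$/, ...) in the gawk version.
      if (match(rec, /,[^",]*$/))
        rec = substr(rec, 1, RSTART) "\"" substr(rec, RSTART + 1) "\""
      print rec
    }
    {
      line = $0
      gsub(/^[ \t]+|[ \t]+$/, "", line)    # trim blanks at both ends of the line
      if (NF >= 4 && NR > 1) {             # new record starts: flush the old one
        emit(buf)
        buf = ""
      }
      buf = buf line
    }
    END { if (NR > 0) emit(buf) }
  ' "$1"
}

# Usage, on the multi-line example from this answer:
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' '10,1,1,"line 1 text"' '10,1,2,' foo line 2 text bar \
  '10,1,3,"line 3 text"' > "$input"
join_records "$input"
```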
Answer 2 (score: 0)
With GNU sed:
sed '/".*"$/!{N;s/\n *//}' file
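If a line does not match the regex ".*"$, append the next line (N) to sed's pattern space, then replace the newline, together with any spaces that follow it, with nothing (s/\n *//).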
Output:
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5, line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
I did not add the missing quotes in line 5.
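The command above hinges on sed's line-at-a-time model. The small sketch below (assuming GNU sed; the inputs a, b, x and y are made up for illustration) shows why the question's s/\n.*//g has no effect, and what N and the /".*"$/! address change.

```shell
#!/bin/sh
# sed strips the trailing newline from each input line before running the
# script, so without N the pattern space contains no "\n" and
# s/\n.*//g (the question's attempt) matches nothing:
printf 'a\nb\n' | sed 's/\n.*//g'    # prints a and b unchanged, on two lines

# N appends "\n" plus the next input line, so s/\n *// now has a newline
# (and the spaces after it) to delete:
printf 'a\n   b\n' | sed 'N;s/\n *//'    # prints ab

# With the /".*"$/! address, only lines that are not already ..."..." get
# joined with the following line:
printf '%s\n' '1,"x"' '2,"' 'y"' | sed '/".*"$/!{N;s/\n *//}'    # prints 1,"x" and 2,"y"
```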
See: man sed and The Stack Overflow Regular Expressions FAQ