How can I get the desired output using a bash script?

Time: 2019-06-02 05:25:44

Tags: bash

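I am trying to produce the output shown further down. I could not search for a solution because I don't know the exact keywords to look for, so I am posting here. I have a CSV file, data.csv. What I have tried so far (my MWE):

cat data.csv|sed 's/\n.*//g'

The contents of data.csv: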

10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5, 
line 5 text
10,1,6,"<J>
 line 6 text"
10,1,7,"line 7 text"
10,1,8,"
 line 8 text"
10,1,9,"line 9 text"

I want the output to look like this:

10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

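3 answers:

Answer 0: (score: 1)

In addition to Cyrus's answer, to make sure 'line 5 text' is surrounded by double quotes, you can add two more expressions: one that replaces ', ' with ',"', and one that appends '"' to any line that does not already end with '"'. For example: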

sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file

The first expression is exactly the same as in Cyrus's answer. This produces the output you requested:

$ sed -e '/".*"$/!{N;s/\n *//}' -e 's/, /,"/' -e '/"$/!{s/$/"/}' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
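
To see what each of the three expressions contributes, they can be run one at a time on a single record. The following is a minimal sketch, assuming GNU sed; the variable names `record`, `step1`, `step2` and `step3` are made up for illustration, and the sample is record 5 from the question.

```shell
# Hypothetical sample: record 5 from the question, unquoted and split over two lines.
record=$(printf '10,1,5, \nline 5 text\n')

# Expression 1: the line lacks a closing quote, so append the next line (N)
# and delete the embedded newline plus any spaces that follow it.
step1=$(printf '%s\n' "$record" | sed -e '/".*"$/!{N;s/\n *//}')

# Expression 2: turn the first ', ' into ',"' to open the quoted field.
step2=$(printf '%s\n' "$step1" | sed -e 's/, /,"/')

# Expression 3: the line does not end with '"', so append one.
step3=$(printf '%s\n' "$step2" | sed -e '/"$/!{s/$/"/}')

printf '%s\n' "$step1" "$step2" "$step3"
```

Note that the second expression rewrites the first ', ' it finds anywhere on a line, so a quoted field that itself contains a comma followed by a space would also be changed. The sample data has no such field.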

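Answer 1: (score: 1)

Using GNU awk for multi-character RS, RT, and gensub(), you can describe each record as a series of 4 comma-separated fields ending in a newline, and then remove the newlines, and any whitespace around them, within each record: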

$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
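
Because RS matches the whole record here, the text awk normally puts in $0 is empty and the record itself ends up in RT. A small sketch, assuming GNU awk is installed as `gawk`, makes this visible by printing each RT with its newlines shown as `<NL>`; the two-record sample and the names `sample` and `records` are hypothetical.

```shell
# Hypothetical sample: the first record is split across two lines.
sample=$(printf '10,1,5, \nline 5 text\n10,1,6,"x"\n')

# Print every RT (the text matched by RS) in brackets,
# replacing each newline with the visible marker <NL>.
records=$(printf '%s\n' "$sample" |
  gawk -v RS='([^,]*,){3}[^,]*\n' '{ print "[" gensub(/\n/, "<NL>", "g", RT) "]" }')

printf '%s\n' "$records"
```

The first bracketed line should contain two `<NL>` markers, one inside the record and one at its end, which is why the answer's gensub() call removes every newline in RT rather than only the last one.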

And to make sure the last field is surrounded by quotes:

$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5,"line 5 text"
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"
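
The second gensub() call can be tried in isolation to check that it only touches a last field that has no quotes. This is a sketch under the assumption that GNU awk is available as `gawk`; the variable name `quoted` and the two already-joined input lines are chosen for illustration.

```shell
# Two already-joined records: the first has an unquoted last field,
# the second is quoted and should pass through unchanged.
quoted=$(printf '%s\n' '10,1,5,line 5 text' '10,1,6,"<J>line 6 text"' |
  gawk '{ $0 = gensub(/,([^",]*)$/, ",\"\\1\"", 1) } 1')

printf '%s\n' "$quoted"
```

The pattern `,([^",]*)$` matches only when everything after the last comma is free of quotes and commas, so an already-quoted field never gets a second pair of quotes.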

Note that this works no matter how many lines your fourth field is split across:

$ cat file
10,1,1,"line 1 text"
10,1,2,
foo
line
2
text
bar
10,1,3,"line 3 text"

$ awk -v RS='([^,]*,){3}[^,]*\n' '{$0=gensub(/\s*\n\s*/,"","g",RT); $0=gensub(/,([^",]*)$/,",\"\\1\"",1)} 1' file
10,1,1,"line 1 text"
10,1,2,"fooline2textbar"
10,1,3,"line 3 text"

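Answer 2: (score: 0)

Using GNU sed:

sed '/".*"$/!{N;s/\n *//}' file

If a line does not match the regular expression ".*"$, append the next line (N) to sed's pattern space, then replace the newline and any spaces that follow it (zero or more) with nothing (s/\n *//).

Output: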

10,1,1,"line 1 text"
10,1,2,"line 2 text"
10,1,3,"line 3 text"
10,1,4,"line 4 text"
10,1,5, line 5 text
10,1,6,"<J>line 6 text"
10,1,7,"line 7 text"
10,1,8,"line 8 text"
10,1,9,"line 9 text"

I did not add the missing quotes in line 5.
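
The role of N is easiest to see by comparing it with the attempt in the question. sed reads one line at a time and removes its trailing newline before running the script, so a pattern containing \n cannot match until N has pulled a second line into the pattern space. Below is a minimal sketch, assuming GNU sed; the names `input`, `unchanged` and `joined` are illustrative, and the sample is record 8 from the question.

```shell
# Record 8 from the question: the quoted field continues on a second line.
input=$(printf '10,1,8,"\n line 8 text"\n')

# The question's attempt: each line is processed on its own, without its
# newline, so \n never matches and the text passes through unchanged.
unchanged=$(printf '%s\n' "$input" | sed 's/\n.*//g')

# With N the pattern space holds both lines separated by \n,
# so the substitution can find and remove the newline.
joined=$(printf '%s\n' "$input" | sed '/".*"$/!{N;s/\n *//}')

printf '%s\n' "$unchanged" "$joined"
```

N appends only one following line per cycle, so this form handles a field that is split across two lines, as in the sample data.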


See: man sed and The Stack Overflow Regular Expressions FAQ