Need to selectively remove newlines from a file using Unix (Solaris)

Posted: 2013-02-12 14:28:00

Tags: parsing unix

I'm trying to find a way to selectively remove newlines from a file. Removing all of them is no problem, but I need to keep some.

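Below is an example of a bad input file. Note that the records with Permit IDs COO789 and COO012 have newlines embedded in the Description field, and I need to remove them.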

"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians
Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race 
weekend",,"05/11/2013","05/11/2013"

Below is an example of what I need the file to look like:

"Permit Number/Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"

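Note: I simplified the file by removing some extra columns. The logic should be able to handle any number of columns. The actual full header row, with all columns, is below. Technically, I expect to find the "extra" newlines in the "Description" and "Location" columns.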

"Permit Number/Id","Permit Name","Description","Start Date","End Date","Custom Status","Owner Name","Total Expected Attendance","Location"
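Because the logic has to work for any number of columns, one column-independent way to locate the broken records is to count double quotes per line: in this file, a line without an embedded newline contains an even number of them. The sketch below (the function name `find_odd_quote_lines` is hypothetical) prints the line number and text of every line with an odd quote count, i.e. the lines where a quoted field opens or closes across a line break. A middle line of a record split over three or more lines may contain no quotes and will not be flagged. On Solaris, `/usr/bin/awk` is the legacy awk, so the sketch lets you substitute `nawk` or `/usr/xpg4/bin/awk` through the `AWK` variable.

```shell
# Print "NR: line" for every line containing an odd number of double quotes.
# Set AWK=nawk (or /usr/xpg4/bin/awk) on Solaris; defaults to plain awk.
find_odd_quote_lines() {
  "${AWK:-awk}" '
    {
      tmp = $0
      n = gsub(/"/, "", tmp)        # gsub returns the number of quotes removed
      if (n % 2 == 1) print NR ": " $0
    }
  ' "$@"
}
```

For example, `AWK=nawk find_odd_quote_lines file` on the sample input above would report the first and second physical lines of both the COO789 and COO012 records.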

I have tried sed, cut, tr, nawk, etc. I'm open to any solution that can do this, as long as it can be called from a Unix script.

Thanks!

2 answers:

Answer 0 (score: 1)

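If the newlines must be removed only from the "Description" and "Location" fields, you need a proper CSV parser (think Text::CSV). You could also do this fairly easily with GNU awk, but unfortunately you don't have gawk on Solaris. So the next best solution is to join every line that does not start with a double quote onto the previous line, collapsing any spaces before the line break into a single space. You can do that with sed; I wrote this with portability in mind: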

sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D' file

Result:

"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"
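A middle ground between this heuristic and a full CSV parser is to track whether a quoted field is still open: keep joining lines until the running count of double quotes is even. This does not depend on the number of columns or on what the continuation line starts with, and an escaped `""` inside a field keeps the count's parity unchanged. It is a sketch, not a full CSV parser: it assumes quotes appear only as field delimiters or `""` escapes, it joins newlines inside any quoted field (not just Description and Location), and, like the sed command, it replaces the break plus any trailing spaces with one space. The name `join_quoted_newlines` is hypothetical.

```shell
# Join physical lines until the double quotes in the record are balanced.
# Set AWK=nawk (or /usr/xpg4/bin/awk) on Solaris; defaults to plain awk.
join_quoted_newlines() {
  "${AWK:-awk}" '
    {
      if (q % 2 == 1) {           # a quoted field is still open: join lines
        sub(/ +$/, "", buf)       # drop trailing spaces before the break
        buf = buf " " $0
      } else {
        buf = $0                  # start of a new record
      }
      tmp = $0
      q += gsub(/"/, "", tmp)     # add this line'"'"'s quote count
      if (q % 2 == 0) {           # quotes balanced: the record is complete
        print buf
        q = 0
      }
    }
    END { if (q % 2 == 1) print buf }   # flush an unterminated last record
  ' "$@"
}
```

Usage would look like `AWK=nawk join_quoted_newlines file > fixed_file`; on the sample input it should produce the same five lines as the sed command.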

Answer 1 (score: 0)

sed -e :a -e '$!N' -e '$!ba' -e 's/ \n/ /g' file

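This reads the entire file into the pattern space, then removes every newline that directly follows a space, assuming that all the bad newlines fit this pattern. If they don't, how do you decide when a newline should be removed?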
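The same trailing-space heuristic can also be written as a streaming awk filter that never holds the whole file in memory: a line ending in a space is printed without its newline, so the next line is appended to it, exactly as `s/ \n/ /g` keeps the space and drops the newline. This is a sketch (the name `join_after_space` is hypothetical) and it makes the same assumption as the sed command: every bad newline is preceded by a space.

```shell
# Remove newlines that directly follow a space, one line at a time.
# Set AWK=nawk (or /usr/xpg4/bin/awk) on Solaris; defaults to plain awk.
join_after_space() {
  "${AWK:-awk}" '
    / $/ { printf "%s", $0; pending = 1; next }  # keep the space, drop the newline
         { print; pending = 0 }
    END  { if (pending) print "" }               # end the output with a newline
  ' "$@"
}
```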