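I have a large tab-delimited file. The main problem is that I need to import the data into a database, but some columns span multiple lines, which breaks the import. I want to use bash to convert the file into a proper comma-separated file. Here is a sample of the file (I use a pipe | in place of each tab):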
1|Some text|another text|12| Some big big big
text with lots of data and multiple lines
and commas|34|34
2|Some text|another text||Another big big big big
text with lots of characters like , and tab|33|25
The example above contains just two rows of data. What I want is:
"1","Some text","another text","12"," Some big big big
text with lots of data and multiple lines
and commas","34","34"
"2","Some text","another text","","Another big big big big
text with lots of characters like , and tab","33","25"
In vim, I can see that every complete data row (including its multiline column) is terminated by ^M$, so it looks like this:
1|Some text|another text|12| Some big big big
text with lots of data and multiple lines
and commas|34|34^M$
2|Some text|another text||Another big big big big
text with lots of characters like , and tab|33|25^M$
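Answer 0 (score: 0)
This is quite tricky, and it depends on running the substitutions in the right order. The following seems to work (at least on the example you gave):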
" Enclose non-multiline lines with quotes.
:g/\t/s/^\|\(\r\)$/"\1/g
" Undo the ending quote before / the beginning quote after a multiline.
:v/\t/-1s/"$//
:v/\t/+1s/^"//
" Undo the beginning quote after an incomplete (i.e. no ^M) previous record.
:g/\t/-1s/\r\@<!\n\zs"//
" Replace tabs with quotes and commas.
:%s/\t/","/g
" Finally, remove the ^M end-of-record marker.
:%s/\r$//
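
If a script is an option outside vim, the same idea (treat ^M plus newline as the only record terminator, then quote every field) can be sketched in Python. This is an alternative to the substitutions above, not a translation of them. It assumes every record ends in CR LF, that no field itself contains a CR LF pair, and that embedded double quotes should be doubled as CSV usually expects. The function name `tsv_records_to_csv` is made up for illustration.

```python
import csv
import io


def tsv_records_to_csv(data: str, sep: str = "\t") -> str:
    """Convert CRLF-terminated, tab-separated records into fully quoted CSV.

    Bare "\n" characters inside a record are kept as part of the field,
    because only "\r\n" is treated as the end of a record (an assumption
    based on the ^M$ markers shown in the question).
    """
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, including empty ones.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in data.split("\r\n"):
        if record == "":
            continue  # skip the empty piece after the final terminator
        writer.writerow(record.split(sep))
    return buf.getvalue()


if __name__ == "__main__":
    sample = (
        "1\tSome text\tanother text\t12\t Some big big big\n"
        "text with lots of data and multiple lines\n"
        "and commas\t34\t34\r\n"
        "2\tSome text\tanother text\t\tAnother big big big big\n"
        "text with lots of characters like , and tab\t33\t25\r\n"
    )
    print(tsv_records_to_csv(sample), end="")
```

When reading the real file, open it with `open(path, newline="")` so that Python does not translate the CR LF terminators before the function sees them.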