Converting a multi-line tab-delimited file to comma-separated

Date: 2014-04-22 12:13:48

Tags: bash csv vim

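I have a large tab-delimited file. The main problem is that I need to import the data into a database, but some columns span multiple lines, which causes problems. I want to use bash to convert the file into a proper comma-separated file. Here is a sample of the file (I use a pipe | in place of each tab):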

1|Some text|another text|12| Some big big big

text with lots of data and multiple lines

and commas|34|34
2|Some text|another text||Another big big big big
text with lots of characters like , and tab|33|25

The example above contains just two records. What I would like to get is:

"1","Some text","another text","12"," Some big big big

text with lots of data  and multiple lines

and commas","34","34"
"2","Some text","another text","","Another big big big big
text with lots of characters like , and tab","33","25"

In vim I can see that each complete record (including its multi-line columns) is terminated by ^M$, so it looks like this:

1|Some text|another text|12| Some big big big

text with lots of data and multiple lines

and commas|34|34^M$
2|Some text|another text||Another big big big big
text with lots of characters like , and tab|33|25^M$

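1 Answer:

Answer 0 (score: 0)

This is quite tricky, and it depends on performing the substitutions in the right order. The following seems to work (at least on the example you gave):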

" Enclose non-multiline lines with quotes.
:g/\t/s/^\|\(\r\)$/"\1/g
" Undo the ending quote before / the beginning quote after a multiline.
:v/\t/-1s/"$//
:v/\t/+1s/^"//
" Undo the beginning quote after an incomplete (i.e. no ^M) previous record.
:g/\t/-1s/\r\@<!\n\zs"//
" Replace tabs with quotes and commas.
:%s/\t/","/g
" Finally, remove the ^M end-of-record marker.
:%s/\r$//
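
If you would rather stay in the shell, as the question asks, a minimal sketch along the following lines may work. It is not part of the Vim approach above: it swaps the substitutions for awk. It assumes that every record ends in CR LF (the ^M$ seen in vim), that line breaks inside a field are bare LF, and that your awk accepts a multi-character RS (gawk, mawk and BusyBox awk do; POSIX leaves that case unspecified). The function name tsv_to_csv and the file names in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: convert CRLF-terminated, tab-delimited records to fully quoted CSV.
# Assumption: awk supports a multi-character RS (gawk, mawk, BusyBox awk).
tsv_to_csv() {
  awk '
    BEGIN {
      RS  = "\r\n"      # a record ends at CR LF; a bare LF stays inside a field
      FS  = "\t"        # fields are separated by single tabs
      OFS = "\",\""     # fields are joined with quote, comma, quote
    }
    length($0) > 0 {
      gsub(/"/, "\"\"") # double any embedded quotes, the usual CSV convention
      $1 = $1           # force awk to rebuild the record using OFS
      print "\"" $0 "\""
    }
  ' "$@"
}
```

Called as tsv_to_csv input.tsv > output.csv (hypothetical file names), this should turn the two sample records into the quoted form shown in the question. Because the record separator is CR LF rather than LF, the line breaks inside the long column pass through untouched.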