How to remove newlines from all quoted text in a file?

Date: 2011-11-21 22:36:24

Tags: python bash command-line csv text-processing

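I have exported a CSV file from a database. Some of the fields are longer blocks of text that can contain newlines. What is the simplest way to remove only the newlines that are inside double quotes, while keeping all the other newlines?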

I don't mind whether it is a Bash command-line one-liner or a simple script, as long as it works.

For example:

"Value1", "Value2", "This is a longer piece
    of text with
    newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"

This should remove the newlines inside the longer text, but not the newline that separates the two records.
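The requirement can be stated as a small state machine: walk the text, flip an "inside quotes" flag on every double quote, and drop a newline only while that flag is set. The sketch below is not from the thread; `strip_quoted_newlines` is a hypothetical helper name, and it assumes the quotes in the file are balanced. A doubled quote (`""`, CSV's escaped quote) flips the flag twice, so it leaves the state unchanged.

```python
def strip_quoted_newlines(text: str) -> str:
    """Drop "\n" characters that occur between double quotes.

    Assumes balanced quotes (hypothetical helper, not from the thread).
    """
    out = []
    inside = False  # True while between an opening and a closing quote
    for ch in text:
        if ch == '"':
            inside = not inside  # every quote toggles the state
        elif ch == "\n" and inside:
            continue  # skip newlines that belong to a quoted field
        out.append(ch)
    return "".join(out)


# The sample data from the question.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    '    of text with\n'
    '    newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
print(strip_quoted_newlines(sample), end="")
```

Only the newline is removed, so the indentation of the continuation lines stays in the joined text.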

3 answers:

Answer 0 (score: 7)

Here is a Python solution:

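import re
with open("input.csv") as f:  # file name assumed; the answer only used `text`
    text = f.read()
pattern = re.compile(r'".*?"', re.DOTALL)  # DOTALL: "." also matches "\n"
print(pattern.sub(lambda m: m.group().replace("\n", ""), text), end="")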

See it working online: ideone
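To try the regex approach without a file, here is a self-contained sketch run on the question's sample; the function name `join_quoted` is only illustrative. The lazy `.*?` stops at the next double quote, so each match covers one quoted field, and `re.DOTALL` is what lets that match run across line breaks. A doubled quote (`""`) simply splits a field into two adjacent matches, so its newlines are still removed.

```python
import re

# re.DOTALL makes "." also match "\n", so one match can span several lines;
# the lazy ".*?" ends the match at the first following double quote.
QUOTED = re.compile(r'".*?"', re.DOTALL)


def join_quoted(text: str) -> str:
    """Remove newlines inside every double-quoted span (illustrative name)."""
    return QUOTED.sub(lambda m: m.group().replace("\n", ""), text)


# The sample data from the question.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    '    of text with\n'
    '    newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
print(join_quoted(sample), end="")
```

Like any quote-pairing approach, this assumes the quotes in the file are balanced; a stray unmatched quote shifts every later pairing.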

Answer 1 (score: 7)

In Python:

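import csv
# Python 3: open CSV files with newline="" so embedded newlines reach the reader intact
with open("input.csv", newline="") as infile, open("output.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    # skipinitialspace=True: the sample puts a space after each comma
    for record in csv.reader(infile, skipinitialspace=True):
        # str has no .remove(); .replace() deletes every "\n" in the field
        w.writerow([s.replace("\n", "") for s in record])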
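Here is a sketch of the same csv-module approach working on an in-memory string instead of files, so it can be tried directly on the question's sample. The helper name `join_fields` is an assumption. Because the sample has a space after each comma, the reader needs `skipinitialspace=True`; without it, ` "Value2"` starts with a space and is not recognized as a quoted field, so the multi-line field would be split into separate records.

```python
import csv
import io


def join_fields(src: str) -> str:
    """Parse CSV text, strip newlines from every field, re-serialize it.

    Hypothetical helper; it mirrors the file-based answer using StringIO.
    """
    # newline="" keeps line endings untranslated, as the csv docs recommend.
    reader = csv.reader(io.StringIO(src, newline=""), skipinitialspace=True)
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    for record in reader:
        writer.writerow([field.replace("\n", "") for field in record])
    return buf.getvalue()


# The sample data from the question.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    '    of text with\n'
    '    newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
print(join_fields(sample), end="")
```

The writer re-serializes the data in its own style (minimal quoting, no space after commas), so the output is not simply the input minus the newlines; passing `quoting=csv.QUOTE_ALL` to `csv.writer` would quote every field again.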

Answer 2 (score: 2)

This is very simple (it assumes every complete record ends with a double quote), but it might work for you:

# cat <<\! | sed ':a;/"$/{P;D};N;s/\n//g;ba'                            
> "Value1", "Value2", "This is a longer piece
>     of text with
>     newlines in it.", "Value3"
> "Value4", "Value5", "Another value", "value6"
> !
"Value1", "Value2", "This is a longer piece    of text with    newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"
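The sed one-liner's rule, that a line not ending in a double quote continues on the next line, can be sketched in Python as a generator. This is an illustrative port with a hypothetical name, `join_until_quote`, not part of the original answer. It shares the heuristic's limits: a record whose last field is unquoted gets glued to the next line, and a quoted field whose inner line happens to end with a quote character is cut short.

```python
from typing import Iterable, Iterator


def join_until_quote(lines: Iterable[str]) -> Iterator[str]:
    """Join lines until the accumulated text ends with '"' (sed heuristic).

    Hypothetical port of ':a;/"$/{P;D};N;s/\\n//g;ba'.
    """
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\n")  # drop the newline, like s/\n//g
        if buffer.endswith('"'):     # like /"$/ : the record looks complete
            yield buffer
            buffer = ""
    if buffer:  # leftover text that never reached a closing quote
        yield buffer


# The sample data from the question.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    '    of text with\n'
    '    newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
for record in join_until_quote(sample.splitlines(keepends=True)):
    print(record)
```

Iterating an open file also yields lines with their trailing newline, so a file object could be passed in place of the split sample.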