I have a tab-delimited text file in which some values may contain newlines, like this:
col1 col2 col3
row1 val1 "Some text
containing newlines. Yup, possibly
more than one..." val3
row2 val4 "val5" val6
Note:
I am trying to write a small Python script using re to:
It would be great to have it in this form:
import re

def normalize_format(data, delimiter='\t'):
    # _DESIRED_REGEX_ is a placeholder for the pattern I am looking for
    data = re.sub(_DESIRED_REGEX_, r'"\1"', data)
    return data
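where data is the entire file content as a single string and _DESIRED_REGEX_ is the pattern I am trying to come up with.
Using re is not mandatory, but short and elegant solutions are appreciated :)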
Answer 0 (score: 2)
You should use the csv module instead:
import csv

# Python 3: open both files in text mode with newline="" so that the csv
# module handles line endings itself (in Python 2 the modes were "rb"/"wb").
with open("mycsv.csv", newline="") as infile, open("newcsv.csv", "w", newline="") as outfile:
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", quoting=csv.QUOTE_ALL)
    # Remove the newlines inside each field and write the rows,
    # with every field quoted, to a new file:
    for row in reader:
        writer.writerow([field.replace("\n", " ") for field in row])
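
If you would rather keep the normalize_format(data, delimiter) signature from the question, the same idea can be applied to an in-memory string instead of files. This is only a minimal sketch, assuming Python 3 and that values containing newlines are wrapped in double quotes, as in the sample. The choices to replace each line break with a single space and to end rows with "\n" are assumptions, not requirements stated in the question.

```python
import csv
import io


def normalize_format(data, delimiter="\t"):
    """Quote every field and replace line breaks inside fields with spaces."""
    # newline="" passes the original line endings through to the csv reader
    reader = csv.reader(io.StringIO(data, newline=""), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,   # wrap every field in double quotes
        lineterminator="\n",     # assumption: plain "\n" row endings
    )
    for row in reader:
        # Handle both Windows ("\r\n") and Unix ("\n") breaks inside a field
        writer.writerow(
            [field.replace("\r\n", " ").replace("\n", " ") for field in row]
        )
    return out.getvalue()
```

You would call it as normalize_format(open("mycsv.csv", newline="").read()) and write the returned string wherever you need it. Going through csv rather than re means quoted delimiters and escaped quotes inside values are handled by the parser instead of by a hand-written pattern.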