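I have a CSV file in which I would like every record to sit on a single row. I have tried importing it into MS Excel and formatting it with Notepad++. However, on every attempt a piece of the data is treated as a new row. How can I format the file with Python's csv module so that the string "BRAS" is removed and the layout is corrected? Each row is enclosed in double quotes " and the delimiter is a pipe |. Update: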
"aa|bb|cc|dd|
ee|ff"
"ba|bc|bd|be|
bf"
"ca|cb|cd|
ce|cf"
The above should be 3 rows, but my editors see them as 5 or 6 rows, and so on.
import csv
import fileinput
with open('ventoya.csv') as f, open('ventoya2.csv', 'w') as w:
    for line in f:
        if 'BRAS' not in line:
            w.write(line)
N.B. I get a Unicode error when I try this in Python.
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 18: character maps to <undefined>
Answer 0 (score: 1)
This is a quick hack for small input files (the whole content is read into memory).
#!python2
fnameIn = 'ventoya.csv'
fnameOut = 'ventoya2.csv'
with open(fnameIn) as fin, open(fnameOut, 'w') as fout:
    data = fin.read() # content of the input file
    data = data.replace('\n', '') # make it one line
    data = data.replace('""', '|') # split char instead of doubled ""
    data = data.replace('"', '') # remove the first and last "
    print data
    for x in data.split('|'): # split by bar
        fout.write(x + '\n') # write to separate lines
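Alternatively, if the goal is only to repair the extra (unwanted) line breaks so that the result is a single-column CSV file, you can fix the file first and then read it with the csv module: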
#!python2
import csv
fnameIn = 'ventoya.csv'
fnameFixed = 'ventoyaFixed.csv'
fnameOut = 'ventoya2.csv'
# Fix the input file.
with open(fnameIn) as fin, open(fnameFixed, 'w') as fout:
    data = fin.read() # content of the file
    data = data.replace('\n', '') # remove the newlines
    data = data.replace('""', '"\n"') # add the newlines back between the cells
    fout.write(data)
# It is an overkill, but now the fixed file can be read using
# the csv module.
with open(fnameFixed, 'rb') as fin, open(fnameOut, 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow(row)
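As a further sketch in Python 3 (not part of the original answer): because each record sits inside one pair of double quotes, the csv module can read the whole record, embedded line break included, as a single quoted field. The function name `repair_records`, the `drop` parameter, and the choice to skip any record containing "BRAS" (mirroring the filter in the question's code) are assumptions made for illustration.

```python
import csv
import io


def repair_records(text, drop="BRAS"):
    """Return one list of cells per quoted record found in `text`."""
    records = []
    # Each record is a single quoted field, so the csv reader keeps the
    # embedded line break inside the field instead of starting a new row.
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:
            continue  # skip blank lines
        # Remove the line breaks that were embedded in the quoted field.
        record = "".join(row[0].splitlines())
        if drop in record:
            continue  # assumption: records mentioning `drop` are discarded
        records.append(record.split("|"))
    return records


sample = '"aa|bb|cc|dd|\nee|ff"\n"ba|bc|bd|be|\nbf"\n"ca|cb|cd|\nce|cf"\n'
for cells in repair_records(sample):
    print("|".join(cells))
```

For a real file, the same loop could read from `open('ventoya.csv', newline='', encoding=...)`. The traceback in the question suggests the platform's default codec does not match the file, but the actual encoding is not stated, so it would have to be determined separately.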
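Answer 1 (score: 0)
To solve this you do not even need to write code.
1: Open the file in Notepad++.
2: On the first line, select from the | symbol through to the start of the next line.
3: Go to Replace and replace the selected text with |.
The search mode can be either Normal or Extended :)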
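For comparison, the same find-and-replace idea can be sketched in Python with a regular expression. This is only an illustration, assuming that every unwanted break comes directly after a pipe, as in the sample data; the helper name `join_broken_lines` is hypothetical.

```python
import re


def join_broken_lines(text):
    """Join a line that ends in '|' with the line that follows it."""
    # A pipe followed by optional spaces or tabs and a line break
    # (LF or CRLF) is collapsed to a bare pipe.
    return re.sub(r"\|[ \t]*\r?\n[ \t]*", "|", text)


sample = '"aa|bb|cc|dd|\nee|ff"\n"ba|bc|bd|be|\nbf"\n'
print(join_broken_lines(sample))
```

Line breaks that do not follow a pipe are left alone, so the break between one quoted record and the next survives.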
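Answer 2 (score: 0)
Well, since the line breaks are consistent, you could go in and do a find/replace as suggested, but you can also do a quick conversion with your Python script: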
import csv
import fileinput
linecount = 0
with open('ventoya.csv') as f, open('ventoya2.csv', 'w') as w:
    for line in f:
        line = line.rstrip()
        # remove unwanted breaks by concatenating pairs of rows
        if linecount%2 == 0:
            line1 = line
        else:
            full_line = line1 + line
            full_line = full_line.replace(' ','')
            # remove spaces from front of 2nd half of line
            # if you want comma delimiters, uncomment next line:
            # full_line = full_line.replace('|',',')
            if 'BRAS' not in full_line:
                w.write(full_line + '\n')
        linecount += 1
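This worked for me on the test data, and you can change the delimiter while writing the file if you want to. The advantages of doing it in code are: 1. you get to write code (always fun), and 2. you can remove the line breaks and filter what gets written to the file in the same pass.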
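The pairing above relies on every record being split across exactly two lines. If that cannot be guaranteed, one possible variation is to keep appending physical lines until the double quotes balance. The sketch below is an assumption-laden illustration rather than part of the answer: the name `merge_until_quoted` and the `drop` parameter are hypothetical, and it strips only leading and trailing whitespace from each line instead of removing every space.

```python
def merge_until_quoted(lines, drop="BRAS"):
    """Merge physical lines into records delimited by a pair of quotes."""
    merged = []
    buffer = ""
    for line in lines:
        buffer += line.strip()
        # An even number of quotes means the current record is closed.
        if buffer.count('"') % 2 == 0:
            if buffer and drop not in buffer:
                merged.append(buffer)
            buffer = ""
    return merged


sample_lines = [
    '"aa|bb|cc|dd|\n', 'ee|ff"\n',
    '"ba|bc|bd|be|\n', 'bf"\n',
    '"ca|cb|\n', 'cd|\n', 'ce|cf"\n',  # a record split across three lines
]
for record in merge_until_quoted(sample_lines):
    print(record)
```

Passing an open file object as `lines` should work the same way, since iterating over a text file yields one physical line at a time.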