So I have this dataset where random line breaks sometimes get entered into some of the cells, and I need to remove them.
Here's what I've tried:
with open('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        outf.write(line.replace('\n', ''))
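Unfortunately, this removed every line break, including the ones at the ends of the rows, which turned my CSV file into one big single line.
Does anyone know how I can remove only the random line breaks and not the "real" line endings?
Edit: In case it helps, every "real" new line starts with a 6-digit numeric string (except the header row). Maybe a regex pattern that looks ahead for that numeric string would work?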
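Following that hint, here is a minimal sketch of the lookahead idea. It assumes every real record, and only a real record, starts with six digits, and that the whole file fits in memory. It deletes every newline that is not immediately followed by six digits. join_broken_rows is a hypothetical helper name, and a stray break whose continuation happens to begin with six digits would be misread as a new record.

```python
import re


def join_broken_rows(text: str) -> str:
    """Remove every newline that is NOT followed by a 6-digit row ID.

    Assumption: each real record starts with six digits. The header is the
    first line, so the newline after it is followed by the first ID and kept.
    """
    # Drop the final newline first: nothing follows it, so the lookahead
    # would otherwise treat it as a stray break. One is re-added below.
    body = text.rstrip("\n")
    # "\n(?!\d{6})": a newline that is not followed by six digits.
    return re.sub(r"\n(?!\d{6})", "", body) + "\n"


sample = "id,name,note\n123456,Ann,first\n line\n654321,Bob,ok\n"
print(join_broken_rows(sample), end="")

# With files (placeholder paths). Reading in text mode turns "\r\n" into "\n",
# so only "\n" needs handling:
# with open('filepath') as inf:
#     text = inf.read()
# with open('filepath_out', 'w') as outf:
#     outf.write(join_broken_rows(text))
```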
Edit 2: I tried editing it with pandas:
import pandas as pd

df = pd.read_csv(filepath)
for i in df.columns:
    if df[i].dtype == object:  # np.object was removed in NumPy 1.24; the builtin object works
        df[i] = df[i].str.replace('\n', '')
Strangely, this works if I copy the contents of the .csv into a new text file, but it doesn't work on my original CSV file, and I don't know why.
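One possible explanation, which is an assumption and not confirmed in this thread: the stray breaks in the original file may be carriage returns ("\r" or "\r\n", common in files exported from Excel), which replacing only "\n" leaves behind. Also note that read_csv keeps a break inside a cell only when that cell is quoted; an unquoted break splits the row before any replacement runs. A sketch that strips runs of either character from every text cell (strip_cell_breaks is a hypothetical helper name):

```python
import io

import pandas as pd


def strip_cell_breaks(df: pd.DataFrame) -> pd.DataFrame:
    """Replace any run of CR/LF characters inside text cells with one space.

    With regex=True, DataFrame.replace substitutes inside string values and
    leaves numeric columns untouched.
    """
    return df.replace(r"[\r\n]+", " ", regex=True)


# The CRLF inside the quoted cell survives read_csv; an unquoted one would not.
raw = 'id,note\n123456,"first\r\nline"\n654321,ok\n'
df = strip_cell_breaks(pd.read_csv(io.StringIO(raw)))
print(df)
```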
Final edit:
Many thanks to DDS for the help. I managed to get it working with this:
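num_cols = 48  # number of fields in a complete record
buf = ""       # collects the fragments of a record split by stray line breaks
# out_filepath is a separate output path: opening filepath itself with 'w'
# would truncate it before it is read.
with open(filepath) as inf, open(out_filepath, 'w') as outf:
    for line in inf:
        if len(line.split(',')) < num_cols:
            # Fragment of a broken record: drop its newline and keep collecting.
            buf += line.replace('\n', '')
            if len(buf.split(',')) == num_cols:
                # The record is complete again: write it with one real line ending.
                outf.write(buf + '\n')
                buf = ""
        else:
            outf.write(line)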
Answer 0 (score: 0)
You can do this in several ways.
with open('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        outf.write(line.replace('\n', '') + '\n')
Or use n - 1 as the number of newlines to replace, where n is the number of newlines in the line:
with open('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        outf.write(line.replace('\n', '', line.count('\n') - 1))
Or use a regular expression that removes every newline except the last one:
import re
result = re.sub(r'\n*(?=.*\n)', '', 'ansd\nasdn\naskd\n')
print(repr(result))  # 'ansdasdnaskd\n'
Answer 1 (score: 0)
First check whether the line is empty, then write it:
for line in inf:
    if len(line.strip()) == 0:
        outf.write(line.replace('\n', ''))
    else:
        outf.write(line)
Answer 2 (score: 0)
Assuming you know the number of fields per line and that no field contains the CSV delimiter (a comma), you can do this:
number_of_columns_in_the_table = 5  # assuming a line has 5 columns
with open('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        # check whether the number of splits equals the number of fields
        if len(line.split(',')) < number_of_columns_in_the_table:
            outf.write(line.replace('\n', ''))
        else:
            outf.write(line)
Edit:
number_of_columns_in_the_table = 5  # assuming a line has 5 columns
buf = ""  # collects the fragments of a line broken by stray newlines
with open('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        # check whether the number of splits equals the number of fields
        if len(line.split(',')) < number_of_columns_in_the_table:
            buf += line.replace('\n', '')
            if len(buf.split(',')) == number_of_columns_in_the_table:
                outf.write(buf + '\n')
                buf = ""
        else:
            outf.write(line)
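Splitting on ',' miscounts fields when a quoted cell contains a comma, which is why this answer assumes no field does. Below is a sketch of the same buffering idea that counts fields with the standard csv module instead. rejoin_records and count_fields are hypothetical helper names, and the sketch still assumes the stray breaks sit in unquoted text, so each fragment reads as a short row.

```python
import csv
from typing import Iterable, Iterator


def count_fields(text: str) -> int:
    """Count CSV fields in one line; commas inside quoted cells are not counted."""
    return len(next(csv.reader([text])))


def rejoin_records(lines: Iterable[str], num_cols: int) -> Iterator[str]:
    """Yield one line per record, gluing back rows split by stray breaks."""
    buf = ""
    for line in lines:
        # Drop this fragment's line ending and append it to the pending record.
        buf += line.rstrip("\r\n")
        if count_fields(buf) >= num_cols:
            # Complete record: emit it with a single real line ending.
            yield buf + "\n"
            buf = ""
    if buf:
        # Leftover fragment at end of input: emit it rather than drop data.
        yield buf + "\n"


sample = ["id,note,name\n", '123456,"x, y",Ann\n', "234567,first\n",
          " line,Bob\n", "345678,ok,Cy\n"]
for record in rejoin_records(sample, num_cols=3):
    print(record, end="")

# With files (placeholder paths):
# with open('filepath') as inf, open('filepath_out', 'w') as outf:
#     outf.writelines(rejoin_records(inf, 48))
```

Field counting cannot detect a break inside the last column: the first fragment already has the full number of fields, so the remaining fragment is then glued onto the start of the following record.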