Question

I have a dataset in which some cells occasionally contain stray line breaks, and I need to remove them.

Here is what I tried:

with open ('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        outf.write(line.replace('\n', ''))

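Unfortunately, this removes every newline, including the ones at the end of each row, which turns my CSV file into one long single line.

Does anyone know how I can remove only the stray newlines and keep the "real" end-of-row ones?

Edit: in case it helps, every "real" new row starts with a 6-digit string (except the header row). Maybe a regex with a lookahead that checks for that digit string would work?

Edit 2: I tried fixing it with pandas: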

import pandas as pd

df = pd.read_csv(filepath)

for i in df.columns:
    # np.object was removed from NumPy; the builtin object is the equivalent check
    if df[i].dtype == object:
        df[i] = df[i].str.replace('\n', '')

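Strangely, this works if I copy the contents of the .csv into a new text file, but it does not work on my original CSV file, and I don't know why.

Final edit:

Many thanks to DDS for the help. I managed to get it working with this: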

num_cols = 48

buf = ""

with open (filepath) as inf, open (filepath, 'w') as outf:
    for line in inf:
        if len(line.split(',')) < num_cols:
            buf += line.replace('\n', '')
            if len(buf.split(',')) == num_cols:
                outf.write(buf+'\n')
            else: continue
            buf = ""
        else:
            outf.write(line)

Answer 1

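You can achieve this in several ways.

Since you only care about keeping the last newline, you can strip every newline and then append a single newline to the result: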

    with open('filepath') as inf, open('filepath', 'w') as outf:
        for line in inf:
            outf.write(line.replace('\n', '') + '\n')

You can count the newlines and use the count argument of the replace method, passing n - 1 as the number of newlines to replace:

    with open('filepath') as inf, open('filepath', 'w') as outf:
        for line in inf:
            outf.write(line.replace('\n', '', line.count('\n') - 1))
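To make the count argument concrete, here is a small sketch on an in-memory string. The helper name `keep_last_newline` and the sample text are illustrative assumptions, not part of the answer above. Note that this only helps when a single string holds several newlines (for example a whole record read at once); when iterating over a file line by line, each line contains only its terminating newline.

```python
def keep_last_newline(text: str) -> str:
    """Remove every newline in text except the last one."""
    n = text.count('\n')
    # str.replace(old, new, count) replaces only the first `count` occurrences.
    # max(..., 0) avoids passing -1, which would mean "replace all".
    return text.replace('\n', '', max(n - 1, 0))


print(repr(keep_last_newline('ansd\nasdn\naskd\n')))  # 'ansdasdnaskd\n'
```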

Or use Python's re library: a lookahead checks whether another newline follows, and a newline is removed only if one does.

    import re
    result = re.sub('\n*(?=.*\n)', '', 'ansd\nasdn\naskd\n')
    print(repr(result))
    # 'ansdasdnaskd\n'
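The question's edit mentions that every real row (after the header) starts with a 6-digit string. A lookahead can use that directly. The following is only a sketch under that assumption: the names `STRAY_NEWLINE` and `join_broken_rows` and the sample data are made up for illustration, and it works on the whole file contents as one string rather than line by line.

```python
import re

# Assumption taken from the question: a real row starts with six digits.
# A newline is treated as stray unless it is followed by six digits
# or it is the very last character of the text.
STRAY_NEWLINE = re.compile(r'\n(?!\d{6}|\Z)')


def join_broken_rows(text: str) -> str:
    """Remove newlines that do not precede a 6-digit row start."""
    return STRAY_NEWLINE.sub('', text)


sample = 'id,comment\n123456,first\npart\n234567,ok\n'
print(repr(join_broken_rows(sample)))
# 'id,comment\n123456,firstpart\n234567,ok\n'
```

This would misfire if the text after a stray break inside a cell happens to begin with six digits, so it is worth checking the result on a sample of the real data.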

Answer 2

First check whether the line is empty, then write it:

    for line in inf:
        if len(line.strip()) == 0:
            outf.write(line.replace('\n', ''))
        else:
            outf.write(line)

Answer 3

Assuming you know the number of fields per row and that no field contains the CSV delimiter (a comma), you can do this:

    number_of_columns_in_the_table = 5  # assuming a line has 5 columns
    with open('filepath') as inf, open('filepath', 'w') as outf:
        for line in inf:
            # a line with fewer fields than expected is part of a broken row
            if len(line.split(',')) < number_of_columns_in_the_table:
                outf.write(line.replace('\n', ''))
            else:
                outf.write(line)

Edit

    number_of_columns_in_the_table = 5  # assuming a line has 5 columns
    buf = ""
    with open('filepath') as inf, open('filepath', 'w') as outf:
        for line in inf:
            # a line with fewer fields than expected is a fragment of a broken row
            if len(line.split(',')) < number_of_columns_in_the_table:
                buf += line.replace('\n', '')
                # once the buffer holds a complete row, write it out and reset
                if len(buf.split(',')) == number_of_columns_in_the_table:
                    outf.write(buf + '\n')
                    buf = ""
            else:
                outf.write(line)
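The same buffering idea can be sketched as a generator that works on any iterable of lines, which makes it easy to try on in-memory data before touching a real file. The name `merge_split_rows` and the sample rows are illustrative assumptions. As in the answer, fields are counted with a plain split on commas, so the sketch assumes no field contains a comma.

```python
import io


def merge_split_rows(lines, num_cols):
    """Yield complete rows, joining fragments until num_cols fields are present."""
    buf = ""
    for line in lines:
        buf += line.rstrip('\n')
        # Naive field count: assumes no field contains a comma.
        if len(buf.split(',')) >= num_cols:
            yield buf + '\n'
            buf = ""
    if buf:
        # Leftover fragment that never reached num_cols; emit it rather than drop it.
        yield buf + '\n'


broken = io.StringIO('h1,h2,h3\n123456,fir\nst,x\n234567,c,d\n')
print(list(merge_split_rows(broken, 3)))
# ['h1,h2,h3\n', '123456,first,x\n', '234567,c,d\n']
```

With real files, read from one path and write to a different one, since opening the same path in 'w' mode truncates it before it can be read. Counting fields cannot detect a break inside the last column, because the row already looks complete at that point.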

How do I remove newlines inside CSV columns without removing the end-of-row newlines?
