Question

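I have a text file that I am converting to CSV with Python. The columns in the text file are separated by multiple spaces. My code strips each line, converts every run of 2 consecutive spaces to a comma, and then splits the line on commas. When I do this, the columns don't line up, because some columns are followed by more whitespace than others. How can I add something to my code that removes the blank cells from the CSV file?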

I tried loading the CSV file into a pandas DataFrame, but when I run

import pandas as pd
df = pd.read_csv('old.Csv')

delim_whitespace=True

df.to_csv("New.Csv", index=False)

it returns the error ParserError: Error tokenizing data. C error: Expected 40 fields in line 10, saw 42

The code that strips and splits the lines is

import csv

txtfile = r"Old.txt"
csvfile = r"Old.Csv"

with open(txtfile, 'r') as infile, open(csvfile, 'w', newline='') as outfile:    
    stripped = (line.strip() for line in infile)
    replace = (line.replace("  ", ",") for line in stripped if line)
    lines = (line.split(",") for line in replace if infile)
    writer = csv.writer(outfile)
    writer.writerows(lines)

Answer 1

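One solution is to declare the column names up front, which forces pandas to accept rows with differing numbers of columns. Something like this should work: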

df = pd.read_csv('myfilepath', names = ['col1', 'col2', 'col3'])

You will have to adjust the separator and the column names/number of columns yourself.
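
To illustrate the idea, here is a minimal sketch on made-up in-memory data. The sample rows and the names col1 to col4 are assumptions for the example, not taken from the asker's file; the number of names is chosen to match the widest row.

```python
import io

import pandas as pd

# Hypothetical ragged data: rows have 2, 4 and 3 fields respectively.
sample = io.StringIO("1,2\n3,4,5,6\n7,8,9\n")

# Declaring as many names as the widest row lets pandas pad the
# shorter rows with NaN instead of raising a ParserError.
df = pd.read_csv(sample, header=None, names=["col1", "col2", "col3", "col4"])
print(df)
```

Short rows end up with NaN in the trailing columns, so this keeps the file loadable but does not by itself realign values that were pushed into the wrong column.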

Answer 2

(Edited) The code below should work for a text file like yours, for example:

   a               b  c  d  e
=============================
1  qwerty          3  4  5  6
2  ewer            e  r  y  i               
3  asdfghjkutrehg  c  v  b  n

You can try:

import pandas as pd
df = pd.read_fwf('textfile.txt', delimiter='  ', header=0, skiprows=[1])
df.to_csv("New.csv", index=False)
print(df)  

   Unnamed: 0               a  b  c  d  e
0           1          qwerty  3  4  5  6
1           2            ewer  e  r  y  i
2           3  asdfghjkutrehg  c  v  b  n
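
For anyone who wants to try this without creating a file first, here is a self-contained sketch of the same approach that reads the sample table from a string. It relies on read_fwf's default column inference, and the new column name "id" is only a hypothetical label for the unnamed first column.

```python
import io

import pandas as pd

# In-memory copy of the sample table shown above.
text = (
    "   a               b  c  d  e\n"
    "=============================\n"
    "1  qwerty          3  4  5  6\n"
    "2  ewer            e  r  y  i\n"
    "3  asdfghjkutrehg  c  v  b  n\n"
)

# header=0 takes the first line as column names; skiprows=[1] drops
# the "=====" separator line so it does not disturb width inference.
df = pd.read_fwf(io.StringIO(text), header=0, skiprows=[1])

# The first column has no header, so pandas labels it "Unnamed: 0".
# "id" is a hypothetical replacement name.
renamed = df.rename(columns={"Unnamed: 0": "id"})
csv_text = renamed.to_csv(index=False)
print(csv_text)
```

Because the columns are located by character position rather than by counting spaces, the varying amount of padding after each value does not produce blank cells.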

Answer 3

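You can open the CSV file in Excel.

Select the range that contains the empty cells.

Press Ctrl + G (Go To), then click Special

Select Blanks

Press Enter

Delete the blank cells and shift the cells left.

If that doesn't work properly,

first replace the whitespace in Excel and then repeat the same process.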
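
The same delete-and-shift-left operation can be sketched in Python with the csv module. This is a minimal example on made-up rows; the helper name drop_blank_cells is hypothetical, and the in-memory buffers stand in for the real input and output files.

```python
import csv
import io


def drop_blank_cells(rows):
    """Yield each row with empty or whitespace-only cells removed."""
    for row in rows:
        cleaned = [cell.strip() for cell in row if cell.strip()]
        if cleaned:  # skip rows that were entirely blank
            yield cleaned


# Hypothetical CSV content with runs of blank cells between values.
sample = io.StringIO("1,qwerty,,,,3,4\n2,ewer,,,,,e,r\n")
out = io.StringIO()

writer = csv.writer(out, lineterminator="\n")
writer.writerows(drop_blank_cells(csv.reader(sample)))
result = out.getvalue()
print(result)
```

Note that this removes every blank cell, so a value that is genuinely missing in the source would also be dropped and the rest of that row would shift left.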

Remove blank cells from a CSV file using Python