Can pandas be told to ignore columns beyond the header size?
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

df = pandas.read_csv('test.csv')
print(df)
This prints:
              datetime    A
0  2018-10-09 18:00:07  123
However, the CSV file being loaded sometimes contains more data columns than the header defines:
with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv')
print(df)
This returns:
                         datetime     A
2018-10-09 18:00:07 123       ABC   XYZ
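Pandas shifts the header names onto the rightmost data columns.
I need different behaviour: I want pandas to ignore any data columns beyond the header.
Note: I cannot enumerate the columns, because this is a generic use case. For reasons outside my code, a file sometimes contains more data than expected, and I want to ignore the surplus.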
Answer 0 (score: 2)
It seems pandas notices that the data has more columns than the header and assumes that the first two (data) columns are a (multi-)index.
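This index inference can be checked directly. Below is a minimal sketch, assuming the same CSV contents as in the question; the only change is that the text is read from an in-memory string through io.StringIO instead of from test.csv. If the assumption above holds, the column names stay as given in the header and the row index has two levels.

```python
import io

import pandas

# Same contents as in the question: 2 header names, 4 fields in the data row.
csv_text = "datetime,A\n2018-10-09 18:00:07, 123, ABC, XYZ\n"

df = pandas.read_csv(io.StringIO(csv_text))

# The header names are attached to the last two data fields ...
print(list(df.columns))
# ... and the surplus leading fields are used as levels of the row index.
print(df.index.nlevels)
print(df.index.tolist())
```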
Use the usecols parameter of read_csv to specify which data columns to read:
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv', usecols=[0,1])
print(df)
This yields:
              datetime    A
0  2018-10-09 18:00:07  123
Answer 1 (score: 0)
The code below shows an answer to the question.
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

# Count the fields in the header row and in the first data row.
# Counting commas assumes that no field contains a quoted comma.
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
            colCount = headerCount
        elif i == 1:
            dataCount = line.count(",") + 1
        elif i > 1:
            break

if headerCount < dataCount:
    print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
    colCount = headerCount

# Read only as many columns as the header defines.
df = pandas.read_csv('test.csv', usecols=range(colCount))
print(df)
This produces:
Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
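Counting commas miscounts when a header field contains a quoted comma. A possible variant, sketched here under the assumption that the first line of the file is always the header, lets the standard csv module parse that line and passes the resulting width to usecols. The helper name read_csv_header_width is hypothetical and not part of pandas; the temporary directory is only there to keep the example self-contained.

```python
import csv
import os
import tempfile

import pandas


def read_csv_header_width(path):
    """Read `path`, keeping only as many columns as the header row defines.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    with open(path, newline="") as csv_file:
        header = next(csv.reader(csv_file))  # parse the first row only
    return pandas.read_csv(path, usecols=list(range(len(header))))


# Usage with the file from the question, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "test.csv")
with open(csv_path, mode="w") as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = read_csv_header_width(csv_path)
print(df)
```

This keeps the generic behaviour asked for in the question: the caller never has to enumerate the columns.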
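Answer 2 (score: -1)
For completeness, the following trick covers the opposite case, where the header defines more columns than the data contains: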
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A, B, C\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

# Count the fields in the header row and in the first data row.
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
        elif i == 1:
            dataCount = line.count(",") + 1
            if headerCount != dataCount:
                print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
        elif i > 1:
            break

# Read only as many columns as the first data row contains.
df = pandas.read_csv('test.csv', usecols=range(dataCount))
print(df)
This gives the correct pandas object:
Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123