Question

I am trying to read a .csv file with pandas. The top of the file looks like this:

System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"

I am reading it with the following code:
df = pd.read_csv('data.csv', sep=';', header=5, skiprows=[6,7], encoding='latin1')

However, my DataFrame ends up with "unit1", "unit2", "unit3", "unit4", "unit5", "unit6" as column labels instead of "Label1", "Label2", "Label3", "Label4", "Label5", "Label6".

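With an older version of my csv file, the same import code worked fine. The only difference I can find between the files is that the older one has the full set of separators in the first four lines: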

System Information_1;;;;;
System Information_2;;;;; 
etc.

Does anyone know where this error comes from and how to fix it?

Answer 1

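You can also skip the leading lines with skiprows. In that case, do not set header to 5: once those lines are skipped, the label row is row 0, so you can leave header at its default and let pandas infer it: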

df = pd.read_csv('data.csv', sep=';', skiprows=[0,1,2,3,4,6,7], encoding='latin1')
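
To try this without the original file, here is a minimal sketch on an in-memory copy of the header from the question. The two numeric rows are made up for illustration, and skipinitialspace=True is an addition that is not part of the call above: it drops the blank after each ';' so that the quoted labels should come out as Label1, Label2, ... instead of keeping the blank and the quotes.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for data.csv; the two numeric rows are invented.
sample = '''System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"
1;2;3;4;5;6
10;20;30;40;50;60
'''

df = pd.read_csv(
    StringIO(sample),
    sep=';',
    # File line numbers (0-based): system info, the blank line,
    # the alternative labels and the units.
    skiprows=[0, 1, 2, 3, 4, 6, 7],
    # Assumption added for this sketch: ignore the blank after each ';'.
    skipinitialspace=True,
)

print(df.columns.tolist())
print(df)
```

Listing the rows to skip by file line number avoids having to work out which position the label row has after blank lines are dropped.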

Answer 2

You can pass a list as the header argument:

import pandas as pd
from io import StringIO

data = """System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"
1;2;3;4;5;6
10;20;30;40;50;60
"""

df = pd.read_csv(StringIO(data), sep=';', header=[4], skiprows=[6, 7], encoding='latin1')

With this call, the row that starts with "Label1" is used as the header.
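
Since header accepts a list, a related option (not something the question asked for) is to pass several row positions and keep all three header rows as a column MultiIndex. The following is only a sketch under assumptions: the numeric rows are made up, skipinitialspace=True is added to drop the blank after each ';', and it relies on header ignoring blank lines by default, so the label row is position 4.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with the same layout as above; numeric rows are invented.
sample = '''System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"
1;2;3;4;5;6
10;20;30;40;50;60
'''

df_multi = pd.read_csv(
    StringIO(sample),
    sep=';',
    # Positions counted among non-blank rows: labels, alternative labels, units.
    header=[4, 5, 6],
    # Assumption added for this sketch: ignore the blank after each ';'.
    skipinitialspace=True,
)

# Each column is now identified by a (label, alternative label, unit) tuple.
print(df_multi.columns.nlevels)
print(df_multi.columns[0])
```

If only the labels are needed afterwards, df_multi.columns.get_level_values(0) returns the first level.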

Answer 3

The "header" argument starts counting after the rows given in "skiprows" have been removed.

If you want to use the labels as the header:

df = pd.read_csv('pruebasof.csv', sep=';', skiprows=[0,1,2,3,4,6], encoding='latin1')

Otherwise, if you want to use the alternative labels as the header:

df = pd.read_csv('pruebasof.csv', sep=';', skiprows=6, encoding='latin1')

I set it up this way so that you can use the labels as column names while keeping the "units" row as data.
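
A sketch of what the first variant returns, using an in-memory stand-in for pruebasof.csv. The numeric rows are made up, and skipinitialspace=True is an added assumption so that the labels come out without the blank and the quotes. Because the units row stays in the data, every column is read as text, so one possible follow-up is to split the units off and convert the remaining rows to numbers:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for pruebasof.csv; the numeric rows are invented.
sample = '''System Information_1
System Information_2
System Information_3
System Information_4

"Label1"; "Label2"; "Label3"; "Label4"; "Label5"; "Label6"
"alternative Label1"; "alternative Label2"; "alternative Label3"; "alternative Label4"; "alternative Label5"; "alternative Label6"
"unit1"; "unit2"; "unit3"; "unit4"; "unit5"; "unit6"
1;2;3;4;5;6
10;20;30;40;50;60
'''

df = pd.read_csv(
    StringIO(sample),
    sep=';',
    skiprows=[0, 1, 2, 3, 4, 6],  # keep line 5 (labels) and line 7 (units)
    skipinitialspace=True,        # assumption added for this sketch
)

# The first remaining data row holds the units, keyed by label.
units = df.iloc[0].to_dict()

# Everything below the units row is numeric once converted column by column.
values = df.iloc[1:].reset_index(drop=True).apply(pd.to_numeric)

print(units)
print(values.dtypes)
```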

pandas.read_csv leads to shifted column labels when dropping rows below the header
