Question

My dataset looks like this:

    Id  Economics      English    History  Literature  
0  56          1            1          2        1                     
1  11          1            0          0        1                    
2   6          0            1          1        0                     
3  43          2            0          1        1                     
4  14          0            1          1        0

I created this dataset by reading some CSV files, and I could easily access the columns, for example with df['Economics']. Then I saved it to a file:

df.to_csv(file_path, sep='\t')

But when I reopen the dataset in another function and try to access the columns the same way, i.e.

import pandas as pd

df = pd.read_csv(file_path, sep='\t')
print(df['Economics'])
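A minimal round-trip sketch, assuming the frame was written with to_csv(..., sep='\t') and read back with the same separator. An io.StringIO buffer stands in for the question's file_path, and the values are copied from the table above. It suggests that a clean round-trip keeps the column names intact (to_csv also writes the index by default; index_col=0 turns it back into the index instead of an 'Unnamed: 0' column), so a KeyError after reading points to the file itself differing, for example in its separator or in whitespace around the names.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's dataset (values from the table above).
df = pd.DataFrame({
    "Id": [56, 11, 6, 43, 14],
    "Economics": [1, 1, 0, 2, 0],
    "English": [1, 0, 1, 0, 1],
    "History": [2, 0, 1, 1, 1],
    "Literature": [1, 1, 0, 1, 0],
})

# Write to an in-memory buffer instead of file_path; the index is written too.
buf = io.StringIO()
df.to_csv(buf, sep="\t")
buf.seek(0)

# Read back with the same separator; index_col=0 restores the written index.
df2 = pd.read_csv(buf, sep="\t", index_col=0)
print(df2.columns.tolist())
print(df2["Economics"].tolist())
```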

I get:

KeyError: 'Economics'

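I tried several encodings when reading the file, and I also verified that it is not a MultiIndex DataFrame; both the encoding and the index look fine. I found another approach, df.get('Economics'), which works without an error in this case. But if I iterate over the column names looking for Economics again, I get a KeyError.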

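So my question is: why does this happen? Why can I sometimes access a column directly with df['column_name'], while other times I have to use df.get('column_name')? And if the first approach doesn't work, how should I handle the column names?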

Answer 1

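It looks like there are some unwanted characters in your column names, perhaps something like 'Economics ' (with a trailing space), or something else.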

df.get('Economics') does not raise a KeyError in this case; it simply returns None.
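A small sketch of that difference, using a hypothetical frame whose only column is named 'Economics ' with a trailing space: .get() falls back to its default (None unless another default is passed), while square-bracket indexing raises KeyError for the same label.

```python
import pandas as pd

# Hypothetical frame: the column label carries a trailing space.
df = pd.DataFrame({"Economics ": [1, 1, 0]})

# .get() returns the default instead of raising when the label is missing.
print(df.get("Economics"))             # None
print(df.get("Economics", "missing"))  # missing

# Square-bracket access raises KeyError for the same missing label.
try:
    df["Economics"]
except KeyError as exc:
    print("KeyError:", exc)
```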

Try inspecting df.columns and compare len(df.columns[1]) with the length of the column name you expect.
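One way to make such characters visible is to print repr() and len() of each label, then normalise the labels. The frame below is hypothetical, and stripping with Index.str.strip() is an added suggestion, not part of the answer; it only removes leading and trailing whitespace, not other stray characters.

```python
import pandas as pd

# Hypothetical frame: the second label has a trailing space.
df = pd.DataFrame({"Id": [56, 11], "Economics ": [1, 1]})

# repr() shows hidden whitespace; len() exposes the extra character.
for name in df.columns:
    print(repr(name), len(name))
print(len(df.columns[1]), len("Economics"))  # 10 9

# Assumed fix: strip leading/trailing whitespace from every column label.
df.columns = df.columns.str.strip()
print(df["Economics"].tolist())  # [1, 1]
```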

Answer 2

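My guess is that you either have trailing spaces in all or some of your column names, or you actually have only one column, as in the test example below: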

Test data:

Id  Economics     English   History   Literature  
56  1   1   2   1
11  1   0   0   1
6   1   1   0   0
43  2   0   1   1
14  1   1   1   0

Test code:

import pandas as pd

df = pd.read_csv('test.csv', sep='\t')
print(df)
print(df.columns.tolist())

Output:

  Id  Economics     English   History   Literature
0                                  56  1   1   2   1
1                                  11  1   0   0   1
2                                  6   1   1   0   0
3                                  43  2   0   1   1
4                                  14  1   1   1   0
['Id  Economics     English   History   Literature  ']

The DataFrame has only one column: 'Id  Economics     English   History   Literature  '
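The one-column situation can be reproduced without a file. This sketch assumes hypothetical file contents whose fields are separated by runs of spaces rather than tabs; with sep='\t' the whole header line becomes a single label, so 'Economics' is not a column at all.

```python
import io

import pandas as pd

# Hypothetical contents: space-separated, no tab characters at all.
data = "Id  Economics  English\n56  1  1\n11  1  0\n"

df = pd.read_csv(io.StringIO(data), sep="\t")
print(df.columns.tolist())        # ['Id  Economics  English']
print("Economics" in df.columns)  # False
```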

Let's change sep='\t' to sep=r'\s+' in pd.read_csv() and run our test code against the same dataset:

   Id  Economics  English  History  Literature
0  56          1        1        2           1
1  11          1        0        0           1
2   6          1        1        0           0
3  43          2        0        1           1
4  14          1        1        1           0
['Id', 'Economics', 'English', 'History', 'Literature']
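A minimal sketch of the same fix on hypothetical in-memory contents: a regular-expression separator matching one or more whitespace characters splits the fields into proper columns. Writing the pattern as a raw string avoids Python's invalid-escape warning for '\s'.

```python
import io

import pandas as pd

# Hypothetical contents: fields separated by varying runs of spaces.
data = "Id  Economics     English\n56  1   1\n11  1   0\n"

# Split on one or more whitespace characters (spaces or tabs).
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df.columns.tolist())        # ['Id', 'Economics', 'English']
print(df["Economics"].tolist())   # [1, 1]
```

Older code may use delim_whitespace=True for the same effect; pandas 2.2 deprecated that option in favour of sep=r'\s+'.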
