Question

I'm confused about the syntax of the following line of code:

x_values = dataframe[['Brains']]

The dataframe object consists of 2 columns (Brains and Bodies):

Brains Bodies
42     34
32     23

When I print x_values, I get something like this:

   Brains
0      42
1      32

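I'm aware of the pandas documentation as far as the attributes and methods of the dataframe object are concerned, but the double bracket syntax confuses me.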

Answer 1

Consider this:

Source DF:

In [79]: df
Out[79]:
   Brains  Bodies
0      42      34
1      32      23

Selecting one column - results in a pandas.Series:

In [80]: df['Brains']
Out[80]:
0    42
1    32
Name: Brains, dtype: int64

In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series

Selecting a subset of the DataFrame - results in a DataFrame:

In [82]: df[['Brains']]
Out[82]:
   Brains
0      42
1      32

In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame

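Conclusion: the second approach (a list of column names) lets us select multiple columns from the DataFrame and always returns a DataFrame. The first one selects just a single column and returns a Series.

Demo: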

In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))

In [85]: df
Out[85]:
          a         b         c         d         e         f
0  0.065196  0.257422  0.273534  0.831993  0.487693  0.660252
1  0.641677  0.462979  0.207757  0.597599  0.117029  0.429324
2  0.345314  0.053551  0.634602  0.143417  0.946373  0.770590
3  0.860276  0.223166  0.001615  0.212880  0.907163  0.437295
4  0.670969  0.218909  0.382810  0.275696  0.012626  0.347549

In [86]: df[['e','a','c']]
Out[86]:
          e         a         c
0  0.487693  0.065196  0.273534
1  0.117029  0.641677  0.207757
2  0.946373  0.345314  0.634602
3  0.907163  0.860276  0.001615
4  0.012626  0.670969  0.382810

If we specify only one column in the list, we get a DataFrame with one column:

In [87]: df[['e']]
Out[87]:
          e
0  0.487693
1  0.117029
2  0.946373
3  0.907163
4  0.012626
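
As a small supplementary sketch (rebuilding the two-column frame from the question, so the literal values are taken from there), the shapes make the difference between the two results visible: the Series is one-dimensional, while the one-column DataFrame keeps a column axis.

```python
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

as_series = df['Brains']    # string key -> Series
as_frame = df[['Brains']]   # list key -> one-column DataFrame

print(as_series.shape)  # (2,)
print(as_frame.shape)   # (2, 1)

# The only column of the one-column DataFrame holds the same data as the Series
print(as_frame['Brains'].equals(as_series))  # True
```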

Answer 2

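There is no special syntax in Python for [[ and ]]. Rather, a list is being created, and then that list is passed as an argument to the DataFrame indexing function.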

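As per @MaxU's answer, if you pass a single string to a DataFrame, a Series that represents that one column is returned. If you pass a list of strings, a DataFrame containing the given columns is returned.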

So, when you do the following

# Print "Brains" column as Series
print(df['Brains'])
# Return a DataFrame with only one column called "Brains"
print(df[['Brains']])

it is equivalent to the following

# Print "Brains" column as Series
column_to_get = 'Brains'
print(df[column_to_get])
# Return a DataFrame with only one column called "Brains"
subset_of_columns_to_get = ['Brains']
print(df[subset_of_columns_to_get])

In both cases, the DataFrame is being indexed with the [] operator.
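
To make this concrete without involving pandas at all, here is a minimal sketch. KeyRecorder is a hypothetical class invented for this illustration (it is not part of pandas); it only reports which object Python hands to its indexing method.

```python
class KeyRecorder:
    """Hypothetical stand-in for a DataFrame: it only reports the key it receives."""

    def __getitem__(self, key):
        # Python translates obj[key] into a call to obj.__getitem__(key)
        return type(key).__name__, key


recorder = KeyRecorder()

single = recorder['Brains']     # the key is the string 'Brains'
double = recorder[['Brains']]   # the key is the list ['Brains']

print(single)  # ('str', 'Brains')
print(double)  # ('list', ['Brains'])
```

The same indexing method is called both times; the only thing that differs is the type of the key, and a library such as pandas is free to branch on that type.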

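Python uses the [] operator for both indexing and constructing list literals, and ultimately I believe this is the source of your confusion. In df[['Brains']], the outer [ and ] perform the indexing, while the inner pair creates a list.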

>>> some_list = ['Brains']
>>> some_list_of_lists = [['Brains']]
>>> ['Brains'] == [['Brains']][0]
True
>>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0]
True

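What I am illustrating above is that at no point does Python ever see [[ and interpret it specially. In the last convoluted example ([['Brains'][0]][0]) there is no special ][ operator or ]][ operator. What happens is: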

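A single-element list is created (['Brains'])
The first element of that list is indexed (['Brains'][0] => 'Brains')
That is placed into another list ([['Brains'][0]] => ['Brains'])
And then the first element of that list is indexed ([['Brains'][0]][0] => 'Brains')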
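
The four steps above can be spelled out one statement at a time. The variable names are chosen here purely for illustration.

```python
inner_list = ['Brains']   # step 1: a list literal with a single element
first = inner_list[0]     # step 2: indexing the list gives 'Brains'
outer_list = [first]      # step 3: another list literal, giving ['Brains']
result = outer_list[0]    # step 4: indexing again gives 'Brains'

# The one-line expression evaluates to the same value
print(result == [['Brains'][0]][0])  # True
```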

Answer 3

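Other solutions demonstrate the difference between a series and a dataframe. For the mathematically minded, you may wish to consider the dimensions of your input and output. Here's a summary: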

Object                                Series          DataFrame
Dimensions (obj.ndim)                      1                  2
Syntax arg dim                             0                  1
Syntax                             df['col']        df[['col']]
Max indexing dim                           1                  2
Label indexing              df['col'].loc[x]   df.loc[x, 'col']
Label indexing (scalar)      df['col'].at[x]    df.at[x, 'col']
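Integer indexing           df['col'].iloc[x]      df.iloc[x, j]
Integer indexing (scalar)   df['col'].iat[x]       df.iat[x, j]

(In the integer rows, x is a row position and j is the integer position of 'col'; iloc and iat accept positions, not labels.)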
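
A minimal sketch of the indexers in the table, using the frame from the question. The names row and col_pos are introduced here for illustration; because this frame has the default integer index, the row label 1 and the row position 1 happen to coincide. Note that iloc and iat take integer positions on both axes, so the column is looked up by position.

```python
import pandas as pd

df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

row = 1                                  # label and position coincide here
col_pos = df.columns.get_loc('Brains')   # integer position of the column

# Label-based access: via the Series, then via the DataFrame
print(df['Brains'].loc[row], df.loc[row, 'Brains'])  # 32 32
print(df['Brains'].at[row], df.at[row, 'Brains'])    # 32 32

# Position-based access: the DataFrame forms need an integer column position
print(df['Brains'].iloc[row], df.iloc[row, col_pos])  # 32 32
print(df['Brains'].iat[row], df.iat[row, col_pos])    # 32 32
```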

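When you specify a scalar or list argument to pd.DataFrame.__getitem__, for which [] is syntactic sugar, the dimension of your argument is one less than the dimension of your result. So a scalar (0-dimensional) gives a 1-dimensional series, and a list (1-dimensional) gives a 2-dimensional dataframe. This makes sense because the additional dimension is the dataframe index, i.e. the rows. This holds even if your dataframe happens to have no rows.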
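
The dimension rule can be checked with ndim. This is a small sketch using the frame from the question; the empty frame is produced with a zero-length slice, which is just one convenient way to get a frame with no rows.

```python
import pandas as pd

df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

print(df['Brains'].ndim)     # 1: scalar key gives a Series
print(df[['Brains']].ndim)   # 2: list key gives a DataFrame

# The rule still holds when there are no rows at all
empty = df.iloc[0:0]
print(empty['Brains'].ndim)     # 1
print(empty[['Brains']].ndim)   # 2
print(empty[['Brains']].shape)  # (0, 1)
```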

The difference between double bracket `[[...]]` and single bracket `[..]` indexing in Pandas
