Question

I'm very confused about how Python axes are defined, and whether they refer to a DataFrame's rows or columns. Consider the code below:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
>>> df
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3

So if we call df.mean(axis=1), we get a mean across the rows:

>>> df.mean(axis=1)
0    1
1    2
2    3

However, if we call df.drop(name, axis=1), we actually drop a column, not a row:

>>> df.drop("col4", axis=1)
   col1  col2  col3
0     1     1     1
1     2     2     2
2     3     3     3

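Can someone help me understand what is meant by an "axis" in pandas / numpy / scipy?

A side note: DataFrame.mean might simply be defined wrong. The documentation for DataFrame.mean says that axis=1 is supposed to mean a mean over the columns, not the rows...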

Answer 1

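It's perhaps simplest to remember it as 0 = down and 1 = across.

This means:

Use axis=0 to apply a method down each column, or to the row labels (the index).
Use axis=1 to apply a method across each row, or to the column labels.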

Here's a picture showing the parts of a DataFrame that each axis refers to: [image not available]

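It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:

Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]

So the method in the question, df.mean(axis=1), seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.

Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.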
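To make the "0 = down, 1 = across" rule concrete, here is a minimal sketch that reuses the DataFrame from the question and compares pandas with the underlying NumPy array. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1: reduce across the columns, giving one value per row.
row_means = df.mean(axis=1)
# axis=0: reduce down the rows, giving one value per column.
col_means = df.mean(axis=0)

# The underlying NumPy array follows the same convention.
arr = df.to_numpy()
np_row_means = arr.mean(axis=1)
np_col_means = arr.mean(axis=0)

# drop looks the label up on the given axis.
without_col4 = df.drop("col4", axis=1)  # "col4" is a column label
without_row0 = df.drop(0, axis=0)       # 0 is a row (index) label
```

In both libraries the axis names the dimension that the operation travels along (for a reduction) or the set of labels it searches (for drop).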

Answer 2

Another way to explain it:

# Not realistic but ideal for understanding the axis parameter
df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
                  columns=["idx1", "idx2", "idx3", "idx4"],
                  index=["idx1", "idx2", "idx3"]
                 )

---------------------------------------1
|          idx1  idx2  idx3  idx4
|    idx1     1     1     1     1
|    idx2     2     2     2     2
|    idx3     3     3     3     3
0

About df.drop (axis means the position)

A: I wanna remove idx3.
B: **Which one**? // typing while waiting response: df.drop("idx3",
A: The one which is on axis 1
B: OK then it is >> df.drop("idx3", axis=1)

// Result
---------------------------------------1
|          idx1  idx2     idx4
|    idx1     1     1     1
|    idx2     2     2     2
|    idx3     3     3     3
0

About df.apply (axis means the direction)

A: I wanna apply sum.
B: Which direction? // typing while waiting response: df.apply(lambda x: x.sum(),
A: The one which is *parallel to axis 0*
B: OK then it is >> df.apply(lambda x: x.sum(), axis=0)

// Result
idx1    6
idx2    6
idx3    6
idx4    6
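One way to see which direction df.apply travels is to look at what the function receives. This small sketch assumes the DataFrame from the question (integer row labels, four named columns) and simply measures the length of each Series handed to the function.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=0: the function receives each column as a Series (3 values each).
per_column = df.apply(lambda s: len(s), axis=0)
# axis=1: the function receives each row as a Series (4 values each).
per_row = df.apply(lambda s: len(s), axis=1)
```

per_column is indexed by the column labels and per_row by the row labels, which matches the "parallel to axis 0" wording in the dialogue.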

Answer 3

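There are already correct answers, but I'll give you another example with > 2 dimensions.

The parameter axis means the axis to be changed.
For example, consider a dataframe with dimensions a x b x c.

df.mean(axis=1) returns a dataframe with dimensions a x 1 x c.
df.drop("col4", axis=1) returns a dataframe with dimensions a x (b-1) x c.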
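Since a pandas DataFrame has only two dimensions, the a x b x c case is easier to try with a NumPy array. The sketch below uses arbitrary sizes and np.delete as a stand-in for drop. Both choices are assumptions made for illustration.

```python
import numpy as np

a, b, c = 2, 3, 4  # hypothetical sizes
arr = np.arange(a * b * c).reshape(a, b, c)

# keepdims=True keeps the reduced axis with length 1.
mean_kept = arr.mean(axis=1, keepdims=True)  # shape (a, 1, c)
# By default the reduced axis disappears entirely.
mean_default = arr.mean(axis=1)              # shape (a, c)
# Removing one entry along axis 1 shrinks only that axis.
dropped = np.delete(arr, 0, axis=1)          # shape (a, b - 1, c)
```

Note that the a x 1 x c shape only appears with keepdims=True. Without it, NumPy removes the reduced axis, just as a DataFrame reduction returns a Series.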

Answer 4

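It should be more widely known that the string aliases 'index' and 'columns' can be used in place of the integers 0/1. The aliases are much more explicit and help me remember how the calculations take place. Another alias for 'index' is 'rows'.

When axis='index' is used, the calculations happen down the columns, which is confusing. But I remember it as getting a result that is the same size as another row.

Let's get some data on the screen to see what I am talking about: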

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
          a         b         c         d
0  0.990730  0.567822  0.318174  0.122410
1  0.144962  0.718574  0.580569  0.582278
2  0.477151  0.907692  0.186276  0.342724
3  0.561043  0.122771  0.206819  0.904330
4  0.427413  0.186807  0.870504  0.878632
5  0.795392  0.658958  0.666026  0.262191
6  0.831404  0.011082  0.299811  0.906880
7  0.749729  0.564900  0.181627  0.211961
8  0.528308  0.394107  0.734904  0.961356
9  0.120508  0.656848  0.055749  0.290897

When we want to take the mean of all the columns, we use axis='index' to get the following:

df.mean(axis='index')
a    0.562664
b    0.478956
c    0.410046
d    0.546366
dtype: float64

The same result would be obtained by:

df.mean() # default is axis=0
df.mean(axis=0)
df.mean(axis='rows')

To apply an operation from left to right across a row, use axis='columns'. I remember it by thinking that an additional column may be added to my DataFrame:

df.mean(axis='columns')
0    0.499784
1    0.506596
2    0.478461
3    0.448741
4    0.590839
5    0.595642
6    0.512294
7    0.427054
8    0.654669
9    0.281000
dtype: float64

The same result would be obtained by:

df.mean(axis=1)

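Add a new row with axis=0/index/rows

Let's use these results to add additional rows or columns to complete the explanation. Whenever you use axis=0/index/rows, it's like getting a new row of the DataFrame. Let's add a row: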

df.append(df.mean(axis='rows'), ignore_index=True)

           a         b         c         d
0   0.990730  0.567822  0.318174  0.122410
1   0.144962  0.718574  0.580569  0.582278
2   0.477151  0.907692  0.186276  0.342724
3   0.561043  0.122771  0.206819  0.904330
4   0.427413  0.186807  0.870504  0.878632
5   0.795392  0.658958  0.666026  0.262191
6   0.831404  0.011082  0.299811  0.906880
7   0.749729  0.564900  0.181627  0.211961
8   0.528308  0.394107  0.734904  0.961356
9   0.120508  0.656848  0.055749  0.290897
10  0.562664  0.478956  0.410046  0.546366
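DataFrame.append was removed in pandas 2.0, so the df.append call above fails on recent versions. A rough equivalent using pd.concat might look like the following sketch. The seeded random generator and the variable names are assumptions added here so the example is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, not from the original answer
df = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))

# axis="index" reduces down the rows: one mean per column (a, b, c, d).
col_means = df.mean(axis="index")
# Turn the Series into a 1 x 4 DataFrame so it can be stacked as a row.
new_row = col_means.to_frame().T
with_mean_row = pd.concat([df, new_row], ignore_index=True)
```

The result has 11 rows, with the column means in the last one, which matches the idea that an axis=0 result has the shape of a row.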

Add a new column with axis=1/columns

Similarly, when axis=1/columns, it creates data that can easily be made into its own column:

df.assign(e=df.mean(axis='columns'))

          a         b         c         d         e
0  0.990730  0.567822  0.318174  0.122410  0.499784
1  0.144962  0.718574  0.580569  0.582278  0.506596
2  0.477151  0.907692  0.186276  0.342724  0.478461
3  0.561043  0.122771  0.206819  0.904330  0.448741
4  0.427413  0.186807  0.870504  0.878632  0.590839
5  0.795392  0.658958  0.666026  0.262191  0.595642
6  0.831404  0.011082  0.299811  0.906880  0.512294
7  0.749729  0.564900  0.181627  0.211961  0.427054
8  0.528308  0.394107  0.734904  0.961356  0.654669
9  0.120508  0.656848  0.055749  0.290897  0.281000

It appears that you can see all the aliases with the following private variables:

df._AXIS_ALIASES
{'rows': 0}

df._AXIS_NUMBERS
{'columns': 1, 'index': 0}

df._AXIS_NAMES
{0: 'index', 1: 'columns'}

Answer 5

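When axis='rows' or axis=0, it means accessing elements in the direction of the rows, from top to bottom. Applying sum along axis=0 gives the total of each column.

When axis='columns' or axis=1, it means accessing elements in the direction of the columns, from left to right. Applying sum along axis=1 gives the total of each row.

Still confusing! But the above makes it a bit easier for me.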
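A tiny worked example may help here. The two-column frame below is made up for illustration and is chosen so that row totals and column totals are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: walk down the rows, so each column gets one total.
column_totals = df.sum(axis=0)  # a: 1+2+3, b: 10+20+30
# axis=1: walk across the columns, so each row gets one total.
row_totals = df.sum(axis=1)     # 1+10, 2+20, 3+30
```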

Answer 6

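I find all the other answers confusing. Here's how I think about it:

axis=0: the shape of the result is horizontal (a row)
axis=1: the shape of the result is vertical (a column)

So

df.drop(name, axis=1): drops a column
df.mean(axis=1): calculates a column (the result can be added as a new column)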

Answer 7

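I remember it by the change in dimensions: if axis=0, the rows change and the columns stay the same; if axis=1, the columns change and the rows stay the same.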
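This "which dimension changes" rule can be checked directly by comparing shapes. The sketch below assumes the 3 x 4 DataFrame from the question, and the dictionary keys are just labels for readability.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

shapes = {
    "original": df.shape,                          # 3 rows, 4 columns
    "drop axis=0": df.drop(0, axis=0).shape,       # row count changes
    "drop axis=1": df.drop("col4", axis=1).shape,  # column count changes
    "mean axis=0": df.mean(axis=0).shape,          # rows collapse, one value per column
    "mean axis=1": df.mean(axis=1).shape,          # columns collapse, one value per row
}
```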

Ambiguity in Pandas Dataframe / Numpy Array "axis" definition
