Question

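I want to select a subset of columns from a DataFrame without copying the data. According to this answer, that seems to be impossible if the columns have different dtypes. Can anyone confirm? It seems to me that there must be a way, since this feature is so essential.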

For example, df.loc[:, ['a', 'b']] produces a copy.
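A minimal sketch of how that claim can be checked, assuming a small single-dtype frame invented here for illustration (the names `df`, `subset` and `shares` are not from the original post). It compares the underlying NumPy buffers with np.shares_memory:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: four integer columns, one shared dtype.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))

# Selecting with a list of labels gathers the columns into new storage.
subset = df.loc[:, ["a", "b"]]

# Compare the underlying buffers of the frame and of the selection.
shares = np.shares_memory(df.to_numpy(), subset.to_numpy())
print(shares)
```

If the selection is a copy, as described above, the two buffers do not overlap.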

Answer 1

This post applies only to DataFrames in which all columns share the same dtype.
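A quick way to test that precondition might look like the sketch below. `has_single_dtype` is a hypothetical helper name, not a pandas function, and the two frames are made up for illustration:

```python
import pandas as pd


def has_single_dtype(df):
    """Return True if every column of df has the same dtype (hypothetical helper)."""
    return df.dtypes.nunique() == 1


# All-integer frame: the precondition holds.
uniform = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Integer and float columns: the precondition does not hold.
mixed = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

print(has_single_dtype(uniform), has_single_dtype(mixed))
```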

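A view is possible when the columns to be selected sit at a regular stride from one another, so that they can be expressed as a slice inside .iloc. Any two columns can therefore always be selected, but for more than two columns the spacing between them must be constant. In all of these cases we need to know the columns' integer positions (column IDs) and the stride.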

Let's try to understand this with the help of a few sample cases.

Case #1: Two columns starting at column ID 0

In [47]: df1
Out[47]: 
   a  b  c  d
0  5  0  3  3
1  7  3  5  2
2  4  7  6  8

In [48]: np.array_equal(df1.loc[:, ['a', 'b']], df1.iloc[:,0:2])
Out[48]: True

In [50]: np.shares_memory(df1, df1.iloc[:,0:2]) # confirm view
Out[50]: True

Case #2: Two columns starting at column ID 1

In [51]: df2
Out[51]: 
   a0  a  a1  a2  b  c  d
0   8  1   6   7  7  8  1
1   5  8   4   3  0  3  5
2   0  2   3   8  1  3  3

In [52]: np.array_equal(df2.loc[:, ['a', 'b']], df2.iloc[:,1::3])
Out[52]: True

In [54]: np.shares_memory(df2, df2.iloc[:,1::3]) # confirm view
Out[54]: True

Case #3: Three columns starting at column ID 1, with a stride of 2 columns

In [74]: df3
Out[74]: 
   a0  a  a1  b  b1  c  c1  d  d1
0   3  7   0  1   0  4   7  3   2
1   7  2   0  0   4  5   5  6   8
2   4  1   4  8   1  1   7  3   6

In [75]: np.array_equal(df3.loc[:, ['a', 'b', 'c']], df3.iloc[:,1:6:2])
Out[75]: True

In [76]: np.shares_memory(df3, df3.iloc[:,1:6:2]) # confirm view
Out[76]: True

Selecting 4 columns:

In [77]: np.array_equal(df3.loc[:, ['a', 'b', 'c', 'd']], df3.iloc[:,1:8:2])
Out[77]: True

In [78]: np.shares_memory(df3, df3.iloc[:,1:8:2])
Out[78]: True
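The cases above could be generalized along the following lines. This is only a sketch: `columns_to_slice` is a hypothetical helper, not part of pandas, and it assumes unique column labels and handles only positive strides. The frame reuses the column names of df3 with made-up values.

```python
import numpy as np
import pandas as pd


def columns_to_slice(df, names):
    """Return a slice over column positions that selects `names`,
    or None if the columns are not at a regular positive stride.
    Hypothetical helper; assumes df.columns is unique."""
    pos = df.columns.get_indexer(names)
    if len(pos) == 0 or (pos < 0).any():
        return None  # nothing requested, or an unknown column name
    if len(pos) == 1:
        return slice(int(pos[0]), int(pos[0]) + 1)
    steps = np.diff(pos)
    step = int(steps[0])
    if step <= 0 or (steps != step).any():
        return None  # irregular or non-increasing spacing
    return slice(int(pos[0]), int(pos[-1]) + 1, step)


# Same column layout as df3 above, filled with illustrative values.
df3 = pd.DataFrame(
    np.arange(27).reshape(3, 9),
    columns=["a0", "a", "a1", "b", "b1", "c", "c1", "d", "d1"],
)

sl = columns_to_slice(df3, ["a", "b", "c"])
view = df3.iloc[:, sl]

# Check for shared memory, as in the sessions above; the result may
# depend on the pandas version and on the frame having a single dtype.
print(sl, np.shares_memory(df3.to_numpy(), view.to_numpy()))
```

When the helper returns None (for example for ['a', 'c', 'd'], whose positions are not evenly spaced), no single slice exists and a list-based selection, which copies, is the fallback.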
