Question

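I have Whole_mat as a pandas DataFrame. corpus_index holds the valid rows I want to copy into New_mat, and I only want columns 1, 4, and 7, but in the order 7, 1, 4. Below is what I tried, but I get TypeError: unhashable type: 'list'. Whole_mat has shape, for example, N x 10, and I want New_mat to be n x 3.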

New_mat = []
for i in range(len(corpus_index)):
    index = corpus_index[i]
    New_mat.append(Whole_mat[[index], [7,1,4]])
print New_mat

What is a better way to solve this?

Answer 1

I don't think you need to iterate with a for loop. You could try this:

New_mat = Whole_mat.loc[corpus_index.index, Whole_mat.columns[[7, 1, 4]]]

Note: column indices start at 0.
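
A minimal sketch of the same idea using positional indexing, in case it helps. It assumes corpus_index is a plain list of integer row positions (the question does not say what type it is), and the sample data standing in for Whole_mat is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Whole_mat: 6 rows, 10 columns, values 0..59.
Whole_mat = pd.DataFrame(np.arange(60).reshape(6, 10))

# Assumption: corpus_index is a list of integer row positions to keep.
corpus_index = [0, 2, 5]

# .iloc selects by position; the column list also sets the output order (7, 1, 4).
New_mat = Whole_mat.iloc[corpus_index, [7, 1, 4]]

print(New_mat.shape)
print(New_mat)
```

If corpus_index holds row labels rather than positions, .loc with column labels would be the closer fit.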

Answer 2

You just need simple indexing. For example:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame([np.random.rand(10) for _ in range(10)])

In [4]: df.loc[[1,4,5],[3,4,5]]
Out[4]:
          3         4         5
1  0.523302  0.104327  0.672953
4  0.303693  0.785685  0.080759
5  0.955738  0.987779  0.410638

More information here: http://pandas.pydata.org/pandas-docs/stable/indexing.html
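
In the example above the row and column labels are the default integers 0 to 9, so labels and positions coincide. A rough sketch of the difference when they do not, using made-up labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels and letter column labels.
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["r0", "r1", "r2"],
    columns=list("abcd"),
)

# .loc selects by label.
by_label = df.loc[["r0", "r2"], ["d", "a"]]

# .iloc selects by integer position; these positions point at the same cells.
by_position = df.iloc[[0, 2], [3, 0]]

print(by_label.equals(by_position))
```

Both calls return the rows and columns in the order listed, which is what the question needs for the 7, 1, 4 ordering.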

Whenever you use pandas, avoid "looping" as much as possible (which is very often). The whole point of using pandas is vectorization.
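
To illustrate the point, here is a small sketch with made-up data showing that a row-by-row loop and a single indexing call produce the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, 10 columns, values 0..39.
df = pd.DataFrame(np.arange(40).reshape(4, 10))
rows = [1, 3]
cols = [7, 1, 4]

# Loop version: pull one row at a time as a 1 x 3 frame, then stack the pieces.
pieces = [df.iloc[[r], cols] for r in rows]
looped = pd.concat(pieces)

# Vectorized version: one indexing call selects all rows and columns at once.
vectorized = df.iloc[rows, cols]

print(looped.equals(vectorized))
```

The single call is shorter and lets pandas do the selection in one step instead of building many small intermediate frames.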

How to iterate over a pandas DataFrame to extract specific rows and selected columns
