Question

我正在使用python（使用pandas库）处理大型表。

我想对表的每一行执行各种矢量操作，例如Correlation。

这可能是一个简单的问题，但对我来说很难处理DataFrame结构。关于如何将每行（或列）转换为列表（或numpy数组），我不知道。

即使计算行数似乎也不是一个简单的问题，因为像df.count()这样的函数似乎忽略了空数据。

简单数据表和预期结果表如下所示。在这种情况下，我想计算每对行的总和。

实际表的大小要大得多（超过1000行和列），并且包含一些空值。

Data.csv：

Label Col1 Col2
Row1 1 2
Row2 3 4
Row3 5 6

Output.csv：

Label Col3
Row1,Row2 4,6
Row1,Row3 6,8
Row2,Row3 8,10

Answer 1

解决方案的一部分，因为您将拥有名称略有不同的重复行，因此您无法应用drop_duplicates数据帧方法：

import pandas as pd
from io import StringIO

data = """
Label Col1 Col2
Row1 1 2
Row2 3 4
Row3 5 6
"""

df1 = pd.DataFrame()

for row in range(df.shape[0]):
   df1 = pd.concat([df1, df.ix[row,:] + df[df['Label'] != df.Label[row]]])

df1.reset_index(drop=True, inplace=True)

In [103]: df1
Out[103]:
      Label Col1 Col2
0  Row1Row2    4    6
1  Row1Row3    6    8
2  Row2Row1    4    6
3  Row2Row3    8   10
4  Row3Row1    6    8
5  Row3Row2    8   10

Answer 2

使用色谱柱时，Pandas更快更自然。因此，我建议首先转置DF，然后只对列进行转换

链接：Invert index and columns in a pandas DataFrame

Python - 使用Pandas进行行计算

2 个答案: