Question

I have two DataFrames:

import pandas as pd

df = pd.DataFrame({'Foo': ['A', 'B', 'C', 'D', 'E'],
                   'Score': [4, 6, 2, 7, 8]})

df2 = pd.DataFrame({'Bar': ['Z', 'Y', 'X', 'W', 'V'],
                    'Score': [5, 10, 10, 5, 9]})

print(df)
print(df2)

and a function:

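def DiffMatrix(df, df2):
    rows = []
    for i in range(len(df2)):
        # .loc is used here because the .ix indexer was removed in pandas 1.0
        x = df2.loc[df.index[i], 'Score']
        y = x - df['Score']
        rows.append(y)
    # DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
    return pd.DataFrame(rows).reset_index(drop=True)

diff = DiffMatrix(df, df2)
print(diff)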

   0  1  2  3  4
0  1 -1  3 -2 -3
1  6  4  8  3  2
2  6  4  8  3  2
3  1 -1  3 -2 -3
4  5  3  7  2  1

[5 rows x 5 columns]

However, if I change the index or rename a column, for example:

df=df.set_index('Foo')
df2=df2.set_index('Bar')

or

df2 = pd.DataFrame({'Bar': ['Z', 'Y', 'X', 'W', 'V'],
                    'ScoreX': [5, 10, 10, 5, 9]})

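the function no longer works, because it refers to the column by the name 'Score'. Is there a way to change the code so that it refers to df['Score'] generically, as the first column, and also adapts to changes in the index, so that if I change the index the output becomes: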

    A   B   C   D   E
Z   1   -1  3   -2  -3
Y   6   4   8   3   2
X   6   4   8   3   2
W   1   -1  3   -2  -3
V   5   3   7   2   1

Answer 1

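You can refer to a pandas column by its position. If you know you always want the second column (position 1, since positions are zero-based), you can do the following.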

Instead of:

y= x - df['Score']

do this:

y= x - df[df.columns[1]]
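As a rough illustration of that substitution, the sketch below applies it to both frames, so the loop no longer depends on either column being called 'Score'. The helper name diff_matrix_by_position and its col parameter are hypothetical, not part of the original code. It assumes the score sits at position 1 in both frames; after set_index the label column moves into the index and the score would be at position 0 instead.

```python
import pandas as pd

df = pd.DataFrame({'Foo': ['A', 'B', 'C', 'D', 'E'],
                   'Score': [4, 6, 2, 7, 8]})
df2 = pd.DataFrame({'Bar': ['Z', 'Y', 'X', 'W', 'V'],
                    'ScoreX': [5, 10, 10, 5, 9]})

def diff_matrix_by_position(df, df2, col=1):
    # Hypothetical helper: pick the score column by position, not by name.
    left = df[df.columns[col]]
    right = df2[df2.columns[col]]
    # One output row per value in df2: that value minus every score in df.
    rows = [(value - left).to_numpy() for value in right]
    return pd.DataFrame(rows)

diff = diff_matrix_by_position(df, df2)
print(diff)
```

Passing col=0 would be the matching call once both frames have been re-indexed with set_index.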

Edit

Following the OP's request about selecting specific rows, you can use pandas.DataFrame.iloc[...]

For example, you can do this:

diff.iloc[[0]]

on your diff DataFrame, which produces the output:

   0  1  2  3  4
0  1 -1  3 -2 -3

If you want to select several rows, you can use a slice or a list of the row positions you want:

#slicing
diff.iloc[1:4]

gives you

   0  1  2  3  4
1  6  4  8  3  2
2  6  4  8  3  2
3  1 -1  3 -2 -3

and

#list of row indices
diff.iloc[[0,2,4]]

yields

   0  1  2  3  4
0  1 -1  3 -2 -3
2  6  4  8  3  2
4  5  3  7  2  1

Answer 2

You may want to use the .iloc indexer to access your data by position:

df = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index=['x', 'y'])
df

   A  B
x  1  3
y  2  4

So, to access the second row:

df.iloc[1,:]

A    2
B    4
Name: y, dtype: int64

and to access the second column:

df.iloc[:,1]

x    3
y    4
Name: B, dtype: int64

Indeed, you can combine the two and get a scalar:

df.iloc[1,1]

4
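Applied to the frames from the question, positional access can be combined with the existing index labels to get the labelled matrix that was asked for. This is only a minimal sketch: labelled_diff is a hypothetical helper name, it assumes both frames have already been re-indexed with set_index so the score is the first remaining column, and it swaps the row-by-row loop for NumPy broadcasting.

```python
import pandas as pd

df = pd.DataFrame({'Foo': ['A', 'B', 'C', 'D', 'E'],
                   'Score': [4, 6, 2, 7, 8]}).set_index('Foo')
df2 = pd.DataFrame({'Bar': ['Z', 'Y', 'X', 'W', 'V'],
                    'ScoreX': [5, 10, 10, 5, 9]}).set_index('Bar')

def labelled_diff(df, df2):
    # Hypothetical helper: take the first column of each frame by position,
    # so the column names ('Score', 'ScoreX', ...) do not matter.
    left = df.iloc[:, 0]
    right = df2.iloc[:, 0]
    # Broadcasting: a column vector of df2 scores minus a row vector of df
    # scores gives a len(df2) x len(df) matrix of differences.
    values = right.to_numpy()[:, None] - left.to_numpy()[None, :]
    # Reuse the index labels of the inputs as row and column labels.
    return pd.DataFrame(values, index=right.index, columns=left.index)

result = labelled_diff(df, df2)
print(result)
```

Because the labels come straight from the input indexes, changing either index changes the row or column labels of the result without touching the function.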

Python Pandas: referencing ambiguous columns and rows in a function
