Question

我最近开始使用pandas。任何人都可以用.corrwith()和Series向我解释函数DataFrame的行为差异吗？

假设我有一个DataFrame：

frame = pd.DataFrame(data={'a':[1,2,3], 'b':[-1,-2,-3], 'c':[10, -10, 10]})

我想要计算特征'a'和所有其他特征之间的相关性。我可以通过以下方式完成：

frame.drop(labels='a', axis=1).corrwith(frame['a'])

结果将是：

b   -1.0
c    0.0

但非常相似的代码：

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

生成绝对不同且不可接受的表：

a   NaN
b   NaN
c   NaN

所以，我的问题是：为什么在DataFrame作为第二个参数的情况下，我们会得到如此奇怪的输出？

Answer 1

我认为您正在寻找：

让我们说你的框架是：

frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

您希望'cost'和'amount'列与每个组合中的所有其他列相关联。

focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)

回答你的问题：

成对计算   两个DataFrame对象的行或列之间的相关性。

参数：

其他：DataFrame

轴：{0或'索引'，1或'列'}，

默认0 0或'index'计算逐列，1或'列'用于逐行删除：boolean，default False删除丢失的索引   结果，默认返回所有返回的union：correls：Series

corrwith与add，sub，mul，div的行为类似，因为它希望找到DataFrame或{尽管文档只说Series。

，{1}}已在other中传递

当DataFrame为other时，它会广播该系列并按照Series指定的轴进行匹配，默认值为0.这就是以下工作的原因：

axis

当frame.drop(labels='a', axis=1).corrwith(frame.a) b -1.0 c 0.0 dtype: float64为other时，它将与DataFrame指定的轴匹配，并关联由另一个轴标识的每个对。如果我们这样做：

axis

只有frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1)) a NaN b NaN c 1.0 dtype: float64是共同的，只有c计算了相关性。

在您指定的情况下：

frame.drop(labels='a', axis=1).corrwith(frame[['a']])因frame[['a']]而属DataFrame，现在按[['a']]规则播放，其中的列必须与其相关的列匹配。但是，您明确地从第一帧中删除DataFrame，然后与a相关联，只有DataFrame。每列的结果为a。

Answer 2

corrwith定义为DataFrame.corrwith(other, axis=0, drop=False)，因此默认为axis=0 - 即Compute pairwise correlation between columns of two **DataFrame** objects

因此两个DF中的列名/标签必须相同：

In [134]: frame.drop(labels='a', axis=1).corrwith(frame[['a']].rename(columns={'a':'b'}))
Out[134]:
b   -1.0
c    NaN
dtype: float64

NaN - 表示（在这种情况下）没有可比较/关联的内容，因为c DF

中没有名为other的列

如果您将系列作为other传递，则会将其翻译（来自link，您已在comment中发布）：

In [142]: frame.drop(labels='a', axis=1).apply(frame.a.corr)
Out[142]:
b   -1.0
c    0.0
dtype: float64

Answer 3

enter image description here

简单的输出

Answer 4

对不起，.. 熊猫系列数据框只能用相同的columnz来分析，所以没有系列的一致性

喜欢

x = np.array（[2，4，6，8.2]）。reshape（-1，1）

y = np.array（[2.3，3.11，.5，7，10，11，12]）。reshape（-1，1）

a = pd.DataFrame（x，列= ['aa']） b = pd.DataFrame（y，columns = ['aa']）

a.corrwith（b）

pandas.DataFrame corrwith（）方法

4 个答案:

我认为您正在寻找：

回答你的问题：