Question

This is a particular kind of join/combination, but I don't know what it is called, so feel free to correct my terminology.

For example, I have a Series, profile, as follows:

In [1]: profile = pd.Series(data=[0.8,0.64,0.51,0.5,0.5], index=['google.com','facebook.com','twitter.com', 'instagram.com', 'github.com'])

In [2]: profile
Out[2]: 
google.com       0.80
facebook.com     0.64
twitter.com      0.51
instagram.com    0.50
github.com       0.50
dtype: float64

I also have a Series, transaction, as follows:

In [3]: transaction = pd.Series(data=[1,1,1,1], index=['twitter.com','facebook.com','instagram.com','9gag.com'])

In [4]: transaction
Out[4]: 
twitter.com      1
facebook.com     1
instagram.com    1
9gag.com         1
dtype: int64

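What I want to produce is a Series, window, built by comparing profile and transaction: for every index label of profile that also exists in transaction, take the corresponding value from transaction. The remaining labels, which occur only in profile, should be filled with 0.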

In [5]: window
Out[5]: 
google.com       0
facebook.com     1
twitter.com      1
instagram.com    1
github.com       0
dtype: int64

Is there an existing built-in method/function that does this?

I have already tried:

window = transaction[transaction.keys().isin(profile.keys())]

But it only returns the intersection of transaction and profile. I came across the Series method combine(), but I don't know what to pass as the func argument (isin() doesn't work).
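
To make the problem concrete, here is a self-contained sketch of the attempt above. It shows why boolean masking cannot produce window: a mask can only drop labels from transaction, never add the labels that exist only in profile. The names mask, intersection and missing are introduced here for illustration.

```python
import pandas as pd

profile = pd.Series(
    data=[0.8, 0.64, 0.51, 0.5, 0.5],
    index=["google.com", "facebook.com", "twitter.com", "instagram.com", "github.com"],
)
transaction = pd.Series(
    data=[1, 1, 1, 1],
    index=["twitter.com", "facebook.com", "instagram.com", "9gag.com"],
)

# Boolean masking keeps only the rows of transaction whose label is in profile.
mask = transaction.index.isin(profile.index)
intersection = transaction[mask]

# Labels that exist only in profile are absent from the masked result,
# so they would still have to be added and filled with 0 separately.
missing = profile.index.difference(intersection.index)

print(intersection)
print(list(missing))
```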

Answer 1

As of Pandas version P.77.0, you can reindex the series.

>>> transaction.reindex(profile.index).fillna(0)
google.com       0
facebook.com     1
twitter.com      1
instagram.com    1
github.com       0
dtype: float64
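
Note that the result above is float64, because reindexing first introduces NaN for the missing labels and fillna(0) only replaces them afterwards. If the integer dtype from the question matters, one possible variant is to pass fill_value to reindex directly. This is a minimal sketch that assumes the same profile and transaction as in the question:

```python
import pandas as pd

profile = pd.Series(
    data=[0.8, 0.64, 0.51, 0.5, 0.5],
    index=["google.com", "facebook.com", "twitter.com", "instagram.com", "github.com"],
)
transaction = pd.Series(
    data=[1, 1, 1, 1],
    index=["twitter.com", "facebook.com", "instagram.com", "9gag.com"],
)

# reindex aligns transaction to profile's labels. Labels missing from
# transaction receive fill_value directly, so no NaN is created and the
# integer dtype is not upcast to float.
window = transaction.reindex(profile.index, fill_value=0)

print(window)
```

Labels that exist only in transaction (here 9gag.com) are dropped, since they are not part of profile.index.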

It appears to be slightly faster than using loc, but I have not tested this on larger data.

%timeit transaction.reindex(profile.index).fillna(0)
1000 loops, best of 3: 224 µs per loop

%timeit transaction.loc[profile.index].fillna(0)
1000 loops, best of 3: 329 µs per loop
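
These timings come from a single small example, so it is worth re-measuring on your own data. Below is a rough sketch using the standard-library timeit module outside IPython; the helper names via_reindex and via_loc are hypothetical. One caveat: in pandas 1.0 and later, indexing with loc and a list of labels raises a KeyError if any label is missing, so the loc variant is expected to fail on current versions, and the sketch guards against that.

```python
import timeit

import pandas as pd

profile = pd.Series(
    data=[0.8, 0.64, 0.51, 0.5, 0.5],
    index=["google.com", "facebook.com", "twitter.com", "instagram.com", "github.com"],
)
transaction = pd.Series(
    data=[1, 1, 1, 1],
    index=["twitter.com", "facebook.com", "instagram.com", "9gag.com"],
)


def via_reindex():
    # Align to profile's labels, then replace the resulting NaN with 0.
    return transaction.reindex(profile.index).fillna(0)


def via_loc():
    # Older pandas returned NaN for missing labels here; pandas 1.0+ raises KeyError.
    return transaction.loc[profile.index].fillna(0)


# Check whether the loc variant works at all on the installed pandas version.
try:
    loc_result = via_loc()
except KeyError:
    loc_result = None

runs = 1000
reindex_seconds = timeit.timeit(via_reindex, number=runs)
print(f"reindex: {reindex_seconds / runs * 1e6:.1f} microseconds per call")

if loc_result is None:
    print("loc: KeyError (missing labels are not allowed on this pandas version)")
else:
    loc_seconds = timeit.timeit(via_loc, number=runs)
    print(f"loc: {loc_seconds / runs * 1e6:.1f} microseconds per call")
```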
