Pandas: Update a DataFrame from an incomplete boolean Series

Date: 2018-02-28 20:26:53

Tags: python pandas dataframe

I have two DataFrames:

>>> df1
              above below last_below
asn   country
12345 MX      6     3     1002000
      US      5     4     1006000
54321 MX      4     5     1004000
>>> df2
              above below
asn   country
12345 MX      1     0
54321 MX      0     1
      US      1     0

I update df1 like this:

>>> df1 = df1.add(df2, fill_value=0)
>>> df1
              above below last_below
asn   country
12345 MX      7.0   3.0   1002000.0
      US      5.0   4.0   1006000.0
54321 MX      4.0   6.0   1004000.0
      US      1.0   0.0         NaN

Now I want to update the last_below column, setting it to the current time (say 1008000 for this example) wherever the below column in df2 is 1.

I can get a boolean Series marking the index entries where below is 1 in df2, like this:

>>> below = df2.below == 1
>>> below
asn   country
12345 MX        False
54321 MX        True
      US        False
Name: below, dtype: bool

However, if I try to use this Series to update df1, I get an error:

>>> df1.loc[below, "last_below"] = time.time()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../pandas/core/indexing.py", line 178, in __setitem__
    indexer = self.__get_setitem_indexer(key)
  File "/.../pandas/core/indexing.py", line 171, in __get_setitem_indexer
    raise IndexingError(key)
pandas.core.indexing.IndexingError: (asn   country
12345  MX       False
54321  MX        True
       US       False
Name: below, dtype: bool, 'last_below')

Just trying to read from df1, rather than update it, produces the following:

>>> df1[below]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/.../pandas/core/frame.py", line 1998, in _getitem_array
    key = check_bool_indexer(self.index, key)
  File "/.../pandas/core/indexing.py", line 1939, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

How can I align these indexes?

1 Answer:

Answer 0: (score: 1)

IIUC, use .loc with the index labels where below is True, instead of the boolean Series itself:

df1.loc[below[below].index,'last_below']=1008000
df1
Out[607]: 
               above      below  last_below
asn   country                              
12345 MX         7.0        3.0   1002000.0
      US         5.0        4.0   1006000.0
54321 MX         4.0        6.0   1008000.0
      US         1.0        0.0         NaN
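
For reference, here is a self-contained sketch that rebuilds the example and shows an alternative to selecting by labels: reindexing the boolean Series onto df1's index so that it can be used directly as a mask. The names `missing` and `mask` are introduced here for illustration only, and the frames are reconstructed from the printed output in the question, so treat this as a sketch under those assumptions rather than the original poster's code.

```python
import pandas as pd

# Rebuild the two frames from the question (values copied from the printed output).
idx1 = pd.MultiIndex.from_tuples(
    [(12345, "MX"), (12345, "US"), (54321, "MX")], names=["asn", "country"]
)
idx2 = pd.MultiIndex.from_tuples(
    [(12345, "MX"), (54321, "MX"), (54321, "US")], names=["asn", "country"]
)
df1 = pd.DataFrame(
    {
        "above": [6, 5, 4],
        "below": [3, 4, 5],
        "last_below": [1002000, 1006000, 1004000],
    },
    index=idx1,
)
df2 = pd.DataFrame({"above": [1, 0, 1], "below": [0, 1, 0]}, index=idx2)

# Same update as in the question: df1 now has the union of both indexes.
df1 = df1.add(df2, fill_value=0)

# Boolean Series indexed like df2, so it covers only three of df1's four rows.
below = df2["below"] == 1

# Labels present in df1 but absent from the boolean Series.
missing = df1.index.difference(below.index)

# Extend the Series to df1's index, treating the absent rows as False.
mask = below.reindex(df1.index, fill_value=False)
df1.loc[mask, "last_below"] = 1008000

print(df1)
```

Here `missing` contains the single label (12345, 'US'), the row that exists in df1 but not in df2. Reindexing with `fill_value=False` gives the mask an entry for every row of df1; the update itself touches the same row as the `below[below].index` approach above.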