Pandas subset using a sliced boolean index

Time: 2016-10-17 21:35:46

Tags: python pandas indexing dataframe slice

Code to make the test data:

import pandas as pd
import numpy as np

testdf = {'date': range(10),
          'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
          'id': [1] * 7 + [2] * 3}
testdf = pd.DataFrame(testdf)

print(testdf)

gives

    date event  id
0     0     A   1
1     1     A   1
2     2   NaN   1
3     3     B   1
4     4     B   1
5     5     A   1
6     6     B   1
7     7   NaN   2
8     8     A   2
9     9     B   2

I subset testdf:

df_sub = testdf.loc[testdf.event == 'A', :]
print(df_sub)
    date event  id
0     0     A   1
1     1     A   1
5     5     A   1
8     8     A   2

(Note: the subset is not re-indexed, so it keeps the original labels 0, 1, 5 and 8.)

Then I create conditional boolean indices on the subset:

bool_sliced_idx1 = df_sub.date < 4
bool_sliced_idx2 = (df_sub.date > 4) & (df_sub.date < 6)

I want to use these new indices to insert conditional values into the original df, like

testdf['new_column'] = np.nan
testdf.loc[bool_sliced_idx1, 'new_column'] = 'new_conditional_value'

which (now) obviously gives the error:

pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

bool_sliced_idx1 looks like

>>> print(bool_sliced_idx1)
0     True
1     True
5    False
8    False
Name: date, dtype: bool

I tried

testdf.ix[(bool_sliced_idx1==True).index,:]

but this doesn't work, because (bool_sliced_idx1==True).index still contains all four labels of the subset (0, 1, 5, 8), not just the ones where the condition is True.

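A minimal sketch of why the error occurs, assuming a recent pandas version. It uses a read-only .loc lookup with the same mask (instead of the assignment above) to show the label mismatch; the exception class is IndexingError, but the exact message wording varies between pandas versions.

```python
import numpy as np
import pandas as pd

# Test data from the question
testdf = pd.DataFrame({'date': range(10),
                       'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
                       'id': [1] * 7 + [2] * 3})

df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx1 = df_sub.date < 4

# The mask carries df_sub's labels, not testdf's full 0..9 index
mask_labels = bool_sliced_idx1.index.tolist()
frame_labels = testdf.index.tolist()

# .loc aligns a boolean Series key on labels. Labels 2, 3, 4, 6, 7 and 9 have
# no True/False value in the mask, so pandas raises instead of guessing.
try:
    testdf.loc[bool_sliced_idx1]
    error_name = None
except Exception as exc:  # expected: IndexingError
    error_name = type(exc).__name__

print(mask_labels)
print(error_name)
```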
2 answers:

Answer 0 (score: 3)

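IIUC, you can combine all of the conditions at once instead of trying to chain them. For example, df_sub.date < 4 is really just (testdf.event == 'A') & (testdf.date < 4) evaluated on the original frame. So you can do: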

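# Create the conditions on the original frame, so their index matches testdf.
cond1 = (testdf.event == 'A') & (testdf.date < 4)
# inclusive='neither' excludes both endpoints (pandas >= 1.3; older versions used inclusive=False).
cond2 = (testdf.event == 'A') & (testdf.date.between(4, 6, inclusive='neither'))

# Make the assignments; rows matching neither condition get NaN in the new column.
testdf.loc[cond1, 'new_col'] = 'foo'
testdf.loc[cond2, 'new_col'] = 'bar'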

which gives you:

   date event  id new_col
0     0     A   1     foo
1     1     A   1     foo
2     2   NaN   1     NaN
3     3     B   1     NaN
4     4     B   1     NaN
5     5     A   1     bar
6     6     B   1     NaN
7     7   NaN   2     NaN
8     8     A   2     NaN
9     9     B   2     NaN
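If you would rather keep the question's two-step approach (subset first, then build masks on the subset), a minimal sketch is to expand each subset mask to testdf's full index with Series.reindex, filling the missing labels with False. This is an alternative to the answer above, not part of it; it produces the same new_col values.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'date': range(10),
                       'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
                       'id': [1] * 7 + [2] * 3})

df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx1 = df_sub.date < 4
bool_sliced_idx2 = (df_sub.date > 4) & (df_sub.date < 6)

# Expand each subset mask to testdf's full index; labels that were not in
# df_sub (2, 3, 4, 6, 7, 9) become False instead of being missing.
full_idx1 = bool_sliced_idx1.reindex(testdf.index, fill_value=False)
full_idx2 = bool_sliced_idx2.reindex(testdf.index, fill_value=False)

# The masks now share testdf's index, so .loc assignment aligns cleanly.
testdf.loc[full_idx1, 'new_col'] = 'foo'
testdf.loc[full_idx2, 'new_col'] = 'bar'
print(testdf)
```

The fill value matters: False means "rows outside the subset are never selected", which matches the intent of building the mask on df_sub in the first place.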

Answer 1 (score: 0)

This works:

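# Positions (0, 1, ...) within df_sub where the condition is True
idx = np.where(bool_sliced_idx1)[0]
## or
# np.ravel(np.where(bool_sliced_idx1))

# Map those positions back to df_sub's labels, which are also testdf's labels
idx_original = df_sub.index[idx]
# Select by label with .loc; .iloc only happens to work here because testdf has a default 0..n-1 index
testdf.loc[idx_original, :]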
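A label-based variant of this answer, assuming the goal is the assignment from the question: filtering the mask by itself yields the labels where it is True, so the np.where and position-to-label steps are not needed. This swaps plain boolean filtering in for np.where; it is a sketch, not the answerer's code.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'date': range(10),
                       'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
                       'id': [1] * 7 + [2] * 3})

df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx1 = df_sub.date < 4

# Boolean-filter the mask by itself: the remaining index holds the True labels.
true_labels = bool_sliced_idx1[bool_sliced_idx1].index

# Label-based assignment; this creates 'new_column' and leaves NaN elsewhere.
# It does not rely on testdf having a default 0..n-1 index.
testdf.loc[true_labels, 'new_column'] = 'new_conditional_value'
print(testdf)
```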