基于索引条件创建新列

时间:2018-01-19 11:18:01

标签: python pandas

我有df:

arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'baz', 'baz', 'foo', 'foo', 'foo', 'foo', 'foo', 'foo']),
       np.array(['one','two'] * 9),
        np.array([1,2,3] * 6)]
df = pd.DataFrame(np.random.randn(18,2), index=arrays)


                      col0      col1
bar one     1   0.872359    -1.115871
    two     2   -0.937908   -0.528563
    one     3   -0.118874   0.286595
    two     1   -0.507698   1.364643
    one     2   1.507611    1.379498
    two     3   -1.398019   -1.603056
baz one     1   1.498263    0.412380
    two     2   -0.930022   -1.483657
    one     3   -0.438157   1.465089
    two     1   0.161887    1.346587
    one     2   0.167086    1.246322
    two     3   0.276344    -1.206415
foo one     1   -0.045389   -0.759927
    two     2   0.087999    -0.435753
    one     3   -0.232054   -2.221466
    two     1   -1.299483   1.697065
    one     2   0.612211    -1.076738
    two     3   -1.482573   0.907826

现在我要创建“新”列:

for 'bar'
  if index.level(2) > 1
     "NEW" = col1
  else
     "NEW" = col2

对于'baz'与> 2相同

对于'foo'与> 3相同

没有Py循环怎么做?

2 个答案:

答案 0 :(得分:2)

您可以按级别使用get_level_values选择索引值,然后使用新列numpy.where

#if possible use dictionary
d = {'bar':1, 'baz':2, 'foo':3}
m = df.index.get_level_values(2)  > df.rename(d).index.get_level_values(0)

df['NEW'] = np.where(m, df.col1, df.col2)

有关更一般的解决方案,请使用Series.rank

a = df.index.get_level_values(2) 
b = df.index.get_level_values(0).to_series().rank(method='dense')

df['NEW'] = np.where(a > b, df.col1, df.col2)

详情:

print (b)
bar    1.0
bar    1.0
bar    1.0
bar    1.0
bar    1.0
bar    1.0
baz    2.0
baz    2.0
baz    2.0
baz    2.0
baz    2.0
baz    2.0
foo    3.0
foo    3.0
foo    3.0
foo    3.0
foo    3.0
foo    3.0
dtype: float64

答案 1 :(得分:2)

我认为你可以使用df.factorize来避免在df.level(0)上循环:

In [40]: thresh = pd.factorize(df.index.get_level_values(0).values)[0] + 1

In [41]: mask = df.index.get_level_values(2) > thresh

In [42]: df['NEW'] = np.where(mask, df.col1, df.col2)

In [43]: df
Out[43]:
               col1      col2       NEW
bar one 1  0.247222 -0.270104 -0.270104
    two 2  0.429196 -1.385352  0.429196
    one 3  0.782293 -1.565623  0.782293
    two 1  0.392214  1.023960  1.023960
    one 2 -1.628410 -0.484275 -1.628410
    two 3  0.256757  0.529373  0.256757
baz one 1 -0.568608 -0.776577 -0.776577
    two 2  2.142408 -0.815413 -0.815413
    one 3  0.860080  0.501965  0.860080
    two 1 -0.267029 -0.025360 -0.025360
    one 2  0.187145 -0.063436 -0.063436
    two 3  0.351296 -2.050649  0.351296
foo one 1  0.704941  0.176698  0.176698
    two 2 -0.380353  1.027745  1.027745
    one 3 -1.337364 -0.568359 -0.568359
    two 1 -0.588601 -0.800426 -0.800426
    one 2  1.513358 -0.616237 -0.616237
    two 3  0.244831  1.027109  1.027109