Transform a DataFrame column based on a condition

Date: 2018-01-31 23:44:00

Tags: python python-3.x pandas group-by

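I would like to add a new column that holds three consecutive 1s whenever rap equals 1: one in the year where rap equals 1, and one in each of the two preceding years. The new column must be computed per id (I have panel data).

The df looks like this: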

id  year  rap  cohort  jobs  year_of_life  
1  2009    0     NaN      10      NaN       
1  2012    0     2012     12      0         
1  2013    0     2012     12      1         
1  2014    0     2012     13      2         
1  2015    1     2012     15      3         
1  2016    0     2012     17      4       
1  2017    0     2012     18      5         
2  2009    0     2009     15      0         
2  2010    0     2009     2       1         
2  2011    0     2009     3       2         
2  2012    1     2009     3       3         
2  2013    0     2009     15      4         
2  2014    0     2009     12      5         
2  2015    0     2009     13      6         
2  2016    0     2009     13      7         

Expected output:

id  year  rap  cohort  jobs  year_of_life  rap_new
1  2009    0     NaN      10      NaN       0  
1  2012    0     2012     12      0         0   
1  2013    0     2012     12      1         1
1  2014    0     2012     13      2         1
1  2015    1     2012     15      3         1
1  2016    0     2012     17      4         0
1  2017    0     2012     18      5         0
2  2009    0     2009     15      0         0
2  2010    0     2009     2       1         1
2  2011    0     2009     3       2         1
2  2012    1     2009     3       3         1
2  2013    0     2009     15      4         0
2  2014    0     2009     12      5         0
2  2015    0     2009     13      6         0
2  2016    0     2009     13      7         0

2 answers:

Answer 0 (score: 2)

Here is one way.

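# positional indices of the rows where rap == 1
rap_indices = [i for i, j in enumerate(df.rap) if j == 1]
# each hit flags itself and the two rows before it; max(..., 0) avoids negative positions
rap_new_indices = sorted({k for n in rap_indices for k in range(max(n - 2, 0), n + 1)})

# create the column with bracket assignment (df.rap_new = 0 would only set an attribute)
# note: df.loc with these positions assumes the default RangeIndex
df['rap_new'] = 0
df.loc[rap_new_indices, 'rap_new'] = 1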

#     id  year  rap  cohort  jobs  year_of_life  rap_new
# 0    1  2009    0     NaN    10           NaN        0
# 1    1  2012    0  2012.0    12           0.0        0
# 2    1  2013    0  2012.0    12           1.0        1
# 3    1  2014    0  2012.0    13           2.0        1
# 4    1  2015    1  2012.0    15           3.0        1
# 5    1  2016    0  2012.0    17           4.0        0
# 6    1  2017    0  2012.0    18           5.0        0
# 7    2  2009    0  2009.0    15           0.0        0
# 8    2  2010    0  2009.0     2           1.0        1
# 9    2  2011    0  2009.0     3           2.0        1
# 10   2  2012    1  2009.0     3           3.0        1
# 11   2  2013    0  2009.0    15           4.0        0
# 12   2  2014    0  2009.0    12           5.0        0
# 13   2  2015    0  2009.0    13           6.0        0
# 14   2  2016    0  2009.0    13           7.0        0
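
Note that the positional approach above treats the whole frame as one sequence, so a window could spill across an id boundary on other data. Below is a minimal per-id sketch, assuming the frame is sorted by id and year. The helper name `mark_window` and its `width` parameter are made up for illustration, and only the columns needed for the rule are rebuilt here.

```python
import pandas as pd

# id, year and rap copied from the question's sample
df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    "year": [2009, 2012, 2013, 2014, 2015, 2016, 2017,
             2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016],
    "rap":  [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})

def mark_window(rap, width=3):
    """Flag every row where rap == 1 plus the (width - 1) rows before it.

    Hypothetical helper: `rap` is the rap column of a single id.
    """
    flags = [0] * len(rap)
    for pos, value in enumerate(rap.tolist()):
        if value == 1:
            # max(..., 0) stops the window at the first row of the group
            for k in range(max(pos - width + 1, 0), pos + 1):
                flags[k] = 1
    return pd.Series(flags, index=rap.index)

# transform runs the helper once per id and realigns the result to df's index
df["rap_new"] = df.groupby("id")["rap"].transform(mark_window)
print(df)
```

Since the helper only ever sees one id at a time, a rap of 1 in the first rows of an id cannot flag rows that belong to the previous id.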

Answer 1 (score: 2)

Option 1
Use pd.Series.shift

This gets a little tricky
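# shift(-i) pulls in the value from i rows ahead; fill_value=0 pads the tail and keeps the int dtype
# (the downcast argument of fillna is deprecated in recent pandas)
df.assign(
    rap_new=sum(df.rap.shift(-i, fill_value=0) for i in range(3)))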

    id  year  rap  cohort  jobs  year_of_life  rap_new
0    1  2009    0     NaN    10           NaN        0
1    1  2012    0  2012.0    12           0.0        0
2    1  2013    0  2012.0    12           1.0        1
3    1  2014    0  2012.0    13           2.0        1
4    1  2015    1  2012.0    15           3.0        1
5    1  2016    0  2012.0    17           4.0        0
6    1  2017    0  2012.0    18           5.0        0
7    2  2009    0  2009.0    15           0.0        0
8    2  2010    0  2009.0     2           1.0        1
9    2  2011    0  2009.0     3           2.0        1
10   2  2012    1  2009.0     3           3.0        1
11   2  2013    0  2009.0    15           4.0        0
12   2  2014    0  2009.0    12           5.0        0
13   2  2015    0  2009.0    13           6.0        0
14   2  2016    0  2009.0    13           7.0        0
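
Because the question asks for the column per id, the same shift idea can also be applied within each group. The sketch below is one possible variant, not the answerer's code. It assumes a pandas version in which the groupby `shift` accepts `fill_value`, and it uses a small made-up frame where id 2 starts with rap equal to 1, so that the difference from the ungrouped version is visible. The `clip` call is an extra assumption to keep the result at 0/1 if two windows overlap.

```python
import pandas as pd

# Made-up toy data: id 2 has rap == 1 in its very first row
df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 2, 2, 2, 2],
    "rap": [0, 0, 0, 0, 1, 0, 0, 0],
})

# Ungrouped: the window leaks backwards into the last two rows of id 1
ungrouped = sum(df["rap"].shift(-i, fill_value=0) for i in range(3))

# Grouped: each shift stays inside its own id
by_id = df.groupby("id")["rap"]
grouped = sum(by_id.shift(-i, fill_value=0) for i in range(3))

# clip keeps the flag at 0/1 when two windows overlap; astype pins the dtype
df["rap_new"] = grouped.clip(upper=1).astype(int)
print(df.assign(ungrouped=ungrouped))
```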

Option 2
Experimental
Don't use this! I'm just having fun.

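import numpy as np
from numpy.lib.stride_tricks import as_strided as strides

a = df.rap.values
s = a.strides[0]

# pad with two zeros, view the buffer as overlapping length-3 windows, then sum each window
df.assign(rap_new=strides(np.append(a, [0, 0]), (a.shape[0], 3), (s, s)).sum(1))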

    id  year  rap  cohort  jobs  year_of_life  rap_new
0    1  2009    0     NaN    10           NaN        0
1    1  2012    0  2012.0    12           0.0        0
2    1  2013    0  2012.0    12           1.0        1
3    1  2014    0  2012.0    13           2.0        1
4    1  2015    1  2012.0    15           3.0        1
5    1  2016    0  2012.0    17           4.0        0
6    1  2017    0  2012.0    18           5.0        0
7    2  2009    0  2009.0    15           0.0        0
8    2  2010    0  2009.0     2           1.0        1
9    2  2011    0  2009.0     3           2.0        1
10   2  2012    1  2009.0     3           3.0        1
11   2  2013    0  2009.0    15           4.0        0
12   2  2014    0  2009.0    12           5.0        0
13   2  2015    0  2009.0    13           6.0        0
14   2  2016    0  2009.0    13           7.0        0
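
For readers on NumPy 1.20 or later, `sliding_window_view` builds the same overlapping windows without hand-written strides. Here is a minimal sketch on the rap values of id 1 from the question. It is offered as an alternative, not as the answerer's code, and like the version above it does not respect id boundaries by itself.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# rap values of id 1 from the question
rap = np.array([0, 0, 0, 0, 1, 0, 0])

# Pad two zeros so that every original row gets a full 3-wide window
padded = np.append(rap, [0, 0])

# Row i of `windows` is padded[i:i+3], i.e. the current row and the next two
windows = sliding_window_view(padded, 3)

# A row is flagged if rap == 1 occurs in it or in one of the two rows after it
rap_new = windows.sum(axis=1)
print(rap_new)
```

Unlike `as_strided`, `sliding_window_view` returns a read-only view by default and checks the window shape against the array, so it cannot read past the end of the buffer.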