I have a dataframe like the following example:
key1  key2    value1
1     201501  NaN
1     201502  NaN
1     201503  201503
1     201504  NaN
2     201507  NaN
2     201508  NaN
2     201509  NaN
3     201509  NaN
3     201510  201509
3     201511  NaN
3     201512  NaN
3     201513  NaN
I want the following output:
key1  key2    value1  value2
1     201501  NaN     0
1     201502  NaN     0
1     201503  201503  1
1     201504  NaN     1
2     201507  NaN     0
2     201508  NaN     0
2     201509  NaN     0
3     201509  NaN     0
3     201510  201509  1
3     201511  NaN     1
3     201512  NaN     1
3     201601  NaN     1
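value2 is just a binary flag. It should become 1 on the first row where value1 holds a yyyymm stamp and then stay 1 for the rest of that key1 group. The rows before that stamp should be 0. If a key1 group contains only np.NaN in value1 (as with key1 = 2), every row in that group should be 0.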
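To pin down the rule, here is a minimal pure-Python sketch of the flag for a single key1 group. The helper name flag_group is hypothetical and not part of the question. It is a row-by-row loop, so it is only meant to state the semantics, not to be fast.

```python
import math

def flag_group(values):
    """Hypothetical helper: return 0/1 flags for one key1 group.

    Rows stay 0 until the first non-missing value1; that row and
    every row after it in the group get 1.
    """
    flags = []
    seen_stamp = False
    for v in values:
        # treat None and float NaN as "no stamp"
        if v is not None and not (isinstance(v, float) and math.isnan(v)):
            seen_stamp = True
        flags.append(1 if seen_stamp else 0)
    return flags

nan = float("nan")
print(flag_group([nan, nan, "201503", nan]))  # [0, 0, 1, 1]
print(flag_group([nan, nan, nan]))            # [0, 0, 0]
```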
I tried using apply with a lambda, but it is really slow. I hope someone can give me a hint on how to do this with a more vectorized approach to save some execution time.
The code for the df is below.
Many thanks for your time and input!
Best regards,
/ swepab
import numpy as np
import pandas as pd

# value2 holds the desired output, included for reference
df = pd.DataFrame({'key1' : [1,1,1,1,2,2,2,3,3,3,3,3]
                   ,'key2' : [201501,201502,201503,201504,201507,201508,201509,201509,201510,201511,201512,201601]
                   ,'value1' : [np.nan,np.nan,'201503',np.nan,np.nan,np.nan,np.nan,np.nan,'201509',np.nan,np.nan,np.nan]
                   ,'value2' : [0,0,1,1,0,0,0,0,1,1,1,1]})
Answer 0 (score: 0)
IIUC, you need ffill:
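import numpy as np

# forward-fill the stamp within each key1 group, so every row from the first stamp on is non-null
df['value2'] = df.groupby('key1')['value1'].ffill()
# turn "non-null" into 1 and "null" into 0
df['value2'] = np.where(df['value2'].notnull(), 1, 0)
print (df)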
key1 key2 value1 value2
0 1 201501 NaN 0
1 1 201502 NaN 0
2 1 201503 201503 1
3 1 201504 NaN 1
4 2 201507 NaN 0
5 2 201508 NaN 0
6 2 201509 NaN 0
7 3 201509 NaN 0
8 3 201510 201509 1
9 3 201511 NaN 1
10 3 201512 NaN 1
11 3 201601 NaN 1
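The same ffill idea can be written as one chained expression, because the boolean mask from notna() converts directly to 0/1 with astype(int), which makes np.where unnecessary. This is a minimal self-contained sketch that rebuilds the question's df without the value2 column. It is an equivalent rewrite of the answer above, not a different algorithm.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key1': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'key2': [201501, 201502, 201503, 201504, 201507, 201508, 201509,
             201509, 201510, 201511, 201512, 201601],
    'value1': [np.nan, np.nan, '201503', np.nan, np.nan, np.nan, np.nan,
               np.nan, '201509', np.nan, np.nan, np.nan],
})

# forward-fill within each key1 group, then map non-null -> 1 and null -> 0
df['value2'] = df.groupby('key1')['value1'].ffill().notna().astype(int)
print(df['value2'].tolist())  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```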
Answer 1 (score: 0)
You can do it like this:
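# count the non-null stamps seen so far within each key1 group
# (transform keeps the original row index, so the result aligns with df)
df['value2'] = df.groupby('key1')['value1'].transform(lambda x: x.notna().cumsum())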
In [50]: df
Out[50]:
key1 key2 value1 value2
0 1 201501 NaN 0
1 1 201502 NaN 0
2 1 201503 201503 1
3 1 201504 NaN 1
4 2 201507 NaN 0
5 2 201508 NaN 0
6 2 201509 NaN 0
7 3 201509 NaN 0
8 3 201510 201509 1
9 3 201511 NaN 1
10 3 201512 NaN 1
11 3 201601 NaN 1
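One caveat about the cumsum approach is that it counts stamps. If a key1 group ever had two non-null value1 entries, the flag would reach 2. A cumulative maximum over a 0/1 indicator stays capped at 1. It also avoids calling a Python lambda per group, because cumsum and cummax are built-in groupby methods. The small frame below is hypothetical data, chosen only to show the difference between the two.

```python
import numpy as np
import pandas as pd

# hypothetical data: key1 = 1 has two stamps, key1 = 2 has none
df = pd.DataFrame({
    'key1': [1, 1, 1, 1, 2, 2],
    'value1': [np.nan, '201503', '201504', np.nan, np.nan, np.nan],
})

has_stamp = df['value1'].notna().astype(int)                 # 1 where a stamp is present
df['cumsum_flag'] = has_stamp.groupby(df['key1']).cumsum()   # counts stamps, can exceed 1
df['cummax_flag'] = has_stamp.groupby(df['key1']).cummax()   # stays 0/1
print(df['cumsum_flag'].tolist())  # [0, 1, 2, 2, 0, 0]
print(df['cummax_flag'].tolist())  # [0, 1, 1, 1, 0, 0]
```

On the question's df, where each group has at most one stamp, both columns give the same result as the expected value2.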