如何根据条件从另一行的字符串创建新的df列?

时间:2018-08-03 04:14:29

标签: python pandas dataframe

我有带有名称和类型的dataFrame。我需要使用“ Type”-列的班次创建“ newType”-列。 我的dataFrame:

ind   name    Type
____________________
1     sasha   a      
2     sasha   e
3     sasha   d
4     sasha   t
5     sasha   t
6     sasha   w
7     nik     e
8     nik     e
9     nik     q
10    nik     t
11    nik     h
12    nik     j
13    bob     k
14    bob     y
15    bob     r
16    bob     w
17    bob     t
18    bob     w

我需要使用window = n为“类型”-列创建条件为“名称”-列的新列。如果我窗口中的行名称不同,我们将返回NaN。

窗口大小= 3,窗口看起来像这样

[Type [i-1],Type [i],Type [i + 1]]

大小= 4

[Type [i-2],Type [i-1],Type [i],Type [i + 1]]

大小= 5

[Type [i-3],Type [i-2],Type [i-1],Type [i],Type [i + 1]]

...等

窗口的插图= 4: 图片: Algorithm visualization

我需要的结果

ind   name    Type   newType
____________________________
1     sasha   a       NaN 
2     sasha   e       NaN
3     sasha   d       aedt
4     sasha   t       edtt
5     sasha   t       dttw
6     sasha   w       NaN
7     nik     e       NaN
8     nik     e       NaN
9     nik     q       eeqt
10    nik     t       eqth
11    nik     h       qthj
12    nik     j       NaN
13    bob     k       NaN
14    bob     y       NaN
15    bob     r       kyrw
16    bob     w       yrwt
17    bob     t       rwtw
18    bob     w       NaN

该怎么做?

1 个答案:

答案 0 :(得分:0)

这可能不是以最有效的方式完成的,但是同样,这有点复杂。我试了一段时间使用rolling或扩展函数,但无济于事(因为它们似乎仅适用于数字参数)。这是我获得所需输出的方法:

from io import StringIO
import pandas as pd
import numpy as np
# Make DataFrame
df = pd.read_table(StringIO("""ind   name    Type
1     sasha   a      
2     sasha   e
3     sasha   d
4     sasha   t
5     sasha   t
6     sasha   w
7     nik     e
8     nik     e
9     nik     q
10    nik     t
11    nik     h
12    nik     j
13    bob     k
14    bob     y
15    bob     r
16    bob     w
17    bob     t
18    bob     w"""), sep='\s+')
def joiner(r):
    return "-".join(r.values)
df.set_index('name', inplace=True)
# Make new column which join letters aggregated by name
df['full_join'] = df.groupby('name')['Type'].apply(joiner)
df['full_join'].ffill(inplace=True)
df.reset_index(inplace=True)
a = df.full_join.str.split("-",expand=True)
b = []
w = 4 # window
# This part is probably not as efficient as it could be
for i in range(len(a)):
    j = i % len(a.iloc[i].str.split('-'))
    b.append("".join(a.iloc[i,j-(w-1):j+1].tolist()))
df['result'] = b
df['result'] = df['result'].shift(-1)
df.loc[df['result'] == "", 'result'] = np.nan
df.drop(columns=['full_join'], inplace=True)

结果:

Out[135]: 
     name  ind Type result
0   sasha    1    a    NaN
1   sasha    2    e    NaN
2   sasha    3    d   aedt
3   sasha    4    t   edtt
4   sasha    5    t   dttw
5   sasha    6    w    NaN
6     nik    7    e    NaN
7     nik    8    e    NaN
8     nik    9    q   eeqt
9     nik   10    t   eqth
10    nik   11    h   qthj
11    nik   12    j    NaN
12    bob   13    k    NaN
13    bob   14    y    NaN
14    bob   15    r   kyrw
15    bob   16    w   yrwt
16    bob   17    t   rwtw
17    bob   18    w    NaN

令人惊讶的是,它最终也适用于其他窗口(我已经测试了2、3和5)。不幸的是,在向nik添加一行时失败了:(