我有带有名称和类型的dataFrame。我需要使用“ Type”-列的班次创建“ newType”-列。 我的dataFrame:
ind name Type
____________________
1 sasha a
2 sasha e
3 sasha d
4 sasha t
5 sasha t
6 sasha w
7 nik e
8 nik e
9 nik q
10 nik t
11 nik h
12 nik j
13 bob k
14 bob y
15 bob r
16 bob w
17 bob t
18 bob w
我需要使用window = n为“类型”-列创建条件为“名称”-列的新列。如果我窗口中的行名称不同,我们将返回NaN。
窗口大小= 3,窗口看起来像这样
[Type [i-1],Type [i],Type [i + 1]]
大小= 4
[Type [i-2],Type [i-1],Type [i],Type [i + 1]]
大小= 5
[Type [i-3],Type [i-2],Type [i-1],Type [i],Type [i + 1]]
...等
窗口的插图= 4: 图片: Algorithm visualization
我需要的结果
ind name Type newType
____________________________
1 sasha a NaN
2 sasha e NaN
3 sasha d aedt
4 sasha t edtt
5 sasha t dttw
6 sasha w NaN
7 nik e NaN
8 nik e NaN
9 nik q eeqt
10 nik t eqth
11 nik h qthj
12 nik j NaN
13 bob k NaN
14 bob y NaN
15 bob r kyrw
16 bob w yrwt
17 bob t rwtw
18 bob w NaN
该怎么做?
答案 0 :(得分:0)
这可能不是以最有效的方式完成的,但是同样,这有点复杂。我试了一段时间使用rolling或扩展函数,但无济于事(因为它们似乎仅适用于数字参数)。这是我获得所需输出的方法:
from io import StringIO
import pandas as pd
import numpy as np
# Make DataFrame
df = pd.read_table(StringIO("""ind name Type
1 sasha a
2 sasha e
3 sasha d
4 sasha t
5 sasha t
6 sasha w
7 nik e
8 nik e
9 nik q
10 nik t
11 nik h
12 nik j
13 bob k
14 bob y
15 bob r
16 bob w
17 bob t
18 bob w"""), sep='\s+')
def joiner(r):
return "-".join(r.values)
df.set_index('name', inplace=True)
# Make new column which join letters aggregated by name
df['full_join'] = df.groupby('name')['Type'].apply(joiner)
df['full_join'].ffill(inplace=True)
df.reset_index(inplace=True)
a = df.full_join.str.split("-",expand=True)
b = []
w = 4 # window
# This part is probably not as efficient as it could be
for i in range(len(a)):
j = i % len(a.iloc[i].str.split('-'))
b.append("".join(a.iloc[i,j-(w-1):j+1].tolist()))
df['result'] = b
df['result'] = df['result'].shift(-1)
df.loc[df['result'] == "", 'result'] = np.nan
df.drop(columns=['full_join'], inplace=True)
结果:
Out[135]:
name ind Type result
0 sasha 1 a NaN
1 sasha 2 e NaN
2 sasha 3 d aedt
3 sasha 4 t edtt
4 sasha 5 t dttw
5 sasha 6 w NaN
6 nik 7 e NaN
7 nik 8 e NaN
8 nik 9 q eeqt
9 nik 10 t eqth
10 nik 11 h qthj
11 nik 12 j NaN
12 bob 13 k NaN
13 bob 14 y NaN
14 bob 15 r kyrw
15 bob 16 w yrwt
16 bob 17 t rwtw
17 bob 18 w NaN
令人惊讶的是,它最终也适用于其他窗口(我已经测试了2、3和5)。不幸的是,在向nik添加一行时失败了:(