I have a dataframe that looks like this:
I_Code Date_1 Date_2 s_count
FT-35447 01/09/2019 02/08/2019 6
FT-40664 01/09/2019 02/08/2019 6
FT-54185 01/09/2019 03/08/2019 3
FT-40664 01/09/2019 03/08/2019 3
FT-56984 02/09/2019 03/08/2019 3
FT-29238 02/09/2019 03/08/2019 3
FT-45919 02/09/2019 03/08/2019 3
FT-35447 01/09/2019 04/08/2019 2
FT-56984 02/09/2019 04/08/2019 2
FT-89801 02/09/2019 04/08/2019 2
FT-29238 02/09/2019 04/08/2019 2
FT-70293 03/09/2019 04/08/2019 2
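I want to create a new dataframe with the same fields plus a new field holding a random number between 1 and 100, where the number of rows for each entry depends on s_count. For example, the first entry will have 6 rows, the second entry 6 rows, the third entry 3 rows, and so on.
Expected output for the first row of df: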
I_Code Date_1 Date_2 s_count num
FT-35447 01/09/2019 02/08/2019 6 10
FT-35447 01/09/2019 02/08/2019 6 13
FT-35447 01/09/2019 02/08/2019 6 56
FT-35447 01/09/2019 02/08/2019 6 45
FT-35447 01/09/2019 02/08/2019 6 34
FT-35447 01/09/2019 02/08/2019 6 90
Is there a way to achieve this?
Thanks
Answer 0 (score: 2)
Use Index.repeat with DataFrame.loc to repeat the rows, then set the new column values with numpy.random.randint:
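import numpy as np
# Repeat each row s_count times, then rebuild a clean 0..n-1 index
df = df.loc[df.index.repeat(df['s_count'])].reset_index(drop=True)
# randint excludes the upper bound, so pass 101 to allow 100 as a value
df['num'] = np.random.randint(1, 101, size=len(df))
print(df.head(20))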
I_Code Date_1 Date_2 s_count num
0 FT-35447 01/09/2019 02/08/2019 6 83
1 FT-35447 01/09/2019 02/08/2019 6 84
2 FT-35447 01/09/2019 02/08/2019 6 11
3 FT-35447 01/09/2019 02/08/2019 6 83
4 FT-35447 01/09/2019 02/08/2019 6 90
5 FT-35447 01/09/2019 02/08/2019 6 12
6 FT-40664 01/09/2019 02/08/2019 6 33
7 FT-40664 01/09/2019 02/08/2019 6 69
8 FT-40664 01/09/2019 02/08/2019 6 11
9 FT-40664 01/09/2019 02/08/2019 6 29
10 FT-40664 01/09/2019 02/08/2019 6 46
11 FT-40664 01/09/2019 02/08/2019 6 44
12 FT-54185 01/09/2019 03/08/2019 3 92
13 FT-54185 01/09/2019 03/08/2019 3 46
14 FT-54185 01/09/2019 03/08/2019 3 45
15 FT-40664 01/09/2019 03/08/2019 3 87
16 FT-40664 01/09/2019 03/08/2019 3 88
17 FT-40664 01/09/2019 03/08/2019 3 78
18 FT-56984 02/09/2019 03/08/2019 3 89
19 FT-56984 02/09/2019 03/08/2019 3 18
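For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same repeat-then-assign approach. It rebuilds only the first three rows of the question's data. The name `out`, the seed value 0, and the use of NumPy's newer `Generator` API instead of `np.random.randint` are my own choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small sample copied from the first three rows of the question's data
df = pd.DataFrame({
    "I_Code": ["FT-35447", "FT-40664", "FT-54185"],
    "Date_1": ["01/09/2019", "01/09/2019", "01/09/2019"],
    "Date_2": ["02/08/2019", "02/08/2019", "03/08/2019"],
    "s_count": [6, 6, 3],
})

# Seeded generator; the seed 0 is arbitrary and only makes runs repeatable
rng = np.random.default_rng(0)

# Repeat each row s_count times and rebuild a clean 0..n-1 index
out = df.loc[df.index.repeat(df["s_count"])].reset_index(drop=True)

# endpoint=True makes the upper bound inclusive, so values lie in 1..100
out["num"] = rng.integers(1, 100, size=len(out), endpoint=True)

print(out)
```

The result has 6 + 6 + 3 = 15 rows. Drawing all the random numbers in one vectorized call after the repeat avoids looping over rows.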