This is my DataFrame:
id Year Month Day Instant Temperature DayType DayValidity LoadNette
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
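I would like to duplicate my DataFrame 5 times, but with weights for some rows depending on the month. For example, rows with month 4 would be duplicated only 3 times, and the 4th only 2 times. I would like to do this in Python, with a result like this: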
id Year Month Day Instant Temperature DayType DayValidity LoadNette
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
Is there any way to do this?
Answer 0 (score: 3)
You can use a dict that maps each month to its number of repeats, then apply numpy.repeat to every column in a dict comprehension:
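import numpy as np
import pandas as pd
# number of repeats for each Month value
d = {1:5, 2:2, 3:1, 4:3, 5:3}
# per-row repeat counts, aligned with df's index
l = df['Month'].map(d)
df = pd.DataFrame({col: np.repeat(df[col], l) for col in df.columns}, columns=df.columns)
print (df)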
id Year Month Day Instant Temperature DayType DayValidity \
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
1 193 2008 2 5 1 8.07 6 1
1 193 2008 2 5 1 8.07 6 1
2 194 2008 3 5 2 8.10 6 1
3 195 2008 4 5 3 8.07 6 1
3 195 2008 4 5 3 8.07 6 1
3 195 2008 4 5 3 8.07 6 1
4 196 2008 5 5 4 8.03 6 1
4 196 2008 5 5 4 8.03 6 1
4 196 2008 5 5 4 8.03 6 1
LoadNette
0 53039.77133
0 53039.77133
0 53039.77133
0 53039.77133
0 53039.77133
1 52200.71569
1 52200.71569
2 51681.17260
3 51907.94746
3 51907.94746
3 51907.94746
4 50848.16566
4 50848.16566
4 50848.16566
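The same per-row repetition can probably also be written with Index.repeat and .loc, which avoids rebuilding each column separately. This is a minimal sketch rather than part of the answer above: it keeps only the id and Month columns of the sample data, and the names counts and out are illustrative.

```python
import pandas as pd

# Reduced version of the sample data (only the columns needed here).
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
})

# Number of repeats for each Month value (same dict as in the answer).
d = {1: 5, 2: 2, 3: 1, 4: 3, 5: 3}

# Per-row repeat counts, aligned with df's index.
counts = df["Month"].map(d)

# Repeat each index label counts[i] times, select those rows,
# then renumber the index so it is unique again.
out = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(out)
```

Dropping reset_index(drop=True) should keep the repeated index labels (0, 0, 0, ...), as in the output printed above.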
Another solution, if you need to repeat all rows 5 times, is to use concat:
df = pd.concat([df] * 5, ignore_index=True)
print (df)
id Year Month Day Instant Temperature DayType DayValidity \
0 192 2008 1 5 0 8.03 6 1
1 193 2008 2 5 1 8.07 6 1
2 194 2008 3 5 2 8.10 6 1
3 195 2008 4 5 3 8.07 6 1
4 196 2008 5 5 4 8.03 6 1
5 192 2008 1 5 0 8.03 6 1
6 193 2008 2 5 1 8.07 6 1
7 194 2008 3 5 2 8.10 6 1
8 195 2008 4 5 3 8.07 6 1
9 196 2008 5 5 4 8.03 6 1
10 192 2008 1 5 0 8.03 6 1
11 193 2008 2 5 1 8.07 6 1
12 194 2008 3 5 2 8.10 6 1
13 195 2008 4 5 3 8.07 6 1
14 196 2008 5 5 4 8.03 6 1
15 192 2008 1 5 0 8.03 6 1
16 193 2008 2 5 1 8.07 6 1
17 194 2008 3 5 2 8.10 6 1
18 195 2008 4 5 3 8.07 6 1
19 196 2008 5 5 4 8.03 6 1
20 192 2008 1 5 0 8.03 6 1
21 193 2008 2 5 1 8.07 6 1
22 194 2008 3 5 2 8.10 6 1
23 195 2008 4 5 3 8.07 6 1
24 196 2008 5 5 4 8.03 6 1
LoadNette
0 53039.77133
1 52200.71569
2 51681.17260
3 51907.94746
4 50848.16566
5 53039.77133
6 52200.71569
7 51681.17260
8 51907.94746
9 50848.16566
10 53039.77133
11 52200.71569
12 51681.17260
13 51907.94746
14 50848.16566
15 53039.77133
16 52200.71569
17 51681.17260
18 51907.94746
19 50848.16566
20 53039.77133
21 52200.71569
22 51681.17260
23 51907.94746
24 50848.16566
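If the rows should appear pass by pass, as in the expected output of the question, one option is to concatenate a filtered copy of the frame for each pass. The sketch below is an assumption-laden illustration: the counts in d are inferred from the question's expected output (months 1 to 3 five times, month 4 four times, month 5 three times), and only the id and Month columns are kept.

```python
import pandas as pd

# Reduced version of the sample data (only the columns needed here).
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
})

# Assumed repeat counts, inferred from the question's expected output.
d = {1: 5, 2: 5, 3: 5, 4: 4, 5: 3}
counts = df["Month"].map(d)

# Pass i keeps only the rows that still have repeats left (count > i).
passes = [df[counts > i] for i in range(int(counts.max()))]

# Stack the passes in order and renumber the index.
out = pd.concat(passes, ignore_index=True)
print(out)
```

With these counts the result has 22 rows: three full passes, then a pass without month 5, then a pass without months 4 and 5.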
Answer 1 (score: 1)
Use the pandas DataFrame.sample function with weights. Syntax:
# vec = <vector of row weights>
df.sample(weights=vec)
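A slightly fuller sketch of the sampling idea follows. Note that DataFrame.sample draws rows at random, so the weights only set the relative probability of each row; the resulting counts are not guaranteed, unlike repeating rows a fixed number of times. The weights in d, the sample size n=22, and random_state=0 are hypothetical values chosen for illustration.

```python
import pandas as pd

# Reduced version of the sample data (only the columns needed here).
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
})

# Hypothetical relative weights per Month; pandas normalizes them
# so that they sum to 1.
d = {1: 5, 2: 5, 3: 5, 4: 4, 5: 3}
vec = df["Month"].map(d)

# Draw 22 rows with replacement; rows with larger weights are more
# likely to be picked. random_state makes the draw reproducible.
out = df.sample(n=22, replace=True, weights=vec, random_state=0)
print(out["Month"].value_counts())
```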