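I have a DataFrame with 4 columns: day, time, tmin and tmax. tmin holds the day's minimum temperature (temperature_min) and tmax holds the day's maximum temperature (temperature_max). I want to fill all NaN values of a day with that day's tmin and tmax. For example, I want to turn this DataFrame: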
day time tmin tmax
0 01 00:00:00 NaN NaN
1 01 03:00:00 -6.8 NaN
2 01 06:00:00 NaN NaN
3 01 09:00:00 NaN NaN
4 01 12:00:00 NaN NaN
5 01 15:00:00 NaN 1.2
6 01 18:00:00 NaN NaN
7 01 21:00:00 NaN NaN
8 02 00:00:00 NaN NaN
9 02 03:00:00 -7.2 NaN
10 02 06:00:00 NaN NaN
11 02 09:00:00 NaN NaN
12 02 12:00:00 NaN NaN
13 02 15:00:00 NaN 1.8
14 02 18:00:00 NaN NaN
15 02 21:00:00 NaN NaN
into this DataFrame:
day time tmin tmax
0 01 00:00:00 -6.8 1.2
1 01 03:00:00 -6.8 1.2
2 01 06:00:00 -6.8 1.2
3 01 09:00:00 -6.8 1.2
4 01 12:00:00 -6.8 1.2
5 01 15:00:00 -6.8 1.2
6 01 18:00:00 -6.8 1.2
7 01 21:00:00 -6.8 1.2
8 02 00:00:00 -7.2 1.8
9 02 03:00:00 -7.2 1.8
10 02 06:00:00 -7.2 1.8
11 02 09:00:00 -7.2 1.8
12 02 12:00:00 -7.2 1.8
13 02 15:00:00 -7.2 1.8
14 02 18:00:00 -7.2 1.8
15 02 21:00:00 -7.2 1.8
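To reproduce the example, the input frame can be built directly. This is a minimal sketch that assumes day is stored as an integer (the question prints 01 and 02, which could also be strings) and time as a string. np.nan marks the missing readings.

```python
import numpy as np
import pandas as pd

# Sample input from the question: 8 readings per day (every 3 hours),
# with a single tmin and a single tmax value recorded per day.
times = [f"{h:02d}:00:00" for h in range(0, 24, 3)]
df = pd.DataFrame({
    "day": [1] * 8 + [2] * 8,  # assumption: integer day numbers
    "time": times * 2,
    "tmin": [np.nan, -6.8] + [np.nan] * 6 + [np.nan, -7.2] + [np.nan] * 6,
    "tmax": [np.nan] * 5 + [1.2, np.nan, np.nan] + [np.nan] * 5 + [1.8, np.nan, np.nan],
})
print(df)
```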
Answer 0 (score: 3)
Use groupby and transform:
df.assign(**df.groupby('day')[['tmin', 'tmax']].transform('first'))
day time tmin tmax
0 1 00:00:00 -6.8 1.2
1 1 03:00:00 -6.8 1.2
2 1 06:00:00 -6.8 1.2
3 1 09:00:00 -6.8 1.2
4 1 12:00:00 -6.8 1.2
5 1 15:00:00 -6.8 1.2
6 1 18:00:00 -6.8 1.2
7 1 21:00:00 -6.8 1.2
8 2 00:00:00 -7.2 1.8
9 2 03:00:00 -7.2 1.8
10 2 06:00:00 -7.2 1.8
11 2 09:00:00 -7.2 1.8
12 2 12:00:00 -7.2 1.8
13 2 15:00:00 -7.2 1.8
14 2 18:00:00 -7.2 1.8
15 2 21:00:00 -7.2 1.8
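Why this works, as a small self-contained sketch on the same data (day assumed to be an integer column, time omitted): GroupBy.first returns the first non-null value in each group, so with only one reading per day it picks exactly that reading, and transform broadcasts it back to every row of the group. Because a DataFrame supports keys() and item access, ** unpacks its columns as keyword arguments to assign.

```python
import numpy as np
import pandas as pd

# Compact version of the question's data: 8 readings per day
df = pd.DataFrame({
    "day": [1] * 8 + [2] * 8,
    "tmin": [np.nan, -6.8] + [np.nan] * 6 + [np.nan, -7.2] + [np.nan] * 6,
    "tmax": [np.nan] * 5 + [1.2, np.nan, np.nan] + [np.nan] * 5 + [1.8, np.nan, np.nan],
})

# GroupBy.first skips NaN, so each day's single reading is picked;
# transform returns a frame aligned with df (one row per original row).
filled = df.groupby("day")[["tmin", "tmax"]].transform("first")

# **filled passes tmin=<Series>, tmax=<Series> to assign,
# which returns a new frame and leaves df unchanged.
result = df.assign(**filled)
print(result)
```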
Alternatively, if you want to modify the original DataFrame instead of returning a copy:
df[['tmin', 'tmax']] = df.groupby('day')[['tmin', 'tmax']].transform('first')
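A variation that is not part of the answer above: if a day could ever contain more than one non-NaN reading, 'first' keeps whichever value appears first. Taking the group minimum for tmin and the group maximum for tmax keeps the extreme values instead. The data below is hypothetical, with two tmin readings on day 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: day 1 has two tmin readings (-5.0 and -6.8)
df = pd.DataFrame({
    "day": [1, 1, 1, 2, 2, 2],
    "tmin": [-5.0, np.nan, -6.8, np.nan, -7.2, np.nan],
    "tmax": [np.nan, 1.2, np.nan, 1.8, np.nan, np.nan],
})

# min()/max() also skip NaN, but pick the extreme value rather than the first one
tmin = df.groupby("day")["tmin"].transform("min")
tmax = df.groupby("day")["tmax"].transform("max")
df["tmin"], df["tmax"] = tmin, tmax  # modify the original frame in place
print(df)
```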
Answer 1 (score: 1)
Just use groupby together with forward fill and backward fill:
# ffill and bfill both run inside each day's group, so values never cross day boundaries
df['tmin'] = df.groupby('day')['tmin'].transform(lambda s: s.ffill().bfill())
df['tmax'] = df.groupby('day')['tmax'].transform(lambda s: s.ffill().bfill())
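One detail worth checking with this approach: when bfill is called on the Series returned by the groupby rather than inside the group, it is no longer restricted to one day, so a day with no reading at all would borrow the next day's value. The sketch below, on hypothetical data where day 2 has no tmax reading, runs both fills inside each group via transform, and day 2 stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: day 2 has no tmax reading at all
df = pd.DataFrame({
    "day": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "tmax": [np.nan, 1.2, np.nan, np.nan, np.nan, np.nan, np.nan, 2.5, np.nan],
})

# The lambda receives one day's Series at a time, so both fills stay within that day
df["tmax"] = df.groupby("day")["tmax"].transform(lambda s: s.ffill().bfill())
print(df)
```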
Answer 2 (score: 1)
If you don't need it to be as neat as @user3483203's solution:
import pandas as pd

mydata = pd.read_csv('temperature.txt', sep=' ')
tmin_col = mydata.columns.get_loc('tmin')
tmax_col = mydata.columns.get_loc('tmax')
for i in mydata['day'].unique():
    row_start = (i - 1) * 8  # assuming 8 data points per day, days numbered from 1, rows sorted by day
    row_end = i * 8
    # min()/max() skip NaN by default; iloc assignment avoids chained indexing
    mydata.iloc[row_start:row_end, tmin_col] = mydata['tmin'].iloc[row_start:row_end].min()
    mydata.iloc[row_start:row_end, tmax_col] = mydata['tmax'].iloc[row_start:row_end].max()
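The loop above relies on exactly 8 rows per day, days numbered from 1, and rows sorted by day. A sketch that drops those assumptions (hypothetical data with 3 readings on day 1 and 2 on day 2) selects each day's rows with a boolean mask instead of computing row offsets:

```python
import numpy as np
import pandas as pd

# Hypothetical data where the days have different numbers of readings
mydata = pd.DataFrame({
    "day": [1, 1, 1, 2, 2],
    "tmin": [np.nan, -6.8, np.nan, -7.2, np.nan],
    "tmax": [1.2, np.nan, np.nan, np.nan, 1.8],
})

for day in mydata["day"].unique():
    in_day = mydata["day"] == day  # boolean mask selecting this day's rows
    # Series.min()/max() skip NaN by default
    mydata.loc[in_day, "tmin"] = mydata.loc[in_day, "tmin"].min()
    mydata.loc[in_day, "tmax"] = mydata.loc[in_day, "tmax"].max()

print(mydata)
```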
Answer 3 (score: 0)
Since you didn't post any code, here is a general approach:
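Step 1: Create variables (for example, one per day) that will keep track of each day's min and max temps
Step 2: Loop through each row in the frame and record the non-NaN tmin and tmax for that row's day
Step 3: Loop through the rows again and, for each row, check whether tmin or tmax is NaN (use pd.isna(), since NaN == NaN is always False)
Step 4: If it is, replace it with that day's min or max value recorded earlier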
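A minimal Python sketch of these steps, on hypothetical data. Because a day's tmin can appear after earlier NaN rows of the same day (00:00 comes before the 03:00 reading), a single pass cannot fill those rows, so the sketch first collects each day's values into dictionaries (day_min and day_max are illustrative names) and then fills the missing cells in a second pass.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "day": [1, 1, 1, 2, 2, 2],
    "tmin": [np.nan, -6.8, np.nan, np.nan, -7.2, np.nan],
    "tmax": [np.nan, np.nan, 1.2, 1.8, np.nan, np.nan],
})

# Pass 1: remember the recorded tmin/tmax for each day
day_min = {}
day_max = {}
for row in df.itertuples():
    if not pd.isna(row.tmin):
        day_min[row.day] = row.tmin
    if not pd.isna(row.tmax):
        day_max[row.day] = row.tmax

# Pass 2: fill missing cells; test with pd.isna because NaN == NaN is False
for row in df.itertuples():
    if pd.isna(row.tmin):
        df.at[row.Index, "tmin"] = day_min.get(row.day, np.nan)
    if pd.isna(row.tmax):
        df.at[row.Index, "tmax"] = day_max.get(row.day, np.nan)

print(df)
```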