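I have a large DataFrame (printed below) with Date, Time, High, and Low columns. A new row is added every 5 minutes.
What I want to do is find the maximum value in the High column for each day and return the date and time at which that high occurred. The example below shows only one day. The first problem I have to solve is finding the row with the daily high, because there are many rows with the same Date but different Time and High values. My attempt at solving it was to create another DataFrame (more on that below)...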
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
1 6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
2 6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
3 6/3/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
4 6/3/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
5 6/3/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6 6/3/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
7 6/3/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
8 6/3/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
9 6/3/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
10 6/3/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
11 6/3/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
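I tried using the groupby function and writing the result to a new DataFrame. First, I grouped by Date and applied max. That gave me the maximum High and showed me the date...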
Date High
0 6/10/19 108.670
1 6/11/19 108.800
2 6/12/19 108.545
3 6/13/19 108.535
4 6/14/19 108.500
5 6/17/19 108.690
6 6/18/19 108.675
7 6/19/19 108.495
8 6/20/19 107.760
9 6/21/19 107.735
10 6/24/19 107.530
11 6/3/19 108.445
12 6/4/19 108.355
13 6/5/19 108.340
14 6/6/19 108.330
15 6/7/19 108.500
But I also want to see the Time at which that day's maximum occurred. How do I carry it through?
Example of the desired output:
Date Time High
6/10/19 9:05 108.670
6/11/19 11:35 108.800
import pandas as pd
df = pd.read_csv("~/Downloads/file.csv", encoding="ISO-8859-1")
df2 = df.groupby('Date', as_index=False)['High'].max()
df2 = df.groupby('Date','Time' as_index = False)['High'].max()
But the last line gives this error...
df2 = df.groupby('Date','Time' as_index= False)['High'].max()
^
SyntaxError: invalid syntax
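The call in the traceback has no comma between 'Time' and as_index, which is enough to produce a SyntaxError. Below is a minimal sketch of the same call with the comma added, assuming the two keys are meant to be passed together as a list. The small DataFrame is made up for illustration and is not the real file.

```python
import pandas as pd

# Hypothetical miniature of the 5-minute data, for illustration only.
df = pd.DataFrame({
    "Date": ["6/3/19", "6/3/19", "6/4/19", "6/4/19"],
    "Time": ["7:05", "7:10", "7:05", "7:10"],
    "High": [108.370, 108.345, 108.335, 108.360],
})

# Several grouping keys go in one list, and a comma separates that
# list from the as_index keyword argument.
per_bar = df.groupby(["Date", "Time"], as_index=False)["High"].max()

# Each (Date, Time) pair occurs once here, so the result has as many
# rows as the input and is not reduced to one row per day.
print(len(per_bar), len(df))
```

Even with the syntax fixed, grouping on both Date and Time keeps one group per 5-minute bar, so on its own this is unlikely to give the daily high.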
I just want a DataFrame that shows the Date, Time, and High for the row where the High column reaches its maximum on each day.
Date High TIME????????????????????
0 6/10/19 108.670
1 6/11/19 108.800
2 6/12/19 108.545
3 6/13/19 108.535
4 6/14/19 108.500
5 6/17/19 108.690
6 6/18/19 108.675
7 6/19/19 108.495
8 6/20/19 107.760
9 6/21/19 107.735
10 6/24/19 107.530
11 6/3/19 108.445
12 6/4/19 108.355
13 6/5/19 108.340
14 6/6/19 108.330
15 6/7/19 108.500
Answer 0 (score: 0)
To illustrate the groupby functionality, I changed the Date column as follows:
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
1 6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
2 6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
3 6/4/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
4 6/4/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
5 6/4/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6 6/5/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
7 6/5/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
8 6/5/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
9 6/6/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
10 6/6/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
11 6/6/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
You can try:
df.loc[df.groupby('Date')['High'].idxmax()]
This will give you:
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
3 6/4/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
6 6/5/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
11 6/6/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
Then drop any columns you don't need.
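Putting the pieces together, here is one possible sketch that keeps only Date, Time, and High and then orders the days chronologically. The sample rows are hypothetical (a few values borrowed from the tables above plus one made-up bar), and the month/day/two-digit-year date format is an assumption based on how the dates look.

```python
import pandas as pd

# Hypothetical sample: a few bars for two days, listed out of date order.
df = pd.DataFrame({
    "Date": ["6/10/19", "6/10/19", "6/3/19", "6/3/19", "6/3/19"],
    "Time": ["9:00", "9:05", "7:05", "7:10", "7:15"],
    "High": [108.600, 108.670, 108.370, 108.345, 108.360],
})

# For each Date, idxmax returns the index label of the first row
# that holds the maximum High.
idx = df.groupby("Date")["High"].idxmax()

# Select those rows and keep only the columns of interest.
daily_high = df.loc[idx, ["Date", "Time", "High"]]

# Date is text, so "6/10/19" sorts before "6/3/19". Sorting on parsed
# dates (assumed format: month/day/two-digit year) restores calendar order.
order = pd.to_datetime(daily_high["Date"], format="%m/%d/%y").sort_values().index
daily_high = daily_high.loc[order].reset_index(drop=True)

print(daily_high)
```

If several bars on the same day share the maximum High, idxmax reports only the first of them, which may or may not be what you want.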