Add a column after creating a new grouped dataframe

Posted: 2019-06-29 00:52:08

Tags: python pandas dataframe

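I have a large dataframe (printed below). It has Date, Time, High and Low columns, with one row every 5 minutes.

What I want to do is find the maximum value in the High column for each day, and then return the date and time at which that high occurred. The example below shows only one day. The first problem I had to work out was how to find each day's "High" row, because there are many rows with the same Date but different Time and High values. My approach was to create another dataframe (more on that below)...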

        Date   Time   Ticker     Open     High      Low    Close
0     6/3/19   7:05  USD/JPY  108.370  108.370  108.345  108.345
1     6/3/19   7:10  USD/JPY  108.345  108.345  108.325  108.325
2     6/3/19   7:15  USD/JPY  108.330  108.360  108.330  108.340
3     6/3/19   7:20  USD/JPY  108.335  108.335  108.295  108.305
4     6/3/19   7:25  USD/JPY  108.305  108.305  108.270  108.305
5     6/3/19   7:30  USD/JPY  108.300  108.300  108.250  108.260
6     6/3/19   7:35  USD/JPY  108.265  108.295  108.265  108.290
7     6/3/19   7:40  USD/JPY  108.275  108.290  108.250  108.290
8     6/3/19   7:45  USD/JPY  108.285  108.290  108.275  108.290
9     6/3/19   7:50  USD/JPY  108.295  108.350  108.295  108.350
10    6/3/19   7:55  USD/JPY  108.355  108.355  108.325  108.330
11    6/3/19   8:00  USD/JPY  108.335  108.360  108.325  108.350
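To reproduce the question without the original file.csv, one option is to paste the printed rows into `io.StringIO` and read them with `pd.read_csv`. This is a sketch that assumes the columns are whitespace-separated exactly as printed (the leading row-number column is dropped); with the real file, the asker's own `read_csv` call is what matters.

```python
import io

import pandas as pd

# The 5-minute USD/JPY bars printed above, pasted as whitespace-separated text.
raw = """\
Date Time Ticker Open High Low Close
6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
6/3/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
6/3/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
6/3/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6/3/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
6/3/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
6/3/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
6/3/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
6/3/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
6/3/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
"""

# sep=r"\s+" splits on any run of whitespace; Date and Time stay as strings.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)
```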

I tried using groupby to write this into a new dataframe. First, I grouped by Date with the max function. That gave me the maximum High and showed me the date...

       Date     High
0   6/10/19  108.670
1   6/11/19  108.800
2   6/12/19  108.545
3   6/13/19  108.535
4   6/14/19  108.500
5   6/17/19  108.690
6   6/18/19  108.675
7   6/19/19  108.495
8   6/20/19  107.760
9   6/21/19  107.735
10  6/24/19  107.530
11   6/3/19  108.445
12   6/4/19  108.355
13   6/5/19  108.340
14   6/6/19  108.330
15   6/7/19  108.500
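The dates in this output are not in calendar order: 6/10/19 through 6/24/19 come before 6/3/19. groupby sorts its keys, and because Date was read in as a string, the keys are compared character by character. A minimal sketch, using a few of the daily maxima above, of parsing the column with `pd.to_datetime` so that it sorts chronologically; the `%m/%d/%y` format is an assumption based on how the dates are printed.

```python
import pandas as pd

# A few daily maxima from the output above; Date is still a plain string.
daily = pd.DataFrame({
    "Date": ["6/3/19", "6/4/19", "6/10/19", "6/11/19"],
    "High": [108.445, 108.355, 108.670, 108.800],
})

# String dates sort character by character, so "6/10/19" < "6/3/19".
string_order = daily.sort_values("Date")["Date"].tolist()
print(string_order)

# Parsing gives a datetime64 column that sorts (and groups) chronologically.
daily["Date"] = pd.to_datetime(daily["Date"], format="%m/%d/%y")
daily = daily.sort_values("Date").reset_index(drop=True)
print(daily)
```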

But I also want to see the Time at which the maximum occurred on that date. How do I pass that through?

Example of the desired output:

Date       Time     High
6/10/19    9:05     108.670
6/11/19    11:35    108.800

import pandas as pd

df = pd.read_csv("~/Downloads/file.csv", encoding="ISO-8859-1")

# High grouped by date
df2 = df.groupby('Date', as_index=False)['High'].max()

I tried:

df2 = df.groupby('Date','Time'as_index = False)['High'].max()

but I get this error...

df2 = df.groupby('Date','Time' as_index= False)['High'].max()
                                      ^

SyntaxError: invalid syntax
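The SyntaxError comes from the missing comma before `as_index`. Multiple grouping keys also have to be passed as a list; pandas does not treat a second positional argument as another key. Even with the syntax fixed, grouping by both Date and Time does not answer the question, because each (Date, Time) pair is a single 5-minute bar. A short sketch using three rows from the question:

```python
import pandas as pd

# Three 5-minute bars from 6/3/19, copied from the question.
df = pd.DataFrame({
    "Date": ["6/3/19", "6/3/19", "6/3/19"],
    "Time": ["7:05", "7:10", "7:15"],
    "High": [108.370, 108.345, 108.360],
})

# Comma added and both keys passed as a list: this now parses and runs.
by_date_time = df.groupby(["Date", "Time"], as_index=False)["High"].max()

# Every (Date, Time) pair is unique in 5-minute data, so each group holds one
# row and "max" just returns the original High: nothing is reduced per day.
print(by_date_time)
```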

I just want a dataframe that shows the Date, Time and High of the row where each day's maximum in the High column occurs.

      Date     High   TIME????????????????????
0   6/10/19  108.670
1   6/11/19  108.800
2   6/12/19  108.545
3   6/13/19  108.535
4   6/14/19  108.500
5   6/17/19  108.690
6   6/18/19  108.675
7   6/19/19  108.495
8   6/20/19  107.760
9   6/21/19  107.735
10  6/24/19  107.530
11   6/3/19  108.445
12   6/4/19  108.355
13   6/5/19  108.340
14   6/6/19  108.330
15   6/7/19  108.500
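One way to build this table, as an alternative to the `idxmax` approach in the answer, is to broadcast each day's maximum back onto its rows with `transform('max')` and filter on it. A sketch using the Date, Time and High columns from the answer's reworked sample data; it gives the same rows as `idxmax` unless a day's high is reached more than once, in which case every matching bar is kept.

```python
import pandas as pd

# Date, Time and High columns from the answer's reworked sample.
df = pd.DataFrame({
    "Date": ["6/3/19"] * 3 + ["6/4/19"] * 3 + ["6/5/19"] * 3 + ["6/6/19"] * 3,
    "Time": ["7:05", "7:10", "7:15", "7:20", "7:25", "7:30",
             "7:35", "7:40", "7:45", "7:50", "7:55", "8:00"],
    "High": [108.370, 108.345, 108.360, 108.335, 108.305, 108.300,
             108.295, 108.290, 108.290, 108.350, 108.355, 108.360],
})

# Same length as df: each row carries the maximum High of its own day.
day_max = df.groupby("Date")["High"].transform("max")

# Keep rows that hit their day's maximum, and only the columns of interest.
result = df.loc[df["High"] == day_max, ["Date", "Time", "High"]]
print(result)
```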

1 Answer

Answer 0 (score: 0)

To illustrate how groupby works, I changed the Date column as follows:

      Date  Time   Ticker     Open     High      Low    Close
0   6/3/19  7:05  USD/JPY  108.370  108.370  108.345  108.345
1   6/3/19  7:10  USD/JPY  108.345  108.345  108.325  108.325
2   6/3/19  7:15  USD/JPY  108.330  108.360  108.330  108.340
3   6/4/19  7:20  USD/JPY  108.335  108.335  108.295  108.305
4   6/4/19  7:25  USD/JPY  108.305  108.305  108.270  108.305
5   6/4/19  7:30  USD/JPY  108.300  108.300  108.250  108.260
6   6/5/19  7:35  USD/JPY  108.265  108.295  108.265  108.290
7   6/5/19  7:40  USD/JPY  108.275  108.290  108.250  108.290
8   6/5/19  7:45  USD/JPY  108.285  108.290  108.275  108.290
9   6/6/19  7:50  USD/JPY  108.295  108.350  108.295  108.350
10  6/6/19  7:55  USD/JPY  108.355  108.355  108.325  108.330
11  6/6/19  8:00  USD/JPY  108.335  108.360  108.325  108.350

You can try:

df.loc[df.groupby('Date')['High'].idxmax()]

This gives you:

      Date  Time   Ticker     Open     High      Low    Close
0   6/3/19  7:05  USD/JPY  108.370  108.370  108.345  108.345
3   6/4/19  7:20  USD/JPY  108.335  108.335  108.295  108.305
6   6/5/19  7:35  USD/JPY  108.265  108.295  108.265  108.290
11  6/6/19  8:00  USD/JPY  108.335  108.360  108.325  108.350

Then drop any columns you don't need.
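Put together, the `idxmax` lookup and the column selection produce the Date / Time / High table the question asks for. A self-contained sketch using the answer's sample data, cut down to the three columns of interest: `idxmax` returns the index label of the first maximum in each group, and `loc` uses those labels to pull back the matching rows.

```python
import pandas as pd

# Date, Time and High columns from the answer's reworked sample.
df = pd.DataFrame({
    "Date": ["6/3/19"] * 3 + ["6/4/19"] * 3 + ["6/5/19"] * 3 + ["6/6/19"] * 3,
    "Time": ["7:05", "7:10", "7:15", "7:20", "7:25", "7:30",
             "7:35", "7:40", "7:45", "7:50", "7:55", "8:00"],
    "High": [108.370, 108.345, 108.360, 108.335, 108.305, 108.300,
             108.295, 108.290, 108.290, 108.350, 108.355, 108.360],
})

# One index label per day: the row holding that day's (first) maximum High.
idx = df.groupby("Date")["High"].idxmax()

# Select those rows and only the columns of interest, then renumber 0..n-1.
result = df.loc[idx, ["Date", "Time", "High"]].reset_index(drop=True)
print(result)
```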