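I am working with a large Excel file that has four columns, but I only need two of them: DATE and HPCP. The goal of this program is to convert the dates to date objects, remove duplicate dates, and sum the HPCP values of the duplicates. I think this code should work, but the output is very wrong. The code successfully converts the dates to date objects and removes the duplicates, but it does not sum correctly. Any help would be appreciated.
Link to the Excel file: https://drive.google.com/open?id=1P5-k9Zyz8iFwx6Y-9yhnRozGGSvqpXLz
Some sample rows from the Excel file: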
STATION STATION_NAME DATE HPCP
COOP:305801 NY CITY CENTRAL PARK NY US 20000101 01:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 15:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 16:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 17:00 0.03
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 18:00 0.04
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 19:00 0.12
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 20:00 0.17
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 21:00 0.13
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 22:00 0.04
COOP:305801 NY CITY CENTRAL PARK NY US 20000104 23:00 0.09
COOP:305801 NY CITY CENTRAL PARK NY US 20000105 00:00 0.07
COOP:305801 NY CITY CENTRAL PARK NY US 20000105 01:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000109 21:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000109 22:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000110 00:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000110 13:00 0.15
COOP:305801 NY CITY CENTRAL PARK NY US 20000110 14:00 0.29
COOP:305801 NY CITY CENTRAL PARK NY US 20000110 15:00 0.24
COOP:305801 NY CITY CENTRAL PARK NY US 20000110 16:00 0.15
COOP:305801 NY CITY CENTRAL PARK NY US 20000110 17:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000113 08:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000113 09:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000113 10:00 0.02
COOP:305801 NY CITY CENTRAL PARK NY US 20000113 15:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000113 16:00 0.01
COOP:305801 NY CITY CENTRAL PARK NY US 20000113 17:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000120 07:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000120 08:00 0
COOP:305801 NY CITY CENTRAL PARK NY US 20000120 09:00 0
Code:
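import sys
import pandas as pd
import datetime

# Read the CSV named on the command line and keep only the two columns needed.
data = pd.read_csv(sys.argv[1])
data = data[['DATE','HPCP']]
# Parse the DATE strings into timestamps.
data['DATE'] = pd.to_datetime(data['DATE'])
# Reduce each timestamp to its calendar date so rows from the same day share a key.
for index, row in data.iterrows():
    print(index)
    data.loc[index,'DATE'] = data.loc[index,'DATE'].date()
# Sum HPCP over all rows that share a date.
data = data.groupby(['DATE'],as_index=False).sum()
print(data)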
Output:
DATE HPCP
0 2000-01-01 11999.88
1 2000-01-03 0.00
2 2000-01-04 1002.97
3 2000-01-05 1.25
4 2000-01-09 1000.01
5 2000-01-10 4.72
6 2000-01-11 0.00
7 2000-01-13 0.17
8 2000-01-16 0.00
9 2000-01-20 1000.11
10 2000-01-21 0.12
... ...
2871 2013-12-17 0.66
2872 2013-12-21 0.01
2873 2013-12-22 0.04
2874 2013-12-23 2.06
2875 2013-12-24 0.00
2876 2013-12-26 0.00
2877 2013-12-29 4.90
2878 2013-12-30 0.00
2879 2013-12-31 0.00
2880 2014-01-01 3999.96
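Answer 0 (score: 0)
No, those large values are correct. I imported your data file into Excel and created a pivot table using the date as the row label and the sum of HPCP as the value. Here are the first few results: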
Row Labels Sum of HPCP
1/1/2000 11999.88
1/3/2000 0
1/4/2000 1002.97
1/5/2000 1.25
1/9/2000 1000.01
...
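The good news is that your code is fine.
To head off a long discussion in the comments on your question, I'll just say that you need to distinguish "surprising" from "wrong". These results are "surprising" given that the values in the HPCP column are usually small, but they are not "wrong". Maybe you want a different metric (mean? max?) or you want to do some pre-filtering, but for the data you have given and your description of what you want to do, your code and its results are correct, even if the output values are unexpected.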
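If pre-filtering or a different metric is what you are after, here is a minimal sketch of one way to do it. It assumes that a value of exactly 999.99 is a placeholder rather than real rainfall, which is my guess and not something the file itself confirms. The sample frame is hand-made for illustration, and the names `SENTINEL`, `DAY`, `filtered`, and `daily` are mine.

```python
import numpy as np
import pandas as pd

# Hand-made sample in the same shape as the DATE/HPCP columns.
# These rows are illustrative, not copied from the real file.
data = pd.DataFrame({
    "DATE": ["20000101 01:00", "20000101 02:00", "20000104 16:00",
             "20000104 17:00", "20000104 18:00"],
    "HPCP": [0.0, 999.99, 0.01, 0.03, 999.99],
})

SENTINEL = 999.99  # assumption: a placeholder value, not a real reading

# Parse the timestamps and keep only the calendar day.
data["DAY"] = pd.to_datetime(data["DATE"], format="%Y%m%d %H:%M").dt.date

# Pre-filter: drop rows whose HPCP matches the placeholder (tolerant float compare).
filtered = data[~np.isclose(data["HPCP"], SENTINEL)]

# Compute several per-day metrics in one pass.
daily = filtered.groupby("DAY")["HPCP"].agg(["sum", "mean", "max"]).reset_index()
print(daily)
```

Using the `.dt.date` accessor also replaces the row-by-row loop in the question with a single vectorized step, which should be noticeably faster on a large file.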
Answer 1 (score: 0)
There are many rows in the .csv file you linked to with a value of 999.99. Your sum is working correctly for this data.
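One way to check this yourself is to count the 999.99 rows per day next to the daily total. For what it is worth, the 11999.88 reported for 2000-01-01 equals 12 × 999.99, which would be consistent with twelve such rows on that day. The sketch below uses a small hand-made frame rather than the real file, and the column names `IS_999`, `n_999`, and `rest` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hand-made sample; the rows are illustrative, not taken from the real file.
data = pd.DataFrame({
    "DATE": ["20000101 01:00", "20000101 02:00", "20000101 03:00",
             "20000104 16:00", "20000104 17:00"],
    "HPCP": [999.99, 999.99, 0.0, 999.99, 0.01],
})

# Keep only the calendar day of each timestamp.
data["DAY"] = pd.to_datetime(data["DATE"], format="%Y%m%d %H:%M").dt.date

# Flag the rows whose HPCP is 999.99 (tolerant float comparison).
data["IS_999"] = np.isclose(data["HPCP"], 999.99)

# Per day: the plain total and the number of 999.99 rows.
summary = data.groupby("DAY").agg(
    total=("HPCP", "sum"),
    n_999=("IS_999", "sum"),
)

# What remains of each total once the 999.99 rows are subtracted.
summary["rest"] = summary["total"] - summary["n_999"] * 999.99
print(summary)
```

If `rest` comes out small for the days with large totals, the 999.99 rows account for the surprising sums.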