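Hello,
I have a CSV file (see the sample input CSV below). I need to get a DataFrame that contains the number of "Amazon Elastic Compute Cloud" services running in each availability zone, together with their total cost, grouped by date.
Something like this: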
| UsageStartDate | AvailabilityZone | Sum of products used | Total cost for each |
[6/1/16, ap-northeast-1a, Amazon Elastic compute cloud = 6, 15$]
[6/2/16, ap-southeast-2 , Amazon Elastic compute cloud = 3, 12$]
This is what I tried using pandas:
funk = pd.read_csv('/tmp/temp.csv')
funk.sort_values('UsageStartDate')
k = funk['AvailabilityZone'][funk['ProductName'] == 'Amazon Elastic Compute Cloud'].sum()
print(k)
Any help with this? I am still learning pandas.
Here is the data:
    ProductName                    AvailabilityZone  UsageStartDate  BlendedCost
0   Amazon Simple Queue Service                      6/1/16 0:00     0
1   Alexa Web Information Service                    6/1/16 0:00     0.00347032
2   Amazon DynamoDB                ap-southeast-2    6/1/16 0:00     0
3   Amazon DynamoDB                ap-southeast-2    6/1/16 0:00     0
4   Amazon Elastic Compute Cloud   ap-northeast-1a   6/1/16 0:00     0.1
5   Amazon Elastic Compute Cloud   ap-northeast-1a   6/1/16 0:00     0.02
6   Amazon Elastic Compute Cloud                     6/1/16 0:00     0
7   Amazon Elastic Compute Cloud                     6/1/16 0:00     0
8   Amazon Elastic Compute Cloud                     6/1/16 0:00     4.70E-06
9   Amazon Elastic Compute Cloud                     6/1/16 0:00     8.00E-08
10  Amazon Elastic Compute Cloud                     6/1/16 0:00     0.00133333
11  Amazon Elastic Compute Cloud                     6/1/16 0:00     0.005
12  Amazon Elastic Compute Cloud   ap-southeast-1a   6/1/16 0:00     0.02
13  Amazon Elastic Compute Cloud   ap-southeast-1a   6/1/16 0:00     0.02
14  Amazon Elastic Compute Cloud   ap-southeast-1b   6/1/16 0:00     0.02
15  Amazon Elastic Compute Cloud                     6/1/16 0:00     0
Answer 0 (score: 2)
I think you need groupby with aggregate, applying len to the column AvailabilityZone and sum to the column BlendedCost:
print (df.groupby(['UsageStartDate', 'AvailabilityZone', 'ProductName'])
.agg({'AvailabilityZone':len,
'BlendedCost':sum}))
Sample:
import pandas as pd
raw_data = {
'ProductName': ['ASQS', 'AWIS', 'AWIS', 'AECC', 'AECC'],
'UsageStartDate': ['6/1/16','6/1/16','6/1/16','6/1/16','6/1/16'],
'AvailabilityZone':['ap-northeast-1a','ap-northeast-1a','ap-northeast-1a','ap-southeast-2','ap-southeast-2'],
'BlendedCost':[1,2,3,4,5]}
df = pd.DataFrame(raw_data)
print (df)
AvailabilityZone BlendedCost ProductName UsageStartDate
0 ap-northeast-1a 1 ASQS 6/1/16
1 ap-northeast-1a 2 AWIS 6/1/16
2 ap-northeast-1a 3 AWIS 6/1/16
3 ap-southeast-2 4 AECC 6/1/16
4 ap-southeast-2 5 AECC 6/1/16
print (df.groupby(['UsageStartDate', 'AvailabilityZone', 'ProductName'])
.agg({'AvailabilityZone':len,'BlendedCost':sum})
.rename(columns={'AvailabilityZone':'Sum of products used', 'BlendedCost':'Total'})
.reset_index())
UsageStartDate AvailabilityZone ProductName Sum of products used Total
0 6/1/16 ap-northeast-1a ASQS 1 1
1 6/1/16 ap-northeast-1a AWIS 2 5
2 6/1/16 ap-southeast-2 AECC 2 9
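On recent pandas versions, aggregating a column that is also one of the grouping keys (here AvailabilityZone) may raise a KeyError, because the group keys are excluded from the columns being aggregated. A minimal sketch of the same sample using named aggregation, assuming pandas 0.25 or later; counting rows through BlendedCost with 'size' is my substitution, not part of the original answer:

```python
import pandas as pd

raw_data = {
    'ProductName': ['ASQS', 'AWIS', 'AWIS', 'AECC', 'AECC'],
    'UsageStartDate': ['6/1/16', '6/1/16', '6/1/16', '6/1/16', '6/1/16'],
    'AvailabilityZone': ['ap-northeast-1a', 'ap-northeast-1a', 'ap-northeast-1a',
                         'ap-southeast-2', 'ap-southeast-2'],
    'BlendedCost': [1, 2, 3, 4, 5],
}
df = pd.DataFrame(raw_data)

# Named aggregation: output column name -> (input column, function).
# The ** unpacking is needed because one output name contains spaces.
result = (
    df.groupby(['UsageStartDate', 'AvailabilityZone', 'ProductName'])
      .agg(**{
          'Sum of products used': ('BlendedCost', 'size'),  # rows per group
          'Total': ('BlendedCost', 'sum'),                  # cost per group
      })
      .reset_index()
)
print(result)
```

This should give the same three rows as the sample output above, without needing the separate rename step.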
Solution with the sample data:
import pandas as pd
import io
temp=u"""ProductName;AvailabilityZone;UsageStartDate;BlendedCost
Amazon Simple Queue Service;;6/1/16 0:00;0
Alexa Web Information Service;;6/1/16 0:00;0.00347032
Amazon DynamoDB;ap-southeast-2;6/1/16 0:00;0
Amazon DynamoDB;ap-southeast-2;6/1/16 0:00;0
Amazon Elastic Compute Cloud;ap-northeast-1a;6/1/16 0:00;0.1
Amazon Elastic Compute Cloud;ap-northeast-1a;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;;6/1/16 0:00;0
Amazon Elastic Compute Cloud;;6/1/16 0:00;0
Amazon Elastic Compute Cloud;;6/1/16 0:00;4.70E-06
Amazon Elastic Compute Cloud;;6/1/16 0:00;8.00E-08
Amazon Elastic Compute Cloud;;6/1/16 0:00;0.00133333
Amazon Elastic Compute Cloud;;6/1/16 0:00;0.005
Amazon Elastic Compute Cloud;ap-southeast-1a;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;ap-southeast-1a;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;ap-southeast-1b;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;;6/1/16 0:00;0"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=";", index_col=None)
#print (df)
print (df.groupby(['UsageStartDate', 'AvailabilityZone', 'ProductName'])
.agg({'AvailabilityZone':len,'BlendedCost':sum})
.rename(columns={'AvailabilityZone':'Sum of products used', 'BlendedCost':'Total'})
.reset_index())
UsageStartDate AvailabilityZone ProductName \
0 6/1/16 0:00 ap-northeast-1a Amazon Elastic Compute Cloud
1 6/1/16 0:00 ap-southeast-1a Amazon Elastic Compute Cloud
2 6/1/16 0:00 ap-southeast-1b Amazon Elastic Compute Cloud
3 6/1/16 0:00 ap-southeast-2 Amazon DynamoDB
Sum of products used Total
0 2 0.12
1 2 0.04
2 1 0.02
3 2 0.00
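The question asks only about Amazon Elastic Compute Cloud, so one option is to filter the rows before grouping. The sketch below assumes a shortened copy of the sample data and pandas 0.25 or later (named aggregation); the variable names ec2 and summary are illustrative. Note that groupby drops rows whose AvailabilityZone is missing by default, which is why the rows without a zone do not appear in the result.

```python
import io
import pandas as pd

# Shortened copy of the sample data (semicolon separated).
temp = u"""ProductName;AvailabilityZone;UsageStartDate;BlendedCost
Amazon DynamoDB;ap-southeast-2;6/1/16 0:00;0
Amazon Elastic Compute Cloud;ap-northeast-1a;6/1/16 0:00;0.1
Amazon Elastic Compute Cloud;ap-northeast-1a;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;;6/1/16 0:00;0.005
Amazon Elastic Compute Cloud;ap-southeast-1a;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;ap-southeast-1a;6/1/16 0:00;0.02
Amazon Elastic Compute Cloud;ap-southeast-1b;6/1/16 0:00;0.02"""

df = pd.read_csv(io.StringIO(temp), sep=";")

# Keep only the EC2 rows, then count rows and sum costs per date and zone.
ec2 = df[df['ProductName'] == 'Amazon Elastic Compute Cloud']
summary = (
    ec2.groupby(['UsageStartDate', 'AvailabilityZone'])
       .agg(**{
           'Sum of products used': ('BlendedCost', 'size'),
           'Total': ('BlendedCost', 'sum'),
       })
       .reset_index()
)
print(summary)
```

If the rows without a zone should be kept as their own group, pandas 1.1 and later accept dropna=False in groupby.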
Answer 1 (score: -2)
Here is the documentation on the general aggregation framework for pandas and on the pandas.groupby function.
In the future, please read how to ask a great question before asking!
funk.groupby(['AvailabilityZone','UsageStartDate','ProductName'])['BlendedCost'].sum()
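The one-liner above returns a Series with a MultiIndex. A possible way to turn it into a flat DataFrame grouped by calendar day is sketched below. The rows are made-up illustrative data in the same shape as the question's CSV, the UsageDay column name is hypothetical, and the month/day/year reading of the timestamps is an assumption.

```python
import pandas as pd

# Made-up rows in the same shape as the question's data.
funk = pd.DataFrame({
    'ProductName': ['Amazon Elastic Compute Cloud'] * 3 + ['Amazon DynamoDB'],
    'AvailabilityZone': ['ap-northeast-1a', 'ap-northeast-1a',
                         'ap-southeast-1b', 'ap-southeast-2'],
    'UsageStartDate': ['6/1/16 0:00', '6/1/16 0:00', '6/2/16 0:00', '6/1/16 0:00'],
    'BlendedCost': [0.1, 0.02, 0.02, 0.0],
})

# Assumption: timestamps are month/day/year. normalize() sets the time to midnight.
funk['UsageDay'] = pd.to_datetime(
    funk['UsageStartDate'], format='%m/%d/%y %H:%M'
).dt.normalize()

# Sum the cost per day, zone and product, then flatten the MultiIndex.
costs = (
    funk.groupby(['UsageDay', 'AvailabilityZone', 'ProductName'])['BlendedCost']
        .sum()
        .reset_index(name='Total')
)
print(costs)
```

Parsing the dates first means the groups sort chronologically rather than as strings.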