I am using Postgres 9.5 and trying to calculate the median and average price per unit with a GROUP BY on id. Here is the DBFIDDLE.
Here is the data:
id | price | units
-----+-------+--------
1 | 100 | 15
1 | 90 | 10
1 | 50 | 8
1 | 40 | 8
1 | 30 | 7
2 | 110 | 22
2 | 60 | 8
2 | 50 | 11
Here is my query, which uses percentile_cont:
SELECT id,
ceil(avg(price)) as avg_price,
percentile_cont(0.5) within group (order by price) as median_price,
ceil( sum (price) / sum (units) ) AS avg_pp_unit,
ceil( percentile_cont(0.5) within group (order by price) /
percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id
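I'm fairly sure the average is calculated correctly. Is this the correct way to calculate the median price per unit?
This post suggests that it is correct (although performance is poor), but I'm curious whether the division in the median calculation could skew the result.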
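To make the concern about the division concrete, here is a minimal Python sketch (not Postgres) that compares two candidate definitions on the sample data above: the ratio of medians, which is what the query computes before ceil, and the median of per-row price/units ratios. It assumes price is an order total rather than a unit price, and the helper names ratio_of_medians and median_of_ratios are hypothetical. statistics.median averages the two middle values for an even count, which matches what percentile_cont(0.5) returns.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the question: (id, price, units)
rows = [
    (1, 100, 15), (1, 90, 10), (1, 50, 8), (1, 40, 8), (1, 30, 7),
    (2, 110, 22), (2, 60, 8), (2, 50, 11),
]

# Group the (price, units) pairs by id, like GROUP BY id
groups = defaultdict(list)
for id_, price, units in rows:
    groups[id_].append((price, units))


def ratio_of_medians(pairs):
    """median(price) / median(units): what the query computes before ceil."""
    return median(p for p, _ in pairs) / median(u for _, u in pairs)


def median_of_ratios(pairs):
    """Median of per-row price / units (assumes price is an order total)."""
    return median(p / u for p, u in pairs)


results = {
    id_: (ratio_of_medians(pairs), median_of_ratios(pairs))
    for id_, pairs in groups.items()
}

for id_, (rom, mor) in sorted(results.items()):
    print(id_, round(rom, 4), round(mor, 4))
```

For id 1 the two definitions happen to agree (6.25), but for id 2 they differ (about 5.45 versus 5.0), so on this data the choice of definition changes the result.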
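Answer 0: (score: 1)
The median is the value separating the higher half of a data sample (or of a population or probability distribution) from the lower half. For a data set, it can be thought of as the "middle" value. https://en.wikipedia.org/wiki/Median
So your median price is 55 and your median units value is 9.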
I'm not sure what you intend "median price per unit" to mean:
Sort by price Sort by units
id | price | units | | id | price | units
-------|-----------|--------| |-------|---------|----------
1 | 30 | 7 | | 1 | 30 | 7
1 | 40 | 8 | | 1 | 40 | 8
1 | 50 | 8 | | 1 | 50 | 8
>>> 2 | 50 | 11 | | 2 | 60 | 8 <<<<
>>> 2 | 60 | 8 | | 1 | 90 | 10 <<<<
1 | 90 | 10 | | 2 | 50 | 11
1 | 100 | 15 | | 1 | 100 | 15
2 | 110 | 22 | | 2 | 110 | 22
| | | | | |
(50+60)/2 (8+10)/2
55 9
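If the "price" column is the unit price, there is no need to divide 55 by 9; but if "price" is the order total, you need to divide by the units: 55 / 9 = 6.11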
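The arithmetic above can be checked with a short Python sketch. Like the sorted tables in this answer, it pools all eight rows and ignores id; the variable names are illustrative only.

```python
from statistics import median

# All eight rows pooled together, ignoring id
prices = [100, 90, 50, 40, 30, 110, 60, 50]
units = [15, 10, 8, 8, 7, 22, 8, 11]

median_price = median(prices)  # mean of the two middle values, 50 and 60
median_units = median(units)   # mean of the two middle values, 8 and 10
per_unit = median_price / median_units

print(median_price, median_units, round(per_unit, 2))
```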