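I'm looking for a way to summarise a database table so that rows sharing a common ID are collapsed into a single row of output.
My tools are SQLite and Python 2.x.
For example, given the following table of fruit prices at my local supermarkets...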
+--------------------+--------------------+--------------------+
|Fruit |Shop |Price |
+--------------------+--------------------+--------------------+
|Apple |Coles |$1.50 |
|Apple |Woolworths |$1.60 |
|Apple |IGA |$1.70 |
|Banana |Coles |$0.50 |
|Banana |Woolworths |$0.60 |
|Banana |IGA |$0.70 |
|Cherry |Coles |$5.00 |
|Date |Coles |$2.00 |
|Date |Woolworths |$2.10 |
|Elderberry |IGA |$10.00 |
+--------------------+--------------------+--------------------+
...I want to produce a summary table showing the price of each fruit at each supermarket. Blank cells should be filled with NULL.
+----------+----------+----------+----------+
|Fruit |Coles |Woolworths|IGA |
+----------+----------+----------+----------+
|Apple |$1.50 |$1.60 |$1.70 |
|Banana |$0.50 |$0.60 |$0.70 |
|Cherry |$5.00 |NULL |NULL |
|Date |$2.00 |$2.10 |NULL |
|Elderberry|NULL |NULL |$10.00 |
+----------+----------+----------+----------+
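I believe the literature calls this a "pivot table" or a "pivot query", but apparently SQLite doesn't support PIVOT. (The solution in that question uses hard-coded LEFT JOINs. That doesn't really appeal to me, because I don't know the "column" names in advance.)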
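Right now I do this by iterating over the whole table in Python and accumulating a dict of dicts, which is a bit klutzy. I'm open to better solutions, in either Python or SQLite, that give me the data in tabular form.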
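For reference, a minimal sketch of the dict-of-dicts approach described above. The `prices` table, its column names, and the in-memory database are assumptions made for illustration; they are not taken from a real schema.

```python
from __future__ import print_function  # same print behaviour on Python 2.x and 3
import sqlite3

# Hypothetical schema and table name, for illustration only.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE prices (fruit TEXT, shop TEXT, price TEXT)')
conn.executemany('INSERT INTO prices VALUES (?, ?, ?)', [
    ('Apple', 'Coles', '$1.50'), ('Apple', 'Woolworths', '$1.60'),
    ('Apple', 'IGA', '$1.70'), ('Banana', 'Coles', '$0.50'),
    ('Banana', 'Woolworths', '$0.60'), ('Banana', 'IGA', '$0.70'),
    ('Cherry', 'Coles', '$5.00'), ('Date', 'Coles', '$2.00'),
    ('Date', 'Woolworths', '$2.10'), ('Elderberry', 'IGA', '$10.00'),
])

pivot = {}     # fruit -> {shop: price}
shops = set()  # column names, discovered while scanning the rows
for fruit, shop, price in conn.execute('SELECT fruit, shop, price FROM prices'):
    pivot.setdefault(fruit, {})[shop] = price
    shops.add(shop)

shops = sorted(shops)
print('\t'.join(['Fruit'] + shops))
for fruit in sorted(pivot):
    # dict.get returns None when a shop does not stock the fruit
    print('\t'.join([fruit] + [str(pivot[fruit].get(shop)) for shop in shops]))
```

The column names are collected during the scan, so nothing about the shops has to be known in advance.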
Answer 0 (score: 13)
The pandas package handles this very nicely.
>>> import pandas
>>> df=pandas.DataFrame(data, columns=['Fruit', 'Shop', 'Price'])
>>> df.pivot(index='Fruit', columns='Shop', values='Price')
Shop Coles IGA Woolworths
Fruit
Apple 1.5 1.7 1.6
Banana 0.5 0.7 0.6
Cherry 5.0 NaN NaN
Date 2.0 NaN 2.1
Elderberry NaN 10.0 NaN
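The session above assumes `data` already holds the rows. Since the question starts from SQLite, here is a rough sketch of loading the rows with `pandas.read_sql_query` and pivoting them. The `prices` table and its lowercase column names are hypothetical, and the prices are stored as numbers rather than "$1.50" strings.

```python
import sqlite3
import pandas

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE prices (fruit TEXT, shop TEXT, price REAL)')
conn.executemany('INSERT INTO prices VALUES (?, ?, ?)', [
    ('Apple', 'Coles', 1.50), ('Apple', 'Woolworths', 1.60),
    ('Apple', 'IGA', 1.70), ('Banana', 'Coles', 0.50),
    ('Banana', 'Woolworths', 0.60), ('Banana', 'IGA', 0.70),
    ('Cherry', 'Coles', 5.00), ('Date', 'Coles', 2.00),
    ('Date', 'Woolworths', 2.10), ('Elderberry', 'IGA', 10.00),
])

# read_sql_query runs the SELECT and returns the rows as a DataFrame
df = pandas.read_sql_query('SELECT fruit, shop, price FROM prices', conn)

# One row per fruit, one column per shop; missing combinations become NaN
table = df.pivot(index='fruit', columns='shop', values='price')
print(table)
```

Note that `pivot` raises an error if the same fruit/shop pair appears more than once, so duplicate rows would need to be aggregated first.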
Documentation: http://pandas.pydata.org/pandas-docs/stable/reshaping.html
Some IPython notebooks for learning pandas: https://bitbucket.org/hrojas/learn-pandas
Hope this helps.
Answer 1 (score: 8)
On the Python side, you could use some itertools magic to rearrange your data:
data = [('Apple', 'Coles', 1.50),
('Apple', 'Woolworths', 1.60),
('Apple', 'IGA', 1.70),
('Banana', 'Coles', 0.50),
('Banana', 'Woolworths', 0.60),
('Banana', 'IGA', 0.70),
('Cherry', 'Coles', 5.00),
('Date', 'Coles', 2.00),
('Date', 'Woolworths', 2.10),
('Elderberry', 'IGA', 10.00)]
from itertools import groupby, islice
from operator import itemgetter
from collections import defaultdict
stores = sorted(set(row[1] for row in data))
# probably splitting this up in multiple lines would be more readable
pivot = ((fruit, defaultdict(lambda: None, (islice(d, 1, None) for d in data))) for fruit, data in groupby(sorted(data), itemgetter(0)))
print 'Fruit'.ljust(12), '\t'.join(stores)
for fruit, prices in pivot:
    print fruit.ljust(12), '\t'.join(str(prices[s]) for s in stores)
Output:
Fruit Coles IGA Woolw
Apple 1.5 1.7 1.6
Banana 0.5 0.7 0.6
Cherry 5.0 None None
Date 2.0 None 2.1
Elderberry None 10.0 None
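As the comment in the code suggests, the one-line `pivot` expression is easier to follow when split up. One possible way to do that is sketched below; the helper name `pivot_rows` is made up for this example, and a plain dict with `.get` stands in for the `defaultdict(lambda: None)`.

```python
from __future__ import print_function  # same print behaviour on Python 2.x and 3
from itertools import groupby
from operator import itemgetter

data = [('Apple', 'Coles', 1.50), ('Apple', 'Woolworths', 1.60),
        ('Apple', 'IGA', 1.70), ('Banana', 'Coles', 0.50),
        ('Banana', 'Woolworths', 0.60), ('Banana', 'IGA', 0.70),
        ('Cherry', 'Coles', 5.00), ('Date', 'Coles', 2.00),
        ('Date', 'Woolworths', 2.10), ('Elderberry', 'IGA', 10.00)]

stores = sorted(set(row[1] for row in data))

def pivot_rows(rows):
    """Yield (fruit, {shop: price}) pairs, one per fruit (hypothetical helper)."""
    # groupby only merges adjacent rows, so sort by fruit first
    for fruit, group in groupby(sorted(rows), key=itemgetter(0)):
        prices = dict((shop, price) for _, shop, price in group)
        yield fruit, prices

print('Fruit'.ljust(12) + '\t'.join(stores))
for fruit, prices in pivot_rows(data):
    # prices.get(s) is None when a shop does not stock the fruit
    print(fruit.ljust(12) + '\t'.join(str(prices.get(s)) for s in stores))
```

Using a generator function also avoids reusing the name `data` for both the full list and each group, which the one-liner does.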