I have this data in MySQL (sample):
Table 1:
ID ITEM CNT
--------------------
0001 AAB 5
0001 BBA 3
0001 BBB 8
0001 AAC 10
0002 BBA 2
0002 BBC 7
0003 FFG 2
0003 JPO 4
0003 PUI 22
..........
I would like to find a way to import this data into Python as count vectors, for example:
0001 = [5,10,3,8,0,0,0,0]
0002 = [0,0,2,0,7,0,0,0]
0003 = [0,0,0,0,0,2,4,22]
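where the elements are the counts for each ID of all items, in this order: [AAB,AAC,BBA,BBB,BBC,FFG,JPO,PUI]
So I would like to ask: what is the best and most efficient way to do this? Is it better to do it from Python or from MySQL?
Thanks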
Answer 0 (score: 1)
It is generally more efficient, where possible, to manipulate the data in SQL rather than in Python.
With this setup:
import config
import MySQLdb
conn = MySQLdb.connect(
    host=config.HOST, user=config.USER,
    passwd=config.PASS, db='test')
cursor = conn.cursor()
sql = '''\
    DROP TABLE IF EXISTS foo
    '''
cursor.execute(sql)
sql = '''\
    CREATE TABLE foo (
        ID varchar(4),
        ITEM varchar(3),
        CNT int)
    '''
cursor.execute(sql)
sql = '''\
    INSERT INTO foo VALUES (%s,%s,%s)
    '''
cursor.executemany(sql, [['0001', 'AAB', 5],
                         ['0001', 'BBA', 3],
                         ['0001', 'BBB', 8],
                         ['0002', 'BBA', 2]])
You can build the desired SQL with:
items = 'AAB AAC BBA BBB BBC FFG JPO PUI'.split()
fields = ', '.join('COALESCE({}.CNT, 0)'.format(item) for item in items)
joins = '\n'.join('''\
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = '{i}') as {i}
ON T.ID = {i}.ID'''.format(i=item) for item in items)
sql = '''\
SELECT T.ID, {f}
FROM (SELECT DISTINCT ID from foo) as T
{j}
'''.format(f=fields, j=joins)
print(sql)
and use it like this:
result = dict()
cursor.execute(sql)
for row in cursor:
    # row[0] is the ID; the remaining columns are the counts in item order
    result[row[0]] = row[1:]
print(result)
The SQL query used is:
SELECT T.ID, COALESCE(AAB.CNT, 0), COALESCE(AAC.CNT, 0), COALESCE(BBA.CNT, 0), COALESCE(BBB.CNT, 0), COALESCE(BBC.CNT, 0), COALESCE(FFG.CNT, 0), COALESCE(JPO.CNT, 0), COALESCE(PUI.CNT, 0)
FROM (SELECT DISTINCT ID from foo) as T
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'AAB') as AAB
ON T.ID = AAB.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'AAC') as AAC
ON T.ID = AAC.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'BBA') as BBA
ON T.ID = BBA.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'BBB') as BBB
ON T.ID = BBB.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'BBC') as BBC
ON T.ID = BBC.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'FFG') as FFG
ON T.ID = FFG.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'JPO') as JPO
ON T.ID = JPO.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'PUI') as PUI
ON T.ID = PUI.ID
The resulting dict looks like:
{'0001': (5L, 0L, 3L, 8L, 0L, 0L, 0L, 0L), '0002': (0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L)}
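For comparison, here is a minimal sketch of doing the same pivot on the Python side, assuming the raw (ID, ITEM, CNT) rows have already been fetched with a plain SELECT. The helper name count_vectors is hypothetical, and the rows below just repeat the four sample rows inserted above. It may be good enough for small tables, but it pulls every row over the connection.

```python
from collections import defaultdict

def count_vectors(rows, items):
    """Pivot (ID, ITEM, CNT) rows into {ID: [count for each item, in order]}."""
    # Map each item name to its position in the output vector.
    index = {item: pos for pos, item in enumerate(items)}
    # Every new ID starts with an all-zero vector.
    vectors = defaultdict(lambda: [0] * len(items))
    for id_, item, cnt in rows:
        vectors[id_][index[item]] += cnt
    return dict(vectors)

# Same sample rows as in the setup above (hypothetical stand-in for a fetch).
rows = [('0001', 'AAB', 5), ('0001', 'BBA', 3),
        ('0001', 'BBB', 8), ('0002', 'BBA', 2)]
items = 'AAB AAC BBA BBB BBC FFG JPO PUI'.split()
vectors = count_vectors(rows, items)
print(vectors)
```

Unlike the SQL version, this gives lists rather than tuples, and it raises KeyError if a row contains an item that is not in items.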
I know you asked for
0001 = [5,10,3,8,0,0,0,0]
0002 = [0,0,2,0,7,0,0,0]
0003 = [0,0,0,0,0,2,4,22]
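but there are at least two problems with that. First, 0001 is not a valid Python variable name: variable names cannot start with a digit. Second, you do not want dynamically defined variable names, because it is hard to program with a bare variable name that is not known until runtime.
Instead, use the would-be variable names as keys in a dict, result. Then you can refer to the "variable" 0001 with result['0001'].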
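A small sketch of working with the dict, assuming result has the shape printed earlier (the literal below is copied from that output, written without the Python 2 L suffix). The name vectors is just illustrative.

```python
# Assumed shape of `result`, copied from the output shown earlier.
result = {'0001': (5, 0, 3, 8, 0, 0, 0, 0),
          '0002': (0, 0, 2, 0, 0, 0, 0, 0)}

# Convert the row tuples to lists if you want mutable vectors.
vectors = {key: list(counts) for key, counts in result.items()}

# Look up one ID by key instead of by variable name.
print(vectors['0001'])

# Or walk over all IDs without knowing them in advance.
for key in sorted(vectors):
    print(key, '=', vectors[key])
```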
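Answer 1 (score: 0)
You can go for a crosstab query, where the row header is the id, the column header is the item, and cnt is the value to be aggregated. You can then loop through every column of each row to get the vector. For help with crosstab queries, see here: http://allenbrowne.com/ser-67.html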
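One common way to write a crosstab in plain SQL is conditional aggregation (SUM over a CASE expression, grouped by ID). The sketch below is only an illustration of that idea, not necessarily what the linked page describes. To keep it runnable without a server it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name foo and the sample rows are assumptions borrowed from the other answer. The item names are formatted straight into the SQL, so they should come from a trusted list.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE foo (ID TEXT, ITEM TEXT, CNT INTEGER)')
conn.executemany('INSERT INTO foo VALUES (?, ?, ?)',
                 [('0001', 'AAB', 5), ('0001', 'BBA', 3),
                  ('0001', 'BBB', 8), ('0002', 'BBA', 2)])

items = 'AAB AAC BBA BBB BBC FFG JPO PUI'.split()

# One aggregated column per item: the row header is ID, the column header is ITEM.
columns = ', '.join(
    "SUM(CASE WHEN ITEM = '{i}' THEN CNT ELSE 0 END) AS {i}".format(i=item)
    for item in items)
sql = 'SELECT ID, {c} FROM foo GROUP BY ID ORDER BY ID'.format(c=columns)

# Loop over each row and collect its columns into a vector.
crosstab = {row[0]: list(row[1:]) for row in conn.execute(sql)}
print(crosstab)
```

The same SELECT should also be valid in MySQL, although there the driver will likely return the SUM columns as Decimal values rather than ints, so you may want to convert them.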