My data looks like this:
data_
company result ID group
cars 50 q1 ground
boats 0 q1 water
bicycles 50 q2 ground
cars 75 q2 water
horses 0 q2 ground
foxes 50 q5 ground
.....etc
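So I want to ask the following question:
Which ground companies have a result that differs from cars, and in which quarter (ID) does this happen?
For the data above, the answer would essentially be: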
horses, q2 (result: 0, differs from cars 75)
bicycles, q2 (result: 50, differs from cars 75)
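I'm using Excel or Access to do this, but if anyone has a better suggestion, I'd be glad to hear it.
I think I could manage a semi-automated approach in Excel: pull out the baseline data, then ask the question with a combination of VLOOKUP and IF formulas. Something like this: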
baseline_
company result id
cars 50 q1
cars 75 q2
Then ask: which Q1 ground-group results differ from 50? Which Q2 ground-group results differ from 75?
It would even be possible to split it up like this:
groups_ground
company result id
cars etc. etc.
foxes etc. etc.
horses etc. etc.
bicycles etc. etc.
But given that my data is 500k+ rows, all of these approaches are rather tedious.
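With 500k+ rows, one option is to export the table once and do the comparison in a short script rather than by hand. A minimal sketch, assuming the table is exported to CSV with the header `company,result,ID,group` and integer results; `find_differences` is a hypothetical helper, and quarters without a cars row are skipped.

```python
import csv
import io


def find_differences(csv_file, baseline_company="cars", group="ground"):
    """Scan an exported data_ table once, then compare group rows with the baseline.

    Assumes a header row company,result,ID,group and integer results
    (a hypothetical export layout, not taken from the question).
    """
    baseline = {}    # ID -> baseline company's result
    candidates = []  # (company, result, ID) rows in the target group
    for row in csv.DictReader(csv_file):
        result = int(row["result"])
        if row["company"] == baseline_company:
            baseline[row["ID"]] = result
        elif row["group"] == group:
            candidates.append((row["company"], result, row["ID"]))
    # Compare only after the whole file is read, because the baseline row
    # for a quarter may appear after the rows that need it.
    return [
        (company, qid, result, baseline[qid])
        for company, result, qid in candidates
        if qid in baseline and result != baseline[qid]
    ]


SAMPLE = """company,result,ID,group
cars,50,q1,ground
boats,0,q1,water
bicycles,50,q2,ground
cars,75,q2,water
horses,0,q2,ground
foxes,50,q5,ground
"""

for company, qid, result, base in find_differences(io.StringIO(SAMPLE)):
    print(f"{company}, {qid} (result: {result}, differs from cars {base})")
```

For a real export you would pass an open file instead, e.g. `with open("data_.csv", newline="") as f: find_differences(f)`, where `data_.csv` is a hypothetical file name. Only one pass over the file is needed, and the kept rows are small tuples, so half a million rows should fit comfortably in memory.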
For SQL, I was thinking of:
SELECT * FROM data_ D
LEFT JOIN baseline_ B
ON D.result <> B.result;
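To see why an inequality join on its own does not answer the question, here is a small sketch that loads the sample rows into an in-memory SQLite database (an assumption for testing; the question uses Excel or Access) and runs the join with `<>`, the standard SQL spelling of not-equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is double-quoted because GROUP is a reserved word in SQL
conn.execute('CREATE TABLE data_ (company TEXT, result INTEGER, ID TEXT, "group" TEXT)')
conn.execute("CREATE TABLE baseline_ (company TEXT, result INTEGER, id TEXT)")
conn.executemany(
    "INSERT INTO data_ VALUES (?, ?, ?, ?)",
    [
        ("cars", 50, "q1", "ground"),
        ("boats", 0, "q1", "water"),
        ("bicycles", 50, "q2", "ground"),
        ("cars", 75, "q2", "water"),
        ("horses", 0, "q2", "ground"),
        ("foxes", 50, "q5", "ground"),
    ],
)
conn.executemany(
    "INSERT INTO baseline_ VALUES (?, ?, ?)",
    [("cars", 50, "q1"), ("cars", 75, "q2")],
)

# Join only on "results differ": nothing ties a row to its own quarter or group
rows = conn.execute(
    "SELECT D.company, D.ID, B.id FROM data_ D "
    "LEFT JOIN baseline_ B ON D.result <> B.result"
).fetchall()
print(len(rows))
```

Every data_ row is paired with each baseline row whose result differs, whatever its quarter or group, so the six sample rows expand to eight pairs, including water rows such as boats q1 paired with the q2 baseline.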
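Answer 0 (score: 1)
Your SQL is on the right track, but you need to look for matches and then keep the rows that have no match, so it needs more conditions: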
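SELECT d.*
FROM data_ d LEFT JOIN
     data_ dcars
     ON d.ID = dcars.ID AND          -- same quarter
        d.result = dcars.result AND  -- same result
        dcars.company = 'cars'
WHERE d."group" = 'ground' AND       -- GROUP is a reserved word, so it must be quoted
      dcars.company IS NULL;         -- no matching cars row: the result differs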
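A quick way to check this anti-join is to run it against the sample rows in an in-memory SQLite database (an assumption for testing; Access quotes reserved words with brackets, e.g. `[group]`, instead of double quotes). The join matches on ID as well as result: without the ID condition, bicycles (50 in q2) would match the cars q1 row with result 50 and be dropped by mistake.

```python
import sqlite3

ROWS = [
    ("cars", 50, "q1", "ground"),
    ("boats", 0, "q1", "water"),
    ("bicycles", 50, "q2", "ground"),
    ("cars", 75, "q2", "water"),
    ("horses", 0, "q2", "ground"),
    ("foxes", 50, "q5", "ground"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data_ (company TEXT, result INTEGER, ID TEXT, "group" TEXT)')
conn.executemany("INSERT INTO data_ VALUES (?, ?, ?, ?)", ROWS)

# Anti-join: keep ground rows for which no cars row has the same ID and result
QUERY = """
SELECT d.company, d.ID, d.result
FROM data_ d
LEFT JOIN data_ dcars
  ON d.ID = dcars.ID
 AND d.result = dcars.result
 AND dcars.company = 'cars'
WHERE d."group" = 'ground'
  AND dcars.company IS NULL
ORDER BY d.company
"""
rows = conn.execute(QUERY).fetchall()
for company, qid, result in rows:
    print(company, qid, result)
```

The cars q1 row matches itself and is excluded, bicycles and horses are returned for q2, and foxes is also returned because there is no cars row for q5 to match, the same case Answer 1 reports as "differs from cars None".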
Answer 1 (score: 1)
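# Each row: [company, result, ID, group]
data = [['cars', 50, 'q1', 'ground'],
        ['boats', 0, 'q1', 'water'],
        ['bicycles', 50, 'q2', 'ground'],
        ['cars', 75, 'q2', 'water'],
        ['horses', 0, 'q2', 'ground'],
        ['foxes', 50, 'q5', 'ground']]
# Map each quarter (ID) to the cars result for that quarter
data_dict = {i[2]: i[1] for i in data if i[0] == 'cars'}
for i in data:
    if i[3] == 'ground' and i[0] != 'cars':
        baseline = data_dict.get(i[2])  # None if no cars row exists for this quarter
        # Compare the result (i[1]) with the cars baseline, not the ID (i[2])
        if i[1] != baseline:
            print("{}, {} (result: {}, differs from cars {})".format(i[0], i[2], i[1], baseline))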
Output:
bicycles, q2 (result: 50, differs from cars 75)
horses, q2 (result: 0, differs from cars 75)
foxes, q5 (result: 50, differs from cars None)