I have a database table like this:
id | col_1 | col_2
------------------
1 | a | x
2 | a | x
3 | b | x
4 | b | z
5 | c | x
I'm trying to get all rows that match col_2 = x, plus the frequency of col_1, ordered by frequency. For example, the output would be:
id | col_1 | col_2 | freq
-------------------------
1 | a | x | 2
2 | a | x | 2
3 | b | x | 1
5 | c | x | 1
I've tried various queries, but because I use GROUP BY to get the frequency, I can't get the individual rows (I want each id). Unfortunately, the grouped query doesn't give me all the rows: it leaves out id = 2. Any help would be greatly appreciated!
Thanks!
Answer 0 (score: 1)
Your freq column looks like an independent, table-wide count of the rows where col_2 = 'x', grouped by col_1. You can get it with this query (SQL Fiddle demo):
SELECT
    col_1,
    COUNT(*) AS freq
FROM myTable
WHERE col_2 = 'x'
GROUP BY col_1
Join that to a query for the individual id values and you should get the result you want:
SELECT
    myTable.id,
    myTable.col_1,  -- qualified: col_1 also exists in col2Summary
    myTable.col_2,
    col2Summary.freq
FROM myTable
INNER JOIN (
    SELECT
        col_1,
        COUNT(*) AS freq
    FROM myTable
    WHERE col_2 = 'x'
    GROUP BY col_1
) col2Summary ON myTable.col_1 = col2Summary.col_1
WHERE myTable.col_2 = 'x'
ORDER BY col2Summary.freq DESC
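To check the join against the sample table from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL purely for convenience, the function name run_join_query is hypothetical, and the secondary sort on id is an addition so that rows with equal freq come back in a stable order.

```python
import sqlite3

# Sample rows from the question: (id, col_1, col_2).
ROWS = [(1, "a", "x"), (2, "a", "x"), (3, "b", "x"), (4, "b", "z"), (5, "c", "x")]

JOIN_QUERY = """
SELECT myTable.id, myTable.col_1, myTable.col_2, col2Summary.freq
FROM myTable
INNER JOIN (
    SELECT col_1, COUNT(*) AS freq
    FROM myTable
    WHERE col_2 = 'x'
    GROUP BY col_1
) col2Summary ON myTable.col_1 = col2Summary.col_1
WHERE myTable.col_2 = 'x'
ORDER BY col2Summary.freq DESC, myTable.id  -- id tie-break is an addition
"""


def run_join_query():
    """Load the sample rows into an in-memory database and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, col_1 TEXT, col_2 TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", ROWS)
    result = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_join_query():
        print(row)
```

Every row with col_2 = 'x' keeps its own id, and the count comes from the derived table, which is why id = 2 is no longer dropped.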
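Answer 1 (score: 1)
This is @EdGibbs's solution rewritten with a scalar subquery. MySQL creates a different plan for it, so you should test which one is more efficient (fiddle):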
SELECT
    id,
    col_1,
    col_2,
    (SELECT COUNT(*)
     FROM myTable AS t2
     WHERE t.col_1 = t2.col_1
       AND t2.col_2 = 'x') AS freq
FROM myTable AS t
WHERE col_2 = 'x'
ORDER BY freq DESC;
By the way, almost every other DBMS (and MySQL itself since version 8.0) supports windowed aggregate functions, which make this quite simple:
COUNT(*) OVER (PARTITION BY col_1) AS freq
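As a rough illustration of the windowed form, the sketch below places that expression in a full query and runs it on the question's sample rows in an in-memory SQLite database. It assumes a SQLite build with window-function support (3.25 or later); the function name run_window_query and the id tie-break in ORDER BY are additions for the example.

```python
import sqlite3

# Sample rows from the question: (id, col_1, col_2).
ROWS = [(1, "a", "x"), (2, "a", "x"), (3, "b", "x"), (4, "b", "z"), (5, "c", "x")]

WINDOW_QUERY = """
SELECT id, col_1, col_2,
       COUNT(*) OVER (PARTITION BY col_1) AS freq
FROM myTable
WHERE col_2 = 'x'          -- the window only sees rows that pass this filter
ORDER BY freq DESC, id     -- id tie-break is an addition
"""


def run_window_query():
    """Run the windowed count on the sample rows (needs SQLite >= 3.25)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, col_1 TEXT, col_2 TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", ROWS)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_window_query():
        print(row)
```

Because the WHERE clause is applied before the window is evaluated, the row with col_2 = 'z' does not contribute to the count for col_1 = 'b'.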
Answer 2 (score: 0)
You also need to group by col_2.
Also remove the * and select only the GROUP BY columns:
SELECT col_1, col_2, COUNT(*) AS freq
FROM mytable
WHERE col_2 = 'x'
GROUP BY col_1, col_2
ORDER BY freq DESC
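A quick way to see what this grouped query returns on the question's sample data is to run it in an in-memory SQLite database from Python. SQLite is an assumption here, standing in for MySQL; the function name run_grouped_query is hypothetical, and the col_1 tie-break in ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question: (id, col_1, col_2).
ROWS = [(1, "a", "x"), (2, "a", "x"), (3, "b", "x"), (4, "b", "z"), (5, "c", "x")]

GROUPED_QUERY = """
SELECT col_1, col_2, COUNT(*) AS freq
FROM mytable
WHERE col_2 = 'x'
GROUP BY col_1, col_2
ORDER BY freq DESC, col_1  -- col_1 tie-break is an addition
"""


def run_grouped_query():
    """Run the plain GROUP BY query on the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col_1 TEXT, col_2 TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    result = conn.execute(GROUPED_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_grouped_query():
        print(row)
```

The query yields one row per (col_1, col_2) group, so it gives the frequencies but not the individual id values; getting one row per id still needs a join, a subquery, or a window function.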