Question

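While working with some legacy data, I want to group the values in a column while ignoring misspellings. I thought SOUNDEX() could do the job and produce the result I want. Here is what I tried: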

SELECT soundex(AREA)
FROM MASTER
GROUP BY soundex(AREA)
ORDER BY soundex(AREA)

But (obviously) SOUNDEX returns a 4-character code in each result row, like the rows below, and the actual strings are lost:

A131
A200
A236

How can I include at least one matching value from each group instead of just the 4-character code?
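The behaviour in the question is expected: SOUNDEX maps every spelling that sounds alike to the same code, so grouping by it keeps only the code and not any of the original strings. A quick check in SQL Server (the literals are illustrative and are not from the question's data):

```sql
-- Two different spellings collapse to the same Soundex code,
-- so GROUP BY SOUNDEX(...) alone cannot show which strings produced it.
SELECT SOUNDEX('Smith')  AS smith_code,   -- 'S530'
       SOUNDEX('Smythe') AS smythe_code;  -- 'S530'
```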

Answer 1

SELECT soundex(AREA) as snd_AREA, min(AREA) as AREA_EXAMPLE_1, max(AREA) as AREA_EXAMPLE_2
from MASTER
group by soundex(AREA)
order by AREA_EXAMPLE_1
;

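In MySQL you could select group_concat(distinct AREA) as list_area to get all the variants. I don't know of an equivalent in SQL Server, but min and max give you two example values from each group, and you wanted to discard the differences anyway.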
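On SQL Server 2017 and later, STRING_AGG can fill the role of MySQL's GROUP_CONCAT. A sketch, assuming the MASTER table and AREA column from the question; because STRING_AGG does not accept DISTINCT, the distinct spellings are collected in a derived table first:

```sql
-- SQL Server 2017+: list every distinct spelling that shares a Soundex code.
SELECT snd_AREA,
       STRING_AGG(AREA, ', ') WITHIN GROUP (ORDER BY AREA) AS list_area
FROM (
    -- De-duplicate first, since STRING_AGG has no DISTINCT option.
    SELECT DISTINCT SOUNDEX(AREA) AS snd_AREA, AREA
    FROM MASTER
) AS d
GROUP BY snd_AREA
ORDER BY snd_AREA;
```

On older SQL Server versions this function is not available, so min/max as above remain the simplest option.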

Answer 2

You can also use row_number() to get one row for each soundex(AREA) value:

select AREA, snd
from
(
  select AREA, soundex(AREA) snd,
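    -- order by AREA (not soundex(AREA), which is constant within each
    -- partition) so the row kept for each group is deterministic
    row_number() over(partition by soundex(AREA)
                      order by AREA) rn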
  from master
) x
where rn = 1

See SQL Fiddle with Demo.
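If you would rather keep the most likely correct spelling than an arbitrary one, a variation is to rank the spellings in each Soundex group by how often they occur. This is a sketch that assumes the most frequent spelling is the intended one, which may not hold for your data; the table and column names are from the question:

```sql
-- Keep, for each Soundex code, the spelling that occurs most often
-- (assumption: the most common spelling is the "correct" one).
SELECT AREA, snd, cnt
FROM (
    SELECT AREA,
           SOUNDEX(AREA) AS snd,
           COUNT(*)      AS cnt,
           -- Window functions run after GROUP BY, so COUNT(*) can be used here;
           -- ties on the count are broken alphabetically.
           ROW_NUMBER() OVER (PARTITION BY SOUNDEX(AREA)
                              ORDER BY COUNT(*) DESC, AREA) AS rn
    FROM MASTER
    GROUP BY AREA
) AS x
WHERE rn = 1
ORDER BY snd;
```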

How to group by a column that contains misspellings