Question

This one is a bit tricky, and I hope I can explain it clearly, because it is not a very common problem (or maybe it is?).

I have a table with duplicate records like this (I'm talking about hundreds of them):

|   Code|Route|State|City|Start| End|Style|
|-----------------------------------------|
|    R14|   14|   NL| MTY|  Ind|Main| High|
| R14-01|   14|   NL| MTY|  Ind|Main| High|
|  R15-1|   15|   NL| MTY|  Cal| Cle|  Low|
|   R15B|   15|   NL| MTY|  Cal| Cle|  Low|
|  R14-2|   14|   NL| MTY|  Ind|Main| High|
| RT15th|   15|   NL| MTY|  Cal| Cle| High|
|  RT15°|   15|   NL| MTY|  Cal| Cle| High|
|  R15.3|   15|   NL| MTY|  Cal| Cle|  Low|
| RT15/H|   15|   NL| MTY|  Cal| Cle| High|

I need to get a result like this:

| Code|Route|State|City|Start| End|Style|
|---------------------------------------|
|  R14|   14|   NL| MTY|  Ind|Main| High|
|  R15|   15|   NL| MTY|  Cal| Cle|  Low|
| RT15|   15|   NL| MTY|  Cal| Cle| High|

I have already created a query that groups the results by Route, State, City, Start, End and Style; that was the easy part.

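As you can see, the Code column is the only one causing problems. I need to group that column by similar codes, that is, by the characters the codes share and their positions: R14, R14-01, R14-2 => R14; R15-1, R15-2 => R15-; and R15, R15-1 => R15.

Any idea how to get these intersections?

To clarify: the Code column is a mess, with many different characters used as delimiters. The table is not that short either; I'm talking about thousands of records, and some of them have this problem. I expanded the table a bit so you can get a better idea of what I'm trying to accomplish.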
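One way to read the "intersection" described in the question is as the longest common prefix of the codes in a group. The Python sketch below illustrates only that reading, using the three example groups from the question. `common_code` is a hypothetical helper name, and the sketch assumes the groups of codes are already known.

```python
from os.path import commonprefix


def common_code(codes):
    """Return the longest leading run of characters shared by every code."""
    # commonprefix compares the strings character by character,
    # so it works on arbitrary strings, not only on file paths.
    return commonprefix(list(codes))


if __name__ == "__main__":
    print(common_code(["R14", "R14-01", "R14-2"]))  # R14
    print(common_code(["R15-1", "R15-2"]))          # R15-
    print(common_code(["R15", "R15-1"]))            # R15
```

This only explains what the target code for a known group could be; deciding which rows belong to the same group is still a separate step.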

Answer 1

You can do something like this:

select (case when code in ('R14', 'R14-01', 'R14-2') then 'R14'
             when code in ('R15-1', 'R15-2') then 'R15-'
             when code in ('R15', 'R15-1') then 'R15'
             else code
        end) as newcode, Route, State, City, Start, End, Style
from t
group by newcode, Route, State, City, Start, End, Style;

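I note that R15-1 is assigned to two categories. Because a CASE expression returns the first branch that matches, the query above puts it in 'R15-'.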
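To see where the overlapping code ends up, here is a minimal sketch that runs the same CASE expression against an in-memory SQLite database. SQLite is an assumption made only for this illustration, since the question does not name a database. The single-column table `t` and the helper `classify` are hypothetical as well.

```python
import sqlite3

# Same CASE expression as in the answer, applied to a one-column table.
QUERY = """
SELECT code,
       CASE WHEN code IN ('R14', 'R14-01', 'R14-2') THEN 'R14'
            WHEN code IN ('R15-1', 'R15-2') THEN 'R15-'
            WHEN code IN ('R15', 'R15-1') THEN 'R15'
            ELSE code
       END AS newcode
FROM t
"""


def classify(codes):
    """Map each original code to the group the CASE expression assigns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (code TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(c,) for c in codes])
    mapping = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return mapping


if __name__ == "__main__":
    for code, newcode in classify(["R14", "R14-2", "R15", "R15-1", "R15-2", "RT15"]).items():
        print(code, "->", newcode)
```

In this run R15-1 maps to 'R15-', because the second branch matches before the third one is evaluated. Codes not listed in any branch, such as RT15, pass through unchanged.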

Answer 2

Same general idea as Gordon's answer, with slightly different details.

select distinct case
           when code like '%-' then code   -- ends in a hyphen: keep as is
           else substr(code, 1, 3)         -- otherwise keep the first three characters
       end thecode
, etc
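A fixed `substr(code, 1, 3)` would turn a code such as RT15th into RT1, so a variation may be worth considering for the sample data. The sketch below is not part of either answer. It assumes every code starts with letters followed by digits, and that everything after those digits is noise. It uses SQLite with a user-defined function registered from Python; the database choice, the function name `code_prefix`, and the helper `dedupe` are hypothetical choices for this illustration.

```python
import re
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    ("R14", 14, "NL", "MTY", "Ind", "Main", "High"),
    ("R14-01", 14, "NL", "MTY", "Ind", "Main", "High"),
    ("R15-1", 15, "NL", "MTY", "Cal", "Cle", "Low"),
    ("R15B", 15, "NL", "MTY", "Cal", "Cle", "Low"),
    ("R14-2", 14, "NL", "MTY", "Ind", "Main", "High"),
    ("RT15th", 15, "NL", "MTY", "Cal", "Cle", "High"),
    ("RT15°", 15, "NL", "MTY", "Cal", "Cle", "High"),
    ("R15.3", 15, "NL", "MTY", "Cal", "Cle", "Low"),
    ("RT15/H", 15, "NL", "MTY", "Cal", "Cle", "High"),
]


def code_prefix(code):
    """Keep the leading letters and the digits right after them (assumed format)."""
    match = re.match(r"[A-Za-z]+[0-9]+", code)
    return match.group(0) if match else code


def dedupe(rows):
    """Collapse rows whose normalized code and other columns are identical."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("code_prefix", 1, code_prefix)
    # "Start" and "End" are quoted because END is an SQL keyword.
    conn.execute(
        'CREATE TABLE t (Code TEXT, Route INTEGER, State TEXT, City TEXT, '
        '"Start" TEXT, "End" TEXT, Style TEXT)'
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        'SELECT DISTINCT code_prefix(Code) AS newcode, Route, State, City, '
        '"Start", "End", Style FROM t ORDER BY newcode'
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in dedupe(ROWS):
        print(row)
```

On the sample rows this yields three records with the codes R14, R15 and RT15. Codes that do not follow the assumed letters-then-digits pattern are returned unchanged and would need their own rule.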

How do I merge duplicate records into one?
