How do I group data in a bulk-loaded table based on a pattern in the data?

Time: 2018-06-18 18:46:18

Tags: sql group-by grouping aggregate

I have a table with more than 14,000 rows. The data was inserted into the table by a bulk operation.

The table looks like this:

LineNumber Data
1          100000 BANK       RUNNER222----BANK FILE                  PAGE 1
2          OTHER RELATED DATA----------------------------------
3          OTHER RELATED DATA----------------------------------
4          OTHER RELATED DATA----------------------------------
5          100000 BANK       RUNNER222----BANK FILE                  PAGE 2
6          OTHER RELATED DATA----------------------------------
7          OTHER RELATED DATA----------------------------------
8          OTHER RELATED DATA----------------------------------
9          OTHER RELATED DATA----------------------------------
10         100000 BANK       RUNNER222----BANK FILE                  PAGE 3
11         OTHER RELATED DATA----------------------------------
12         OTHER RELATED DATA----------------------------------
13         OTHER RELATED DATA----------------------------------
14         OTHER RELATED DATA----------------------------------
15         TOTAL                                         100.00 
16         111111 BANK       RUNNER222----BANK FILE                  PAGE 1
17         OTHER RELATED DATA----------------------------------
18         OTHER RELATED DATA----------------------------------
19         OTHER RELATED DATA----------------------------------
20         OTHER RELATED DATA----------------------------------
21         111111 BANK       RUNNER222----BANK FILE                  PAGE 2
22         OTHER RELATED DATA----------------------------------
23         OTHER RELATED DATA----------------------------------
24         OTHER RELATED DATA----------------------------------
25         OTHER RELATED DATA----------------------------------
26         TOTAL                                         100.00 
27         222222 BANK       RUNNER222----BANK FILE                  PAGE 1
28         OTHER RELATED DATA----------------------------------
29         OTHER RELATED DATA----------------------------------
30         OTHER RELATED DATA----------------------------------
31         OTHER RELATED DATA---------------------------------- 
32         222222 BANK       RUNNER222----BANK FILE                  PAGE 2
33         OTHER RELATED DATA----------------------------------
34         OTHER RELATED DATA----------------------------------
35         OTHER RELATED DATA----------------------------------
36         OTHER RELATED DATA----------------------------------   
37         TOTAL                                         100.00  

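and so on...

I want to create a temp table from the data above that contains just 3 rows for this file, one per bank. The table will have 3 columns: Start, End, and BankID.

Something like this: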

Start    End    Data
1        15     100000 BANK       RUNNER222----BANK FILE                  PAGE 1
16       26     111111 BANK       RUNNER222----BANK FILE                  PAGE 1
27       37     222222 BANK       RUNNER222----BANK FILE                  PAGE 1

I have the following logic to create and populate the table:

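-- Drop the temp table if it is left over from an earlier run
if OBJECT_ID('tempdb..#MyTempTable') is not null drop table #MyTempTable
create table #MyTempTable(
    [Start] int,
    [End] int,  -- END is a reserved keyword in T-SQL, so it must be bracketed
    [Data] nvarchar(max)
)

insert into #MyTempTable([Start], [End], [Data])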
select 
    min(LineNumber),
    max(LineNumber),
    [Data]  
from Bulk_Table where [Data] like '%RUNNER222----BANK FILE%'
group by [Data]

However, when I select from my temp table, I get the following:

Start    End     Data
1        1       100000 BANK       RUNNER222----BANK FILE                  PAGE 1
15       15      100000 BANK       RUNNER222----BANK FILE                  PAGE 2
16       16      111111 BANK       RUNNER222----BANK FILE                  PAGE 3
26       26      111111 BANK       RUNNER222----BANK FILE                  PAGE 1
36       36      222222 BANK       RUNNER222----BANK FILE                  PAGE 2

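Basically, what I am looking for is a temp table with the same number of records as there are banks, 3 in this case, where each row shows where the batch starts, where it ends, and which bank the batch belongs to.

What am I doing wrong? I am grouping by Data, which contains the bank, and taking the min and max of the line number.

What is the correct approach here?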

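1 Answer:

Answer 0 (score: 1)

Moving this from the comments section to an answer. The problem is that the [Data] column includes the page number, so every "BANK RUNNER" header row forms its own unique group. My simple solution is to truncate the value with LEFT in the GROUP BY, which cuts off the unique page number.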
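To make the cause concrete, here is a minimal sketch that can be run on its own. The table variable @Sample and its three rows are a hypothetical cut-down of the question's data (one bank, pages 1 to 3), not the real Bulk_Table.

```sql
-- Hypothetical mini-sample: three header rows for a single bank.
declare @Sample table (LineNumber int, [Data] nvarchar(max));

insert into @Sample (LineNumber, [Data]) values
    (1,  N'100000 BANK       RUNNER222----BANK FILE                  PAGE 1'),
    (5,  N'100000 BANK       RUNNER222----BANK FILE                  PAGE 2'),
    (10, N'100000 BANK       RUNNER222----BANK FILE                  PAGE 3');

-- Grouping by the full string: every page is its own group,
-- so this returns three rows where Start equals End.
select min(LineNumber) as [Start], max(LineNumber) as [End], [Data]
from @Sample
group by [Data];

-- Grouping by the first 11 characters ('100000 BANK'): the page number
-- is no longer part of the key, so this returns one row (Start 1, End 10).
select min(LineNumber) as [Start], max(LineNumber) as [End],
       left([Data], 11) as BankPrefix
from @Sample
group by left([Data], 11);
```

The first query reproduces the one-row-per-header behavior from the question; the second collapses the pages into a single group per bank.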

This could also be done in many other ways, but here is a quick fix that changes the original query from this:

insert into #MyTempTable([Start], [End], [Data])
select 
    min(LineNumber),
    max(LineNumber),
    [Data]  
from Bulk_Table 
where [Data] like '%RUNNER222----BANK FILE%'
group by [Data]

to this:

insert into #MyTempTable([Start], [End], [Data])
select 
    min(LineNumber),
    max(LineNumber),
    left([Data], 11) as FirstElevenCharactersOfData
from Bulk_Table 
where [Data] like '%RUNNER222----BANK FILE%'
group by left([Data], 11)

Grouping by only the first few characters keeps enough unique information to identify a single BANK value, without including the PAGE number.
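The fixed width of 11 works for the sample because every bank ID has six digits. If that width could vary, one possible variant is to group by everything before the first space instead. This is only a sketch under the assumption that the bank ID is always the first space-delimited token of a header row; the table variable @Bulk, the alias k, and the column name BankID are hypothetical names used for illustration.

```sql
-- Hypothetical sample standing in for Bulk_Table.
declare @Bulk table (LineNumber int, [Data] nvarchar(max));

insert into @Bulk (LineNumber, [Data]) values
    (1, N'100000 BANK       RUNNER222----BANK FILE                  PAGE 1'),
    (2, N'OTHER RELATED DATA----------------------------------'),
    (3, N'100000 BANK       RUNNER222----BANK FILE                  PAGE 2'),
    (4, N'111111 BANK       RUNNER222----BANK FILE                  PAGE 1');

select
    min(b.LineNumber) as [Start],
    max(b.LineNumber) as [End],
    k.BankID
from @Bulk as b
cross apply (
    -- Appending a space guarantees CHARINDEX finds one,
    -- so LEFT never receives a negative length.
    select left(b.[Data], charindex(N' ', b.[Data] + N' ') - 1) as BankID
) as k
where b.[Data] like '%RUNNER222----BANK FILE%'
group by k.BankID;
```

With this sample the query returns two rows: BankID 100000 with Start 1 and End 3, and BankID 111111 with Start 4 and End 4. The key is the bare ID rather than the '100000 BANK' prefix.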

RexTester example: http://rextester.com/BHA35584
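One caveat about the query above: its WHERE clause keeps only the header rows, so max(LineNumber) is the last header line of each bank (line 10 for bank 100000 in the question's sample), not the TOTAL line (15) shown in the desired output. If End should cover the whole batch, one option is to treat each batch as running up to the line before the next bank's first header. The sketch below is not part of the original answer. It assumes that each bank's lines are contiguous, that the final line of the table belongs to the last bank, and that LEAD is available (SQL Server 2012 or later). The names @Bulk, Headers, Bank, and FirstLine are hypothetical.

```sql
-- Hypothetical sample standing in for Bulk_Table: two banks, each ending in TOTAL.
declare @Bulk table (LineNumber int primary key, [Data] nvarchar(max));

insert into @Bulk (LineNumber, [Data]) values
    (1, N'100000 BANK       RUNNER222----BANK FILE                  PAGE 1'),
    (2, N'OTHER RELATED DATA----------------------------------'),
    (3, N'100000 BANK       RUNNER222----BANK FILE                  PAGE 2'),
    (4, N'OTHER RELATED DATA----------------------------------'),
    (5, N'TOTAL                                         100.00'),
    (6, N'111111 BANK       RUNNER222----BANK FILE                  PAGE 1'),
    (7, N'OTHER RELATED DATA----------------------------------'),
    (8, N'TOTAL                                         100.00');

with Headers as (
    -- First header line of each bank, found with the same prefix grouping as above.
    select left([Data], 11) as Bank, min(LineNumber) as FirstLine
    from @Bulk
    where [Data] like '%RUNNER222----BANK FILE%'
    group by left([Data], 11)
)
select
    FirstLine as [Start],
    -- A batch ends one line before the next bank starts;
    -- the last bank ends on the last line of the table.
    coalesce(lead(FirstLine) over (order by FirstLine),
             (select max(LineNumber) + 1 from @Bulk)) - 1 as [End],
    Bank as [Data]
from Headers
order by [Start];
```

For this sample the result is Start 1, End 5 for '100000 BANK' and Start 6, End 8 for '111111 BANK'. If a bank's pages can be interleaved with another bank's, the contiguity assumption fails and this approach would need rethinking.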