Grouping data with SQL or regex

Posted: 2017-02-08 23:49:50

Tags: sql sql-server

I have a table in my database with over 1000 rows, containing data like this:

company_name  - revenue  
123 opel AA   - 100  
234 GForm BB  - 200  
245 opel DF   - 250  
235 Gform BC  - 350

I want to total the revenue of every company whose name contains opel, and of every company whose name contains Gform, so that I see:

opel - 350   
gform - 550   

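In other words, I want to sum the rows whose names share a part with another row's name. I don't know in advance all the names I want to sum. Of course I could do this manually, but there must be a way to take all the parts of the names and match them against part of the company name in any row.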

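3 answers:

Answer 0 (score: 1)

Another option is PARSENAME(). It splits a string on '.' into at most four parts and returns the requested part counted from the right, so replacing the spaces with dots and asking for part 2 returns the middle word of a three-word name such as '123 opel AA':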

Select Co = parsename(replace(company_name,' ','.'),2)
      ,Revenue = sum(revenue)
 From YourTable
 Group By parsename(replace(company_name,' ','.'),2)

Returns:

Co      Revenue
GForm   550
opel    350
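Because Answer 0 picks the second part counting from the right, it only returns the brand when every name has exactly three space-separated words. A minimal sketch that instead takes the second word counting from the left, assuming the brand is always the word after the first space. The `@YourTable` sample data mirrors the question, and the derived column `rest` is a hypothetical helper:

```sql
-- Sample data from the question (column types are an assumption)
DECLARE @YourTable TABLE (company_name varchar(25), revenue int);
INSERT INTO @YourTable VALUES
    ('123 opel AA', 100), ('234 GForm BB', 200),
    ('245 opel DF', 250), ('235 Gform BC', 350);

-- rest = everything after the first space, e.g. 'opel AA'
-- Co   = first word of rest, i.e. the second word of company_name
SELECT Co      = LEFT(R.rest, CHARINDEX(' ', R.rest + ' ') - 1)
      ,Revenue = SUM(T.revenue)
FROM @YourTable T
CROSS APPLY (SELECT rest = LTRIM(SUBSTRING(T.company_name,
                                           CHARINDEX(' ', T.company_name) + 1,
                                           LEN(T.company_name)))) R
GROUP BY LEFT(R.rest, CHARINDEX(' ', R.rest + ' ') - 1);
```

Appending a space before the second CHARINDEX makes a name that ends with the brand (e.g. '123 opel') still work. GForm and Gform are summed together only under a case-insensitive collation (the SQL Server default), and which spelling is displayed for that group is not guaranteed.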

Answer 1 (score: 0)

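Here is one possible approach. The CTE stores each sample row as a single 'name - revenue' string and parses both the model name and the revenue out of it: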

;with mycte as (
    -- sample rows, stored as single 'name - revenue' strings
    select '123 opel AA   - 100' as rawdata
    union all select '234 GForm BB  - 200'
    union all select '245 opel DF   - 250'
    union all select '235 Gform BC  - 350'
)
,mycte2 as (
    select
         rawdata
        -- text after the last '-', i.e. the revenue
        ,ltrim(reverse(left(reverse(rawdata), charindex('-', reverse(rawdata)) - 1))) as quantity
        -- the second word, i.e. the text between the first and second spaces
        ,left(substring(rawdata, charindex(' ', rawdata) + 1, len(rawdata)),
              charindex(' ', substring(rawdata, charindex(' ', rawdata) + 2, len(rawdata)))) as model
    from mycte
)
select model, sum(cast(quantity as int)) as total
from mycte2
group by model

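Answer 2 (score: 0)

This performs fuzzy matching. It has potential pitfalls, but you can tune the logic and/or the filters.

I should add that if you can't use a UDF, the logic can easily be ported to a CROSS APPLY.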

Declare @YourTable table (company_name varchar(25),revenue int)
Insert Into @YourTable values
('123 opel AA'   , 100  ),
('234 GForm BB'  , 200  ),
('245 opel DF'   , 250  ),
('235 Gform BC'  , 350  )


Select CoName   = RetVal
      ,Revenue  = sum(Revenue)
      ,Records  = count(Distinct Company_Name)
      ,Min_Co   = min(company_name)
      ,Max_Co   = max(company_name)
 From @YourTable A
 Cross Apply [dbo].[udf-Str-Parse](A.Company_Name,' ') B
 Where len(RetVal)>3                      -- significant word length
   and try_convert(float,RetVal) is null  -- exclude numeric values
 Group By RetVal

Returns:

CoName  Revenue Records   Min_Co        Max_Co
GForm   550     2         234 GForm BB  235 Gform BC
opel    350     2         123 opel AA   245 opel DF

The parse UDF, if needed:

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    Select RetSeq = Row_Number() over (Order By (Select null))
          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From  (Select x = Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A 
    Cross Apply x.nodes('x') AS B(i)
);
--Thanks Shnugo for making this XML safe
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')
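Answer 2 notes that the UDF's logic can be ported to a CROSS APPLY when a UDF is not an option. A minimal sketch of that inline version, assuming SQL Server 2012 or later (for TRY_CONVERT) and reusing the same XML-splitting trick as [dbo].[udf-Str-Parse]. The derived-table names P, N and B and the column names doc and n are hypothetical:

```sql
-- Sample data from the question (column types are an assumption)
DECLARE @YourTable TABLE (company_name varchar(25), revenue int);
INSERT INTO @YourTable VALUES
    ('123 opel AA', 100), ('234 GForm BB', 200),
    ('245 opel DF', 250), ('235 Gform BC', 350);

SELECT CoName  = B.RetVal
      ,Revenue = SUM(A.revenue)
      ,Records = COUNT(DISTINCT A.company_name)
FROM @YourTable A
-- Escape the name for XML via FOR XML PATH, then turn each space into </x><x>
CROSS APPLY (SELECT doc = CAST('<x>'
                 + REPLACE((SELECT A.company_name AS [*] FOR XML PATH('')), ' ', '</x><x>')
                 + '</x>' AS xml)) P
-- One row per word; empty tokens from repeated spaces come back as NULL
CROSS APPLY (SELECT RetVal = LTRIM(RTRIM(N.n.value('(./text())[1]', 'varchar(100)')))
             FROM P.doc.nodes('x') AS N(n)) B
WHERE LEN(B.RetVal) > 3                    -- significant word length
  AND TRY_CONVERT(float, B.RetVal) IS NULL -- exclude numeric values
GROUP BY B.RetVal;
```

NULL tokens are dropped by the LEN filter, since LEN(NULL) > 3 is not true. As with the UDF version, GForm and Gform fall into one group only under a case-insensitive collation.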