From comma-separated values

Time: 2016-06-01 20:21:29

Tags: sql sql-server tsql join

T-SQL question: I need help building a join between 2 tables, one of which holds aggregated data (comma-separated values).

I have a Users table with 3 columns: UserId, DefaultLanguage and OtherLanguages.

The table looks like this:

UserId  | DefaultLanguage  |  OtherLanguages
---------------------------------------------
   1    |      en          |       NULL
   2    |      en          |       it, fr
   3    |      fr          |       en, it
   4    |      en          |       sp

And so on.

I have another table that maps language codes (en, fr, ro, it, sp) to language names:

 LangCode  | LanguageName
-------------------------
    en     | English
    fr     | French
    it     | Italian
    sp     | Spanish

And so on.

I want to create a view like this:

UserId  | DefaultLanguage  |  OtherLanguages
---------------------------------------------
   1    |    English       |    NULL
   2    |    English       |    Italian, French
   3    |    French        |    English, Italian
   4    |    English       |    Spanish

And so on.

In short, I need a view in which the language codes are replaced by the language names.

Can anyone help?
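To make the answers below easy to try, here is a minimal setup script with the sample data from the question. The question does not name its tables, so the names Table1 (users) and Table2 (language lookup) are assumptions borrowed from the first answer; the column types are guesses as well.

```sql
-- Sample schema and data from the question.
-- ASSUMPTION: the table names Table1/Table2 (taken from the first answer) and all column types.
CREATE TABLE dbo.Table2 (
    LangCode     varchar(2)  NOT NULL PRIMARY KEY,
    LanguageName varchar(50) NOT NULL
);

CREATE TABLE dbo.Table1 (
    UserId          int          NOT NULL PRIMARY KEY,
    DefaultLanguage varchar(2)   NOT NULL,
    OtherLanguages  varchar(100) NULL  -- comma-separated codes, e.g. 'it, fr'
);

INSERT INTO dbo.Table2 (LangCode, LanguageName)
VALUES ('en', 'English'), ('fr', 'French'), ('it', 'Italian'), ('sp', 'Spanish');

INSERT INTO dbo.Table1 (UserId, DefaultLanguage, OtherLanguages)
VALUES (1, 'en', NULL), (2, 'en', 'it, fr'), (3, 'fr', 'en, it'), (4, 'en', 'sp');
```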

3 Answers:

Answer 0 (score: 2)

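Apart from recreating all the tables to change the data structure, there are several possible solutions.

1. If all language codes are 2 characters long: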

-- Works only when every code is exactly 2 characters and there are at most 3 other languages
select t1.UserId,
       t2.LanguageName as DefaultLanguage,
       ISNULL(t3.LanguageName, '') + ISNULL(', ' + t4.LanguageName, '') + ISNULL(', ' + t5.LanguageName, '') as OtherLanguages
from Table1 t1
inner join Table2 t2 on t1.DefaultLanguage = t2.LangCode
-- 1st code: the first 2 characters
left join Table2 t3 on Left(t1.OtherLanguages, 2) = t3.LangCode
-- 2nd code: characters 4-5 once spaces are removed ('it,fr' -> 'fr')
left join Table2 t4 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 3
                           THEN SUBSTRING(Replace(t1.OtherLanguages, ' ', ''), 4, 2) ELSE null END = t4.LangCode
-- 3rd code: characters 7-8 once spaces are removed
left join Table2 t5 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 6
                           THEN SUBSTRING(Replace(t1.OtherLanguages, ' ', ''), 7, 2) ELSE null END = t5.LangCode
2. Using a user-defined function:

CREATE FUNCTION [dbo].[func_GetLanguageName] (@pLanguageList varchar(max))
RETURNS varchar(max)
AS
BEGIN
    Declare @aLanguageList varchar(max) = @pLanguageList
    Declare @aLangCode varchar(max) = null
    Declare @aReturnName varchar(max) = null

    -- Peel one code off the front of the list on each iteration
    WHILE LEN(@aLanguageList) > 0
    BEGIN
        IF PATINDEX('%,%', @aLanguageList) > 0
        BEGIN
            -- Code = the text before the first comma, trimmed
            SET @aLangCode = RTRIM(LTRIM(SUBSTRING(@aLanguageList, 1, PATINDEX('%,%', @aLanguageList) - 1)))
            -- Remove that code and its comma from the list
            SET @aLanguageList = LTRIM(SUBSTRING(@aLanguageList, LEN(@aLangCode + ',') + 1, LEN(@aLanguageList)))
        END
        ELSE
        BEGIN
            -- Last (or only) code in the list
            SET @aLangCode = LTRIM(RTRIM(@aLanguageList))
            SET @aLanguageList = NULL
        END
        -- Append the language name, adding ', ' only after the first one
        Select @aReturnName = ISNULL(@aReturnName + ', ', '') + LanguageName from Table2 where LangCode = @aLangCode
    END
    RETURN (@aReturnName)
END

And use it in a select:

select UserId,
       dbo.func_GetLanguageName(DefaultLanguage) as DefaultLanguage,
       dbo.func_GetLanguageName(OtherLanguages) as OtherLanguages
from Table1

Answer 1 (score: 1)

  

Best practice would be to not store comma-separated data like this in a column...

Since you stated in a comment that you cannot change the schema, the next best thing is a function. It can be used inline in a select query.

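SQL is notoriously slow at string manipulation. Here is an interesting article on the subject. There are many SQL "string split" functions out there; they generally split a comma-separated string and return a table.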
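On newer versions there is a built-in splitter: STRING_SPLIT (SQL Server 2016, database compatibility level 130 or higher), and SQL Server 2017 adds STRING_AGG to glue rows back into one string. With both available, the requested view can be set-based instead of looping. This is a sketch that assumes SQL Server 2017+, the table names Table1/Table2 from the first answer, and a hypothetical view name. STRING_SPLIT does not guarantee the order of its output rows, so the position of each code in the original string is used to keep the names in their original order.

```sql
-- Set-based sketch. ASSUMPTIONS: SQL Server 2017+ (STRING_AGG), compatibility level 130+
-- (STRING_SPLIT), hypothetical view name, tables named Table1/Table2 as in the first answer.
CREATE VIEW dbo.vw_UserLanguagesSet
AS
SELECT
    u.UserId,
    d.LanguageName AS DefaultLanguage,
    -- Split the list into rows, look up each code, then join the names back together.
    -- NULL OtherLanguages -> STRING_SPLIT yields no rows -> STRING_AGG returns NULL.
    (SELECT STRING_AGG(x.LanguageName, ', ') WITHIN GROUP (ORDER BY x.Pos)
     FROM (SELECT l.LanguageName,
                  -- Where the code appears in the original string; STRING_SPLIT
                  -- itself does not guarantee output order
                  CHARINDEX(l.LangCode, u.OtherLanguages) AS Pos
           FROM STRING_SPLIT(u.OtherLanguages, ',') AS s
           JOIN dbo.Table2 AS l
             ON l.LangCode = LTRIM(RTRIM(s.value))) AS x
    ) AS OtherLanguages
FROM dbo.Table1 AS u
LEFT JOIN dbo.Table2 AS d
    ON d.LangCode = u.DefaultLanguage;
```

Ordering by CHARINDEX is safe here only because every code is 2 characters and codes are separated by commas, so one code can never be found inside a neighbouring one.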

  

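For this particular use case, you actually need a scalar-valued function (a function that returns a single value) rather than a table-valued function (one that returns a table of values).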

Below is a modified function that returns a scalar value in place of the original comma-separated string of language codes.

The comments explain what is happening line by line.

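The gist is that you have to loop over the input string, keep track of the last comma position, extract each code, look up the full language name in the languages table, and then return the output as a comma-separated string.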

Language-code-to-language function:

Create Function [dbo].fn_languageCodeToFull
    ( @Input Varchar(100) )
    Returns Varchar(1000)
As
Begin
    -- To address null input, based on the example you provided, we set the output to NULL if there is no input
    If @Input = '' Or @Input Is Null 
        Return Null

    Declare 
        @CodeLength int, -- constant for code length to avoid hardcoded "magic numbers"
        @Output varchar(1000), -- will contain the final comma delimited string of full languages
        @LastIndex int, -- tracks the location of the input we are searching as we loop over the string
        @CurrentCode varchar(2), -- for code readability, we extract each language code to this variable
        @CurrentLanguage varchar(50), -- for code readability, we store the full language in this variable
        @IndexIncrement int -- constant to increment the search index by 1 at each iteration
                            -- ensuring the loop moves forward

    Set @LastIndex = 0  -- seed the index; CHARINDEX treats a start position of 0 as 1
    Set @CodeLength = 2 -- ISO language codes are always 2 characters in length
    Set @Output = '' -- seed with empty string to avoid NULL when concatenating
    Set @IndexIncrement = 1 -- again avoiding hardcoded values...

    -- We will loop until we have gone to or beyond the length of the input string
    While @LastIndex < len(@Input)
        Begin
            -- Set the index of each comma (charindex is 1-based)
            Set @LastIndex = CHARINDEX(',', @Input, @LastIndex)
            -- When we get to the last item, CharIndex will return 0 when it does not find a comma. 
            -- To pull the last item, we will artificially set @LastIndex to be 1 greater than the input string
            -- This will allow the code following this line to be unaltered for this scenario
            If @LastIndex = 0 set @LastIndex = len(@Input) + 1 -- account for 1-based index of substring
            -- Extract the code prior to the current comma that charindex has identified
            Set @CurrentCode = substring(@Input, @LastIndex - @CodeLength, @CodeLength)
            -- Do a lookup to get the language for the current code
            Set @CurrentLanguage = (Select LanguageName From languages Where LangCode = @CurrentCode)
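            -- Only add a separator after the first language, so the output has no leading ', '
            If @LastIndex > 3 Set @Output = @Output + ', '
            -- Append the language; fall back to the raw code if it is missing from the lookup table,
            -- so a single unknown code does not turn the whole result into NULL
            Set @Output = @Output + ISNULL(@CurrentLanguage, @CurrentCode)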

            -- Finally, step past the current comma so the next CHARINDEX call finds the following one
            Set @LastIndex = @LastIndex + @IndexIncrement
        End
    Return @Output
End

Then your view would look something like this (a sample view using the function):

Create View vw_UserLanguages
As
    Select 
        UserId,
        dbo.fn_languageCodeToFull(DefaultLanguage) as DefaultLanguage,                          
        dbo.fn_languageCodeToFull(OtherLanguages) as OtherLanguages
    From UserLanguageCodes -- you do not provide a name so I made one up

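Note that the function works whether or not there is a comma, so there is no need to join the Languages table here; in this case you can let the function do all the work.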

Answer 2 (score: 1)

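A quick and dirty solution is to use nested REPLACE calls, but this can lead to a very complex and rather verbose statement, especially if you have more than five languages.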

For example:

SELECT [UserId],[DefaultLanguage],
CASE 
  WHEN [OtherLanguages] IS NULL THEN ''
  ELSE REPLACE(
    REPLACE(
    REPLACE(
    REPLACE(
    REPLACE([OtherLanguages],
    'en','English'),
    'fr','French'),
    'it','Italian'),
    'ro','Romulan'), --Probably not the intended language ;-)
    'sp','Spanish')
END as [OtherLanguages]  
FROM YourTable

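Personally, I would put the REPLACE calls in a scalar function, although you could also check how many languages are present and add a counter so that you don't do unnecessary lookups.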
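A scalar function wrapping the nested REPLACE idea could look roughly like the sketch below. The function name is hypothetical, only the four codes from the question's lookup table are handled, and the order of the REPLACE calls matters: 'en' has to be replaced before 'fr', otherwise the "en" inside "French" would be replaced a second time.

```sql
-- Hypothetical scalar function wrapping the nested REPLACE approach.
-- The codes are hard-coded, so the function must be edited whenever a language is added.
CREATE FUNCTION dbo.fn_ReplaceLanguageCodes (@Codes varchar(100))
RETURNS varchar(1000)
AS
BEGIN
    -- REPLACE(NULL, ...) is NULL and REPLACE('', ...) is '', so no special cases are needed.
    -- Order matters: 'en' runs before 'fr' because 'French' contains 'en'.
    RETURN REPLACE(
           REPLACE(
           REPLACE(
           REPLACE(@Codes,
               'en', 'English'),
               'fr', 'French'),
               'it', 'Italian'),
               'sp', 'Spanish');
END
```

A function like this could stand in for the do_function_name placeholder in the query above.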

SELECT [UserId],[DefaultLanguage],
CASE 
  WHEN [OtherLanguages] IS NULL THEN ''
  WHEN [OtherLanguages] = '' THEN ''
  ELSE dbo.do_function_name([OtherLanguages]) -- placeholder: your scalar function (schema prefix required)
END as [OtherLanguages]  
FROM YourTable

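This may not be good practice. Storing multiple values in a single field is sometimes more efficient, but when you do it, working with the data becomes slower.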
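For comparison, the normalized design that Answer 1 calls best practice stores one row per user and additional language, which turns the lookup into ordinary joins. This is a sketch with hypothetical table names (Languages, Users, UserOtherLanguages). The SortOrder column is an assumption added only to preserve the display order, and STRING_AGG requires SQL Server 2017 or later.

```sql
-- Hypothetical normalized schema: one row per (user, extra language) instead of a CSV column.
CREATE TABLE dbo.Languages (
    LangCode     varchar(2)  NOT NULL PRIMARY KEY,
    LanguageName varchar(50) NOT NULL
);

CREATE TABLE dbo.Users (
    UserId          int        NOT NULL PRIMARY KEY,
    DefaultLanguage varchar(2) NOT NULL REFERENCES dbo.Languages (LangCode)
);

CREATE TABLE dbo.UserOtherLanguages (
    UserId    int        NOT NULL REFERENCES dbo.Users (UserId),
    LangCode  varchar(2) NOT NULL REFERENCES dbo.Languages (LangCode),
    SortOrder int        NOT NULL,  -- keeps the display order, e.g. Italian before French
    PRIMARY KEY (UserId, LangCode)
);

INSERT INTO dbo.Languages (LangCode, LanguageName)
VALUES ('en', 'English'), ('fr', 'French'), ('it', 'Italian'), ('sp', 'Spanish');
INSERT INTO dbo.Users (UserId, DefaultLanguage)
VALUES (1, 'en'), (2, 'en'), (3, 'fr'), (4, 'en');
INSERT INTO dbo.UserOtherLanguages (UserId, LangCode, SortOrder)
VALUES (2, 'it', 1), (2, 'fr', 2), (3, 'en', 1), (3, 'it', 2), (4, 'sp', 1);

-- The view's query becomes plain joins plus one aggregation (STRING_AGG: SQL Server 2017+).
SELECT u.UserId,
       d.LanguageName AS DefaultLanguage,
       STRING_AGG(o.LanguageName, ', ') WITHIN GROUP (ORDER BY uol.SortOrder) AS OtherLanguages
FROM dbo.Users AS u
JOIN dbo.Languages AS d ON d.LangCode = u.DefaultLanguage
LEFT JOIN dbo.UserOtherLanguages AS uol ON uol.UserId = u.UserId
LEFT JOIN dbo.Languages AS o ON o.LangCode = uol.LangCode
GROUP BY u.UserId, d.LanguageName;
```

With this layout, foreign keys reject unknown codes, and adding a language is an INSERT rather than a code change.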