Finding the last name of a Hindi name in a SQL query (MS SQL)

Asked: 2014-04-25 12:16:22

Tags: sql sql-server sql-server-2008

I have a database with names in both English and Devanagari, such as


' PRABHU MATTHU RATHOD | प्रभुमथथूराठोड'

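I am splitting these names into first name, middle name and last name. This works for the English names, but the Hindi names cause a problem.

I tried this to find the index of the last space in the name: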

@MA_Name = प्रभु मथथू राठोड
REVERSE(SUBSTRING(REVERSE(@MA_Name), 1,CHARINDEX(' ', REVERSE(@MA_Name)) - 1));


This is where it fails: CHARINDEX(' ', REVERSE(@MA_Name)) - 1 returns -1, and I don't know why.

3 Answers:

Answer 0: (score: 5)

Try using a CASE expression to handle the special case of names with no space. Something like:

(CASE WHEN @MA_NAME LIKE N'% %'
      THEN REVERSE(SUBSTRING(REVERSE(@MA_Name), 1,CHARINDEX(N' ', REVERSE(@MA_Name)) - 1))
      ELSE @MA_NAME
 END)

This assumes that when there is no space, the whole name is the last name.

Edit:

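The name may look as if it contains a space, while the separator is actually some character other than ' '. You can find out what it is with:

-- unicode() is used rather than ascii() because @MA_NAME holds nvarchar data
select unicode(substring(@MA_NAME, 7, 1))

(Or whatever the correct index of that space is.)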
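If the position of the suspect separator is not known in advance, one option is to list the code value at every position and look for whatever sits where the space should be. The following is only a minimal sketch: the sample value and the counter @i are placeholders for illustration, and it assumes SQL Server 2008 or later for the inline DECLARE initialisation.

```sql
-- Sketch: print each position together with its UTF-16 code value.
-- An ordinary space shows up as 32; anything else at a "space" position
-- is the character to look up.
DECLARE @MA_Name nvarchar(200) = N'प्रभु मथथू राठोड';  -- sample value only
DECLARE @i int = 1;

WHILE @i <= LEN(@MA_Name)   -- note: LEN ignores trailing spaces
BEGIN
    PRINT CAST(@i AS varchar(10)) + ': '
        + CAST(UNICODE(SUBSTRING(@MA_Name, @i, 1)) AS varchar(10));
    SET @i = @i + 1;
END
```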

Once you know what the character is, you can structure the query as:

(case when @MA_NAME like N'% %' then <what you have now>
      when @MA_NAME like N'%OTHERCHAR%' then <similar but with different space>
      else <whatever>
 end)

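Answer 1: (score: 1)

This has been edited to incorporate what we learned, so that it can be accepted as an answer and be useful to future visitors.

Consider the following code: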

declare @ma_name nvarchar(200) 
declare @r nvarchar(200)
declare @i int

select @MA_Name = N'प्रभु मथथू राठोड'  -- Thanks to Gordon Lindof for reminder to use N-prefix
set @r = reverse(@ma_name)

select @r

set @i = charindex(' ', @r )

select @i

The results are:

डोठार ूथथम ुभर्प

0

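What seems to be happening is that the REVERSE function reverses code points rather than characters. To explain using a substring of just 4 code points: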

\u0925-->थ \u0942-->ू \u0020--> \u0930-->र 

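u0942 is a combining character. The sequence u0925 followed by u0942 is a single character. REVERSE does not understand this and naively reverses the code points. The result is: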

u0930 u0020 u0942 u0925

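Now the combining character is attached to the space. So it is no longer a plain space; it is a space with a vowel sign on it. (Sorry, I know nothing about Hindi; no disrespect intended.)

But CHARINDEX is not that naive. It sees that you are looking for a space, but it only finds the modified space character.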
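The four-code-point example can be reproduced directly. This is a small sketch, not part of the original answer: the variable names @s and @r are made up for illustration, and the decimal values 2341, 2370 and 2352 are simply u0925, u0942 and u0930 written in decimal. What CHARINDEX returns may depend on the collation in use, so no result is claimed for it here.

```sql
-- Build tha + vowel sign UU + space + ra, then reverse it.
DECLARE @s nvarchar(10) = NCHAR(2341) + NCHAR(2370) + N' ' + NCHAR(2352);
DECLARE @r nvarchar(10) = REVERSE(@s);

-- Code values of the reversed string, position by position.
-- Per the explanation above, the order should be u0930, u0020, u0942, u0925,
-- which puts the combining vowel sign right after the space.
SELECT UNICODE(SUBSTRING(@r, 1, 1)) AS cp1,
       UNICODE(SUBSTRING(@r, 2, 1)) AS cp2,
       UNICODE(SUBSTRING(@r, 3, 1)) AS cp3,
       UNICODE(SUBSTRING(@r, 4, 1)) AS cp4;

-- Compare the search for a plain space before and after reversing.
SELECT CHARINDEX(N' ', @s) AS space_in_original,
       CHARINDEX(N' ', @r) AS space_in_reversed;
```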

The poster solved his problem by using a loop to search for the space.
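The poster's actual loop is not shown in this thread, so the following is only a guess at the general shape such a solution could take. It scans backwards from the end of the string and compares code values, which avoids REVERSE entirely. The counter @pos and the sample value are hypothetical, and the sketch assumes the separator is an ordinary space (code value 32).

```sql
-- Sketch: find the last space by scanning from the end, without REVERSE.
DECLARE @MA_Name nvarchar(200) = N'प्रभु मथथू राठोड';  -- sample value only
DECLARE @pos int = LEN(@MA_Name);

-- Walk left until a plain space is found or the start of the string is passed.
WHILE @pos > 0 AND UNICODE(SUBSTRING(@MA_Name, @pos, 1)) <> 32
    SET @pos = @pos - 1;

-- With no space at all, treat the whole name as the last name.
SELECT CASE WHEN @pos > 0
            THEN SUBSTRING(@MA_Name, @pos + 1, LEN(@MA_Name) - @pos)
            ELSE @MA_Name
       END AS LastName;
```

Comparing UNICODE values instead of using `= N' '` keeps the test independent of how the collation treats a space followed by a combining mark.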


Here is some source code that explains the situation:

-- CharList generates a comma-separated list of decimal values representing the list of nchar's
-- in an nvarchar.  In this context it's not important how it works.

if object_id('CharList')is not null drop function CharList
go
create function dbo.CharList(@c nvarchar(max))returns varchar(max)
as
begin
  declare @x varbinary(max)
  declare @h varchar(max)
  declare @i int
  set @x = cast ( @c as varbinary(max))
  set @h = ''
  set @i = 1
  while @i <= datalength(@x)   -- datalength counts bytes; len() may drop trailing 0x20 bytes
  begin
    if @i > 1 
      set @h = @h + ','
    set @h = @h + cast(       cast(substring(@x,@i,  1)as int)
                        + 256*cast(substring(@x,@i+1,1)as int) as varchar)

    set @i=@i+2
  end
  return @h
end
go

-- For this code sample I'm going to use latin characters. (Sorry can't read Hindi.)
-- This string contains lowercase 'e' with an acute accent. 
-- In Unicode this can be represented two different ways.
-- It can be represented as a single codepoint: decimal 233.
-- Or it can be built from the letter 'e', followed by
-- the combining character for the acute accent: decimal 769
-- The purpose of this source is to demonstrate combining characters, so I'll use the
-- two-codepoint version.    

declare @m nvarchar(max)
set @m = N'Re' + nchar(769) + N'al'
select @m, dbo.CharList(@m)                 -- Réal    82,101,769,97,108

-- You see, the word 'Réal' consists of 4 characters, but is represented by 5 codepoints.

select charindex ( N'e', @m )               -- 0
select charindex ( N'e'+nchar(769), @m )    -- 2
select charindex ( N'é', @m )               -- 2
select charindex ( N'a', @m )               -- 4

-- CharIndex is smart enough to understand that. It understands that there is no letter 'e'
-- in this string of characters, even though the codepoint 101 appears in the string. 
-- It does find the letter 'é' when expressed with the two-codepoint version.
-- It will even find it when expressed as the single-codepoint version, even though
-- the codepoint 233 appears nowhere in the string.
-- And finally, it has no problem finding the 'a', but note that it returns 4.
-- 'a' is the 3rd character of the string, but appears at the 4th codepoint in the list. 

set @m = reverse ( @m )
select @m, dbo.CharList(@m )                -- láeR    108,97,769,101,82

-- Reverse is not as clever as CharIndex. It doesn't care about combining characters.
-- It just reverses the list of codepoints. 
-- Now the acute accent combining character appears after the 'a', and so the string now
-- shows the 'a' with the acute accent, and the letter 'e' has lost its accent. 

select charindex ( N'e', @m )               -- 4
select charindex ( N'e'+nchar(769), @m )    -- 0
select charindex ( N'é', @m )               -- 0
select charindex ( N'a', @m )               -- 0

-- Now, CharIndex will find a letter 'e' where there was none before. 
-- It can't find 'é' in either the one-codepoint nor two-codepoint forms, 
-- because it's not there anymore. 
-- A search for 'a' fails, because the string doesn't contain a plain 'a' anymore. 

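Answer 2: (score: -1)

This is the kind of problem (parsing and formatting) that is not always easy in SQL. In my experience, it is usually better to return the raw data and let the calling/client program do all the string manipulation. Such work is usually procedural in nature (and therefore less suited to set operations) and involves a lot of branching logic (which can be awkward in SQL).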