Question

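I have a database with names in English and Devanagari, such as

'PRABHU MATTHU RATHOD | प्रभुमथथूराठोड'

I split these names into first name, middle name, and last name. This works fine for the English names, but the Hindi names cause a problem.

I tried this to find the index of the last space in the name: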

@MA_Name = प्रभु मथथू राठोड
REVERSE(SUBSTRING(REVERSE(@MA_Name), 1,CHARINDEX(' ', REVERSE(@MA_Name)) - 1));

It fails here: CHARINDEX(' ', REVERSE(@MA_Name)) - 1 returns -1, and I don't know why.

Answer 1

Try using a CASE expression to handle the special case of names that contain no space. Something like:

(CASE WHEN @MA_NAME LIKE N'% %'
      THEN REVERSE(SUBSTRING(REVERSE(@MA_Name), 1,CHARINDEX(N' ', REVERSE(@MA_Name)) - 1))
      ELSE @MA_NAME
 END)

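This assumes that a name with no space is just the last name.

Edit:

The name may look as though it contains a space, but the separator may be a character other than ' '. You can find out what it is with: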

select unicode(substring(@MA_NAME, 7, 1))  -- unicode() rather than ascii(), which cannot report code points outside the varchar code page

(Or whatever the correct index of that space is.)
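To see every position at once instead of guessing the index, a loop can print each position next to its code point. This is a minimal sketch, not part of the original answer. It assumes the name is held in an nvarchar variable and that every character sits in the Basic Multilingual Plane, so each position is one UTF-16 code unit.

```sql
declare @MA_Name nvarchar(200)
declare @i int

set @MA_Name = N'प्रभु मथथू राठोड'
set @i = 1

-- Print "position: code point" for every code unit in the name.
while @i <= len(@MA_Name)
begin
    print cast(@i as varchar(10)) + ': '
        + cast(unicode(substring(@MA_Name, @i, 1)) as varchar(10))
    set @i = @i + 1
end
```

A plain space shows up as 32; a no-break space, for example, would show up as 160.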

Once you know what the character is, you can structure the query as:

(case when @MA_NAME like N'% %' then <what you have now>
      when @MA_NAME like N'%OTHERCHAR%' then <similar but with different space>
      else <whatever>
 end)
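One way to fill in the placeholders above is to convert the odd separator to an ordinary space first, so that a single branch is enough. The sketch below is only an illustration: the value 160 (no-break space) is an assumption, and the Latin-script sample name is made up so that the separator is the only thing being tested.

```sql
-- Assumption for illustration only: the separator turned out to be
-- a no-break space, code point 160. Substitute whatever value you found.
declare @MA_Name nvarchar(200)
declare @clean nvarchar(200)

set @MA_Name = N'PRABHU MATTHU' + nchar(160) + N'RATHOD'

-- Turn the odd separator into an ordinary space before splitting.
set @clean = replace(@MA_Name, nchar(160), N' ')

select case when @clean like N'% %'
            then right(@clean, charindex(N' ', reverse(@clean)) - 1)
            else @clean
       end as last_name
```

This only addresses a separator that is not a plain space; it does not change how REVERSE or CHARINDEX treat combining characters.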

Answer 2

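This has been edited to incorporate what we learned, so that it can be accepted as the answer and be useful to future visitors.

Consider the following code: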

declare @ma_name nvarchar(200) 
declare @r nvarchar(200)
declare @i int

select @MA_Name = N'प्रभु मथथू राठोड'  -- Thanks to Gordon Lindof for reminder to use N-prefix
set @r = reverse(@ma_name)

select @r

set @i = charindex(' ', @r )

select @i

The results are:

डोठार ूथथम ुभर्प

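and

0

What appears to be happening is that REVERSE reverses code points rather than characters. To explain using a substring of just four code points: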

\u0925-->थ \u0942-->ू \u0020--> \u0930-->र

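u0942 is a combining character. The sequence u0925 followed by u0942 forms a single character. REVERSE does not understand this and naively reverses the code points. The result is: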

u0930 u0020 u0942 u0925

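Now the combining character is attached to the space. So it is no longer a plain space; it is a space carrying a vowel sign. (Sorry, I know nothing about Hindi; no disrespect intended.)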

But CHARINDEX is not that naive. It sees that you are looking for a space, but all it finds is the modified space character.
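The four-code-point case can be tried directly. The sketch below also compares the default search with a search under a binary collation, which is a technique the answers here do not mention; it assumes the collation Latin1_General_BIN2 is available on the server and that the default collation behaves as described above.

```sql
declare @s nvarchar(10)
declare @r nvarchar(10)

-- u0925, u0942 (combining vowel sign), space, u0930: the four code points above
set @s = nchar(2341) + nchar(2370) + N' ' + nchar(2352)
set @r = reverse(@s)

select charindex(N' ', @s) as before_reverse,
       charindex(N' ', @r) as after_reverse,
       -- A binary collation compares code points and ignores combining rules.
       charindex(N' ' collate Latin1_General_BIN2,
                 @r collate Latin1_General_BIN2) as after_reverse_binary
```

Under the behaviour described above, the second column is expected to be 0, while the binary search should still report the space at the second code point. If that holds on your server, the original REVERSE approach could be kept by adding the collate clause to the CHARINDEX call.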

The poster solved his problem by using a FOR loop to search for the space.

Here is some source code that explains the situation:
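The poster's loop is not shown in this thread, so the following is only a guess at what such a search might look like. T-SQL provides WHILE rather than FOR, and the sketch compares code points with unicode() instead of calling REVERSE or CHARINDEX. It assumes the separator is a plain space (code point 32).

```sql
declare @MA_Name nvarchar(200)
declare @pos int

set @MA_Name = N'प्रभु मथथू राठोड'
set @pos = len(@MA_Name)

-- Walk backwards until a plain space (code point 32) is found
-- or the start of the string has been passed.
while @pos > 0 and unicode(substring(@MA_Name, @pos, 1)) <> 32
    set @pos = @pos - 1

-- @pos = 0 means no space was found, so the whole name is returned.
select substring(@MA_Name, @pos + 1, len(@MA_Name) - @pos) as last_name
```

Because the string is never reversed, the combining characters stay attached to the letters they belong to.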

-- CharList generates a comma-separated list of decimal values representing the list of nchar's
-- in an nvarchar.  In this context it's not important how it works.

if object_id('CharList')is not null drop function CharList
go
create function dbo.CharList(@c nvarchar(max))returns varchar(max)
as
begin
  declare @x varbinary(max)
  declare @h varchar(max)
  declare @i int
  set @x = cast ( @c as varbinary(max))
  set @h = ''
  set @i = 1
  while @i <= len(@x) 
  begin
    if @i > 1 
      set @h = @h + ','
    set @h = @h + cast(       cast(substring(@x,@i,  1)as int)
                        + 256*cast(substring(@x,@i+1,1)as int) as varchar)

    set @i=@i+2
  end
  return @h
end
go

-- For this code sample I'm going to use latin characters. (Sorry can't read Hindi.)
-- This string contains lowercase 'e' with an acute accent. 
-- In Unicode this can be represented two different ways.
-- It can be represented as a single codepoint: decimal 233.
-- Or it can be built from the letter 'e', followed by
-- the combining character for the acute accent: decimal 769
-- The purpose of this source is to demonstrate combining characters, so I'll use the
-- two-codepoint version.    

declare @m nvarchar(max)
set @m = N'Re' + nchar(769) + N'al'
select @m, dbo.CharList(@m)                 -- Réal    82,101,769,97,108

-- You see, the word 'Réal' consists of 4 characters, but is represented by 5 codepoints.

select charindex ( N'e', @m )               -- 0
select charindex ( N'e'+nchar(769), @m )    -- 2
select charindex ( N'é', @m )               -- 2
select charindex ( N'a', @m )               -- 4

-- CharIndex is smart enough to understand that. It understands that there is no letter 'e'
-- in this string of characters, even though the codepoint 101 appears in the string. 
-- It does find the letter 'é' when expressed with the two-codepoint version.
-- It will even find it when expressed as the single-codepoint version, even though
-- the codepoint 233 appears nowhere in the string.
-- And finally, it has no problem finding the 'a', but note that it returns 4.
-- 'a' is the 3rd character of the string, but appears at the 4th codepoint in the list. 

set @m = reverse ( @m )
select @m, dbo.CharList(@m )                -- láeR    108,97,769,101,82

-- Reverse is not as clever as CharIndex. It doesn't care about combining characters.
-- It just reverses the list of codepoints. 
-- Now the acute accent combining character appears after the 'a', and so the string now
-- shows the 'a' with the acute accent, and the letter 'e' has lost its accent. 

select charindex ( N'e', @m )               -- 4
select charindex ( N'e'+nchar(769), @m )    -- 0
select charindex ( N'é', @m )               -- 0
select charindex ( N'a', @m )               -- 0

-- Now, CharIndex will find a letter 'e' where there was none before. 
-- It can't find 'é' in either the one-codepoint nor two-codepoint forms, 
-- because it's not there anymore. 
-- A search for 'a' fails, because the string doesn't contain a plain 'a' anymore.

Answer 3

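This is the kind of problem (parsing and formatting) that is not always easy in SQL. In my experience, it is usually better to return the raw data and let the calling/client program do all the string manipulation. The work is usually procedural in nature (and therefore not well suited to set-based operations) and involves a lot of branching logic (which can be awkward in SQL).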

Finding the last name of a Hindi name in a SQL query (MS SQL)
