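Unicode through ODBC

Date: 2017-02-06 07:51:19

Tags: sql-server r unicode

I have created an ODBC connection to MS SQL Server, and it works fine with ordinary data.

However, when the data contains "HKSCS" characters, they turn into ?

Here is the table structure (simplified):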

╔════════════════════════╦══════════╗
║      Column_name       ║   Type   ║
╠════════════════════════╬══════════╣
║ TraditionalChineseName ║ nvarchar ║
║ EnglishName            ║ nvarchar ║
╚════════════════════════╩══════════╝

ODBC settings:

Odbc32.dll: 6.1.7601.23403
Driver: SQL Server Native Client 11.0
Option:
    Use ANSI quoted identifiers
    Use ANSI nulls, paddings and warnings
    Perform translation for character data

Sample data:

╔════════════════════════╦═════════════╗
║ TraditionalChineseName ║ EnglishName ║
╠════════════════════════╬═════════════╣
║ 邨                     ║ estate      ║
║ 衞生                   ║ health      ║
╚════════════════════════╩═════════════╝

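Collation in SQL Server: SQL_Latin1_General_CP1_CI_AS

The results are correct in SSMS and in a .NET program (connected through the SQL Server driver), but not over the ODBC connection.

Goal:
I want to pass the data to R and plot it. However, once the data is stored in a data.frame, these HKSCS characters turn into ? In addition, when I plot it, none of the non-English characters display correctly.

Question:
I tried taking the results, pasting them into RStudio, and formatting them as a data.frame. They display correctly, but the characters are stored in <U+xxxx> format.
I just want to know whether these characters can be converted to the <U+xxxx> form, for example <U+90A8>?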

1 answer:

Answer 0: (score: 0)

After many trials, I ended up converting the strings to "unicode numbers" in SQL and then parsing them in R.

In short:

"邨" -> "37032" -> "\u90A8" -> "<U+90A8>"
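The same chain can be reproduced for a single character in base R, which may help when checking what the SQL step should return. This is only a sketch: the variable names are made up for illustration, and the character is written as an escape so the script's file encoding does not matter.

```r
# Walk one character through the chain: character -> decimal -> hex escape -> character
ch <- "\u90A8"                       # the character from the sample data (U+90A8)

code_point <- utf8ToInt(ch)          # decimal code point: 37032
hex <- sprintf("%04X", code_point)   # four hex digits: "90A8"
escaped <- paste0("\\u", hex)        # backslash, "u", then the hex digits

back <- intToUtf8(code_point)        # round trip back to the original character
stopifnot(back == ch)
```

Only the escaped text crosses the ODBC connection, so it is plain ASCII and should not be affected by the driver's character translation.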

First, in SQL, convert each Chinese character to its Unicode number in hexadecimal:

;with targetTable as (
    -- simulate the table from the database
    select 1 as ID, N'邨' as TraditionalChineseName, 'estate' as EnglishName union
    select 2 as ID, N'衞生' as TraditionalChineseName, 'health' as EnglishName
), ctx as (
    select top (8000) n = row_number() over (order by Number)
    FROM master.dbo.spt_values order by Number
), tc as (
    -- convert each character to its Unicode code point (decimal), then to binary (hex, e.g. 0x000090A8)
    -- and keep the last 4 hex digits
    select f.ID, '\u' + right(convert(varchar(10), convert(varbinary(4), unicode(substring(f.TraditionalChineseName, x.n, 1))), 2), 4) as unicodeStr, 
    substring(f.TraditionalChineseName, x.n, 1) as charStr, x.n
    from ctx x
    inner join targetTable f with (nolock)
        on x.n <= len(f.TraditionalChineseName)
)
select distinct s.ID, s.EnglishName,
            (
                select u1.unicodeStr as [text()]
                from tc u1
                where u1.ID = s.ID
                order by u1.n
                for xml path('')
            ) TraditionalChineseName
from targetTable s (nolock)

It returns a dataset like this:

+----+-------------+------------------------+
| ID | EnglishName | TraditionalChineseName |
+----+-------------+------------------------+
|  1 | estate      | \u90A8                 |
|  2 | health      | \u885E\u751F           |
+----+-------------+------------------------+

In R, use sqlQuery to retrieve the result set (qry holds the SQL text above):

library(RODBC)
myconn <- odbcConnect(dsn="ODBC", uid="...", pwd="...")
dat <- sqlQuery(channel = myconn, query = qry, stringsAsFactors = FALSE)
close(myconn)

Create a function to convert each escaped "character" back to Unicode:

convertUnicode <- function(x) {
  parse(text = paste0("'", x, "'"))[[1]]
}
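Because parse() evaluates its input as R source, a stray quote in the data would break it. If that is a concern, a variant that decodes the escapes by hand might look like the sketch below. decode_escapes is a hypothetical name, not part of the original answer, and it assumes the column contains nothing but \uXXXX sequences as produced by the query above.

```r
# Alternative sketch: decode "\uXXXX" sequences without calling parse().
# Assumes each string consists only of \u followed by four hex digits, repeated.
decode_escapes <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    parts <- strsplit(s, "\\u", fixed = TRUE)[[1]]   # split on the literal "\u"
    hex <- parts[nzchar(parts)]                      # drop the empty leading piece
    intToUtf8(strtoi(hex, base = 16L))               # hex -> code points -> UTF-8 string
  }, character(1), USE.NAMES = FALSE)
}

decoded <- decode_escapes(c("\\u90A8", "\\u885E\\u751F"))
```

It is vectorised, so it can be applied to the whole column directly instead of going through lapply and unlist.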

Build the new data.frame:

kvp <- data.frame(ID = dat$ID, 
                  TraditionalChineseName = unlist(lapply(dat$TraditionalChineseName, convertUnicode)), 
                  EnglishName = dat$EnglishName)

With this approach, the characters can be displayed in any plot or table without changing the locale settings in R.
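To try the R side without a database connection, the result set can be simulated by hand. The sketch below assumes the query returns exactly the three columns shown in the result table. dat here is a stand-in for the sqlQuery output, and convertUnicode is copied from the answer so the snippet runs on its own.

```r
# Stand-in for the sqlQuery() result: the escapes arrive as plain ASCII text
dat <- data.frame(ID = 1:2,
                  EnglishName = c("estate", "health"),
                  TraditionalChineseName = c("\\u90A8", "\\u885E\\u751F"),
                  stringsAsFactors = FALSE)

# Same function as in the answer
convertUnicode <- function(x) {
  parse(text = paste0("'", x, "'"))[[1]]
}

kvp <- data.frame(ID = dat$ID,
                  TraditionalChineseName = unlist(lapply(dat$TraditionalChineseName, convertUnicode)),
                  EnglishName = dat$EnglishName,
                  stringsAsFactors = FALSE)

# Each escape should have collapsed into a single character
char_counts <- nchar(kvp$TraditionalChineseName)   # 1 and 2 for the sample rows
```

Checking the character counts is a quick way to confirm the escapes were decoded rather than kept as literal text.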