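I have created an ODBC connection to MS SQL Server, and it works fine with ordinary data.
However, when the data contains "HKSCS" characters, they turn into ?.
This is the table structure (simplified):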
╔════════════════════════╦══════════╗
║ Column_name ║ Type ║
╠════════════════════════╬══════════╣
║ TraditionalChineseName ║ nvarchar ║
║ EnglishName ║ nvarchar ║
╚════════════════════════╩══════════╝
ODBC settings:
Odbc32.dll: 6.1.7601.23403
Driver: SQL Server Native Client 11.0
Option:
Use ANSI quoted identifiers
Use ANSI nulls, paddings and warnings
Perform translation for character data
Sample data:
╔════════════════════════╦═════════════╗
║ TraditionalChineseName ║ EnglishName ║
╠════════════════════════╬═════════════╣
║ 邨 ║ estate ║
║ 衞生 ║ health ║
╚════════════════════════╩═════════════╝
Collation in SQL Server: SQL_Latin1_General_CP1_CI_AS
The results display correctly in SSMS and in a .NET program (both connecting through the SQL Server driver), but not through the ODBC connection.
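Goal:
I want to pass the data to R and plot it.
However, when the data is stored in a data.frame, these HKSCS characters turn into ?.
In addition, when I plot it, none of the non-English characters display correctly.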
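Question:
I tried copying the result, pasting it into RStudio, and formatting it as a data.frame. It displays correctly there, but the characters are stored in <U+xxxx> format.
I am wondering whether it is possible to convert these characters to the <U+xxxx> form, for example 邨 to <U+90A8>?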
Answer 0 (score: 0)
After many attempts, I ended up converting the string in SQL to its "Unicode number" and then parsing it in R.
In short:
"邨" -> "37032" -> "\u90A8" -> "<U+90A8>"
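The chain above can be reproduced for a single character in base R. This is only a minimal sketch to make each step concrete; the variable names are illustrative and are not part of the original answer.

```r
# Start from the character itself, written as an escape so the script stays ASCII-only.
ch <- "\u90A8"

# Step 1: character -> decimal code point
code_point <- utf8ToInt(ch)          # 37032

# Step 2: decimal code point -> 4-digit hex, prefixed with a literal backslash-u
hex <- sprintf("%04X", code_point)   # "90A8"
escaped <- paste0("\\u", hex)        # the 6-character text \u90A8

# Step 3: hex -> character again
restored <- intToUtf8(strtoi(hex, base = 16L))
```

How `restored` is printed depends on the session: on a locale that cannot represent the character, R may show it as <U+90A8>, which matches what the question describes.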
First, in SQL, convert each Chinese character to its hex-based Unicode number:
;with targetTable as (
-- simulate the table from the database
select 1 as ID, N'邨' as TraditionalChineseName, 'estate' as EnglishName union
select 2 as ID, N'衞生' as TraditionalChineseName, 'health' as EnglishName
), ctx as (
select top (8000) n = row_number() over (order by Number)
FROM master.dbo.spt_values order by Number
), tc as (
-- convert each character to its Unicode code point (decimal), then to binary (hex, e.g. 0x000090A8)
-- and keep the last 4 hex digits
select f.ID, '\u' + right(convert(varchar(10), convert(varbinary(4), unicode(substring(f.TraditionalChineseName, x.n, 1))), 2), 4) as unicodeStr,
substring(f.TraditionalChineseName, x.n, 1) as charStr, x.n
from ctx x
inner join targetTable f with (nolock)
on x.n <= len(f.TraditionalChineseName)
)
select distinct s.ID, s.EnglishName,
(
select u1.unicodeStr as [text()]
from tc u1
where u1.ID = s.ID
order by u1.n
for xml path('')
) TraditionalChineseName
from targetTable s (nolock)
This returns a dataset like this:
+----+-------------+------------------------+
| ID | EnglishName | TraditionalChineseName |
+----+-------------+------------------------+
| 1 | estate | \u90A8 |
| 2 | health | \u885E\u751F |
+----+-------------+------------------------+
In R, use sqlQuery to retrieve the result set:
library(RODBC)
# qry is assumed to hold the SQL statement above as a character string
myconn <- odbcConnect(dsn="ODBC", uid="...", pwd="...")
dat <- sqlQuery(channel = myconn, query = qry, stringsAsFactors = FALSE)
close(myconn)
Create a function to convert each escaped string back to Unicode:
convertUnicode <- function(x) {
parse(text = paste0("'", x, "'"))[[1]]
}
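As an alternative to parse(), the escapes can also be decoded without evaluating the input as R source. The following is a sketch, not the method from the answer: the function name decode_u_escapes is hypothetical, and it assumes every value consists only of \uXXXX groups, as produced by the query above (any other text in the value would be dropped).

```r
# Hypothetical helper: decode strings made up solely of \uXXXX escapes.
decode_u_escapes <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # Pull out each 4-digit hex group that follows a literal backslash-u.
    hex <- regmatches(s, gregexpr("(?<=\\\\u)[0-9A-Fa-f]{4}", s, perl = TRUE))[[1]]
    # Turn the hex groups into code points and join them into one string.
    intToUtf8(strtoi(hex, base = 16L))
  }, character(1), USE.NAMES = FALSE)
}

# Values in the shape returned by the query (backslashes doubled in R source).
decoded <- decode_u_escapes(c("\\u90A8", "\\u885E\\u751F"))
```

Because nothing is parsed as code, a stray quote character in the data cannot break the conversion, at the cost of handling only this one escape format.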
Build the new data.frame:
kvp <- data.frame(ID = dat$ID,
TraditionalChineseName = unlist(lapply(dat$TraditionalChineseName, convertUnicode)),
EnglishName = dat$EnglishName)
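The conversion can be checked without a database connection. The sketch below assumes a hand-built stand-in for dat with the same columns and values as the result set shown earlier; sample_dat and decoded_names are illustrative names only.

```r
# Stand-in for the query result (backslashes doubled in R source).
sample_dat <- data.frame(
  ID = 1:2,
  EnglishName = c("estate", "health"),
  TraditionalChineseName = c("\\u90A8", "\\u885E\\u751F"),
  stringsAsFactors = FALSE
)

# Same function as in the answer: parse the escaped text as an R string literal.
convertUnicode <- function(x) {
  parse(text = paste0("'", x, "'"))[[1]]
}

decoded_names <- unlist(lapply(sample_dat$TraditionalChineseName, convertUnicode))

# Each \uXXXX group should have collapsed into exactly one character.
char_counts <- nchar(decoded_names)
first_code_point <- utf8ToInt(decoded_names[1])
```

If char_counts comes back as 1 and 2, each escape became a single character rather than staying as literal text.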
With this method, the characters can be displayed in any chart or table without changing the locale settings in R.