I have a CSV file that contains a lot of emoji:
Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃
How can I read.csv() it into R so that the emoji don't turn black?
(I want to track emoji usage.)
Answer 0 (score: 3)
My console has a font that accepts these "characters":
txt <- "Person, Message,
A, 😉,
A, How are you?,
B, 🙍 Alright!,
A, 💃💃"
Encoding(txt)
#[1] "UTF-8"
dput(txt)
#"Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"
> tvec <- scan(text=txt, what="")
Read 13 items
> dput(tvec)
c("Person,", "Message,", "A,", "\U0001f609,", "A,", "How", "are",
"you?,", "B,", "\U0001f64d", "Alright!,", "A,", "\U0001f483\U0001f483"
)
> which(tvec == '\U0001f609,')
[1] 4
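When I used scan with sep="," to read that text, the leading space kept the equality test from succeeding, but the test succeeded when I used the two-character version (space plus emoji):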
> tvec <- scan(text=txt, what="", sep=",")
> which(tvec == '\U0001f609')
integer(0)
> dput(tvec)
c("Person", " Message", "", "A", " \U0001f609", "", "A", " How are you?",
"", "B", " \U0001f64d Alright!", "", "A", " \U0001f483\U0001f483"
)
> which(tvec == " \U0001f609")
[1] 5
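One way to sidestep the leading-space mismatch is to let scan() trim each field with strip.white = TRUE. This is a minimal sketch and an assumption on my part, not something from the original session. The string is rebuilt with \U escapes so the snippet does not depend on the editor font.

```r
# Same text as in the session above, written with \U escapes.
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

# strip.white = TRUE trims leading and trailing blanks from each
# character field; it is only meaningful when sep is specified.
tvec <- scan(text = txt, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)

# The one-character comparison can now match, because the field no
# longer carries the leading space.
which(tvec == "\U0001f609")
```

Trimming at read time keeps later comparisons simple. Calling trimws() on the vector afterwards would be an alternative if the raw fields are needed as well.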
The output above was produced with Courier New as the console/editor font on a Mac. For an explanation of the Unicode escape notation, see ?Quotes {base}.
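To connect this back to read.csv() and the goal of tracking emoji usage, here is a rough sketch. It assumes a UTF-8 session. The code-point range used to detect emoji is a simplifying assumption, not an official definition, and the file name mentioned in the comment is hypothetical.

```r
# Inline stand-in for the CSV file. With a real file, a call such as
# read.csv("chat.csv", encoding = "UTF-8") would be the analogue
# ("chat.csv" is a hypothetical file name).
txt <- "Person, Message,\nA, \U0001f609,\nA, How are you?,\nB, \U0001f64d Alright!,\nA, \U0001f483\U0001f483"

# strip.white = TRUE drops the space after each comma. The trailing
# comma in the header yields an extra, empty third column.
msgs <- read.csv(text = txt, strip.white = TRUE,
                 stringsAsFactors = FALSE, encoding = "UTF-8")

# Turn every message into Unicode code points.
code_points <- unlist(lapply(msgs$Message, utf8ToInt))

# Assumption: treat U+1F300..U+1FAFF as "emoji". This covers the three
# emoji in the sample but is not a complete emoji definition.
is_emoji <- code_points >= 0x1F300 & code_points <= 0x1FAFF

# Count how often each emoji occurs across all messages.
emoji_counts <- table(intToUtf8(code_points[is_emoji], multiple = TRUE))
emoji_counts
```

Whether the emoji are drawn correctly still depends on the console font, but the counts are computed on code points and do not depend on how the glyphs are rendered.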