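This is my first time working with text that contains emoji characters, so my question may be very basic, but I have not found a solution yet.
I exported a .txt file from WhatsApp on Android and sent it to my computer (Windows). The data looks like this: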
chat <- c("05.10.17, 22:55 - Person A: Hey, whats up? 😳😄","05.10.17, 22:55 - Person A: I heard about your problem 😅🙄😂","05.10.17, 22:56 - Person B: What? From whom?🙈","05.10.17, 22:57 - Person A: Your mom...","05.10.17, 22:59 - Person B: ðŸ˜ðŸ˜„","05.10.17, 22:59 - Person B: ðŸ˜ðŸ˜„")
chat
[1] "05.10.17, 22:55 - Person A: Hey, whats up? 😳😄"
[2] "05.10.17, 22:55 - Person A: I heard about your problem 😅🙄😂"
[3] "05.10.17, 22:56 - Person B: What? From whom?🙈"
[4] "05.10.17, 22:57 - Person A: Your mom..."
[5] "05.10.17, 22:59 - Person B: ðŸ˜ðŸ˜„"
[6] "05.10.17, 22:59 - Person B: ðŸ˜ðŸ˜„"
What I want to do is replace the emoji characters with an easy-to-read text representation, so I would like something like this:
[1] "05.10.17, 22:55 - Person A: Hey, whats up? [[SMILEY1]][[SMILEY2]]"
[2] "05.10.17, 22:55 - Person A: I heard about your problem [[SMILEY2]][[SMILEY3]][[SMILEY2]]"
[3] "05.10.17, 22:56 - Person B: What? From whom?[[SMILEY3]]"
[4] "05.10.17, 22:57 - Person A: Your mom..."
[5] "05.10.17, 22:59 - Person B: [[SMILEY2]][[SMILEY2]]"
[6] "05.10.17, 22:59 - Person B: [[SMILEY2]][[SMILEY2]]"
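To make the goal concrete, here is a minimal sketch of the kind of replacement I mean. It assumes a tiny hand-written lookup table: the `lookup` data frame, its labels, and the `replace_emojis` helper are all made up for illustration and are not the real dictionary. It uses base R's `gsub` with `fixed = TRUE` so that each emoji is matched as a literal string rather than as a regular expression. Whether this behaves the same on a real export read on Windows probably depends on how the file's encoding is handled.

```r
# Hypothetical two-entry lookup table (not the real emoji dictionary)
lookup <- data.frame(
  emoji = c("\U0001F633", "\U0001F604"),
  label = c("[[FLUSHED]]", "[[SMILE]]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace every emoji found in `lookup` with its label
replace_emojis <- function(x, lookup) {
  for (i in seq_len(nrow(lookup))) {
    # fixed = TRUE treats the pattern as a literal string, not a regex
    x <- gsub(lookup$emoji[i], lookup$label[i], x, fixed = TRUE)
  }
  x
}

msg <- "Hey, whats up? \U0001F633\U0001F604"
cleaned <- replace_emojis(msg, lookup)
print(cleaned)
```

Because `gsub` is vectorised over its input, the same helper could be applied to the whole `chat` vector in one call.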
I found a dictionary online that maps emojis to text descriptions (here), so I imported it into R:
# Import the List of all WhatsApp Emojis including their description
Emojis <- read.csv(
  url("https://raw.githubusercontent.com/iorch/jakaton_feminicidios/master/data/emojis.csv"),
  header = TRUE, encoding = "UTF-8", stringsAsFactors = FALSE
)
Emojis
# Pimp Description for better visibility later on
Emojis[,2] <- paste("[[Emoji:",Emojis[,2], "]]")
Emojis
Now I try to replace every match of Emojis[,1] in chat with Emojis[,2]:
require(qdapRegex)
CleanMessage <- chat
for(i in seq_along(Emojis[,1])){
CleanMessage <- lapply(CleanMessage,rm_default, clean = TRUE, pattern = Emojis[i,1], replacement = Emojis[i,2])
}
CleanMessage
However, my output looks exactly the same... Can anyone point out my mistake?