Can't replace a Unicode apostrophe ("â<U+0080><U+0099>") in a data.table column with str_replace_all()

Time: 2019-12-24 11:16:25

Tags: r facebook data.table stringr

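I am trying to analyze some downloaded Facebook messages in R. In some messages the apostrophes have been replaced with "â", and I am trying to fix this with str_replace_all().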

Take the following data as an example.

names <- c("Me", "Me", "You", "You", "Me", "You")
content <- c("Iâ<U+0080><U+0099>ve got my party on the 5th", "Hello", "Bears", "Four times four", "what do you want to eat?", "get some music")
date <- c("1/1/2001", "2/1/2001", "3/1/2001", "4/1/2001", "5/1/2001", "6/1/2001")
fbmessagesexample <- data.table(names, date, content)

I then tried str_replace_all():

fbmessagesexample[, content := str_replace_all(content, pattern = fixed("â<U\\+0080><U\\+0099>"), replacement=fixed("'"))]

The first row of content is not replaced. Am I doing something wrong?

1 Answer:

Answer 0: (score: 1)

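Pass the pattern as a plain character vector instead of wrapping it in fixed(). Inside fixed() the pattern is compared literally, so the escaped "\\+" looks for a backslash followed by "+" and never matches. As a regular expression, "\\+" matches a literal "+".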

The snippet below produces the console output shown after it.

library(data.table)
library(tidyverse)

names <- c("Me", "Me", "You", "You", "Me", "You")
content <- c("Iâ<U+0080><U+0099>ve got my party on the 5th", "Hello", "Bears", "Four times four", "what do you want to eat?", "get some music")
date <- c("1/1/2001", "2/1/2001", "3/1/2001", "4/1/2001", "5/1/2001", "6/1/2001")
fbmessagesexample <- data.table(names, date, content)

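# Plain string, so stringr treats it as a regex: "\\+" matches a literal "+"
pattern <- "â<U\\+0080><U\\+0099>"

# The replacement is an ordinary string; it does not need fixed()
fbmessagesexample[, content := str_replace_all(content, pattern, replacement = "'")]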

Console:

> fbmessagesexample
   names     date                      content
1:    Me 1/1/2001 I've got my party on the 5th
2:    Me 2/1/2001                        Hello
3:   You 3/1/2001                        Bears
4:   You 4/1/2001              Four times four
5:    Me 5/1/2001     what do you want to eat?
6:   You 6/1/2001               get some music
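
To make the difference between the question's call and the working one easier to see, here is a minimal sketch that runs the three variants on a single string. It assumes, as in the example data, that the text literally contains the ASCII sequences "<U+0080>" and "<U+0099>" after the "â". The variable names (x, no_match, via_regex, via_fixed) are only for illustration, and "\u00e2" is just "â" written as an escape so the snippet does not depend on the file encoding.

```r
library(stringr)

# Same text as the first row of the example data
x <- "I\u00e2<U+0080><U+0099>ve got my party on the 5th"

# 1. fixed() with regex-style escapes: the backslashes are matched literally,
#    so the pattern is not found and the string comes back unchanged
no_match <- str_replace_all(x, fixed("\u00e2<U\\+0080><U\\+0099>"), "'")

# 2. Regex pattern (no fixed()): "\\+" matches a literal "+"
via_regex <- str_replace_all(x, "\u00e2<U\\+0080><U\\+0099>", "'")

# 3. fixed() with the text exactly as it appears, without escapes
via_fixed <- str_replace_all(x, fixed("\u00e2<U+0080><U+0099>"), "'")

print(no_match)
print(via_regex)
print(via_fixed)
```

Variants 2 and 3 should both give "I've got my party on the 5th", while variant 1 leaves the string untouched. If fixed() is preferred, dropping the backslashes from the pattern is enough.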