Question

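I am working with a dataset, main, that has a text column containing all kinds of characters: foreign scripts, emoji, and everything else that turns up in normal chat. Using the dataset straight after converting it from JSON caused no trouble, but when I save it as .csv and reload it I get the error below. I don't understand what is happening or which character is being lost.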

a<-fromJSON(file='C:/Users/Thesatwik13/Documents/Personal Documents/New folder/ABC.json')
main<-NULL
for ( i in 1:length(a[[4]])){
  b<-a[[4]][i][[1]]
  main<-rbind(main,b)

}
main<-as.data.frame(main)
main$type[(nchar(main$text)==10&(!is.na(as.numeric(main$text)))&main$fromMe==T) ]<-"nm"
main$text[188]
"!! ©®"

write.csv(main, "dataset_after_cleaning.csv")
main<- read_csv("~/Personal Documents/New folder/dataset_after_cleaning.csv")
main$type[(nchar(main$text)==10&(!is.na(as.numeric(main$text)))&main$fromMe==T) ]
Error in nchar(main$text) : invalid multibyte string, element 188

This is the original string. Emoji like these are normally encoded as "\xf0\U009f" and so on, but not in this case.
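To see which elements are affected and which bytes they actually hold, a small base R diagnostic along these lines may help. It is only a sketch: the vector `text` is a hypothetical stand-in for main$text, and the repair step assumes the stray bytes are Latin-1, which the question does not confirm.

```r
# Hypothetical stand-in for main$text: one well-formed UTF-8 string and one
# string holding Latin-1 bytes (0xA9 0xAE, i.e. the copyright and registered
# signs) with no encoding mark.
valid  <- "!! \u00a9\u00ae"
latin1 <- rawToChar(as.raw(c(0x21, 0x21, 0x20, 0xa9, 0xae)))
text   <- c(valid, latin1)

# validUTF8() reports, element by element, whether the bytes are valid UTF-8.
ok      <- validUTF8(text)
bad_idx <- which(!ok)

# Counting bytes does not need to decode the string, so it works even on
# elements where nchar(x) with the default type = "chars" would fail.
byte_counts <- nchar(text, type = "bytes")

# Look at the raw bytes and the declared encoding of the suspect elements.
print(lapply(text[bad_idx], charToRaw))
print(Encoding(text[bad_idx]))

# ASSUMPTION: the invalid bytes are Latin-1. If so, iconv() can re-encode them.
repaired <- iconv(text[bad_idx], from = "latin1", to = "UTF-8")
```

Running the same checks on main$text[188] after reloading the CSV should show whether the file holds non-UTF-8 bytes or literal <U+XXXX> escapes.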

Reproducible version

Question <- read_excel("~/Question.xlsx", 
+     col_names = FALSE)
> View(Question)
> Question
# A tibble: 1 × 1
                     X0
                  <chr>
1 !! ©®<U+E405><U+E405>
> write.csv(Question, 'questionfile.csv')
> library(readr)
> questionfile <- read_csv("~/questionfile.csv")
Parsed with column specification:
cols(
  X1 = col_integer(),
  X0 = col_character()
)
Warning message:
Missing column names filled in: 'X1' [1] 
> View(questionfile)
> questionfile
# A tibble: 1 × 2
     X1                                  X0
  <int>                               <chr>
1     1 !! <U+00A9><U+00AE><U+E405><U+E405>
> nchar(questionfile)
Error in nchar(questionfile) : invalid multibyte string, element 2
> nchar(Question)
X0 
 7
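One plausible reading of the transcript above, not confirmed here, is that write.csv writes the file in the platform's native encoding while read_csv assumes UTF-8. To check whether the string itself survives a round trip when the bytes are written untouched, a minimal base R sketch could look like this. The variable names and the temporary file are hypothetical, and it uses plain lines rather than CSV to keep the encoding question isolated.

```r
# Hypothetical sample matching the string in the transcript:
# "!! ", the copyright and registered signs, and two private-use code points.
x <- "!! \u00a9\u00ae\ue405\ue405"

path <- tempfile(fileext = ".txt")

# enc2utf8() makes sure the string is UTF-8; useBytes = TRUE tells writeLines()
# to write those bytes as they are instead of converting to the native encoding.
writeLines(enc2utf8(x), path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings read back as UTF-8 without re-encoding.
back <- readLines(path, encoding = "UTF-8")

# If every code point survived, this matches nchar() of the original.
print(nchar(back))
print(identical(nchar(back), nchar(x)))
```

If this round trip keeps all characters while the write.csv / read_csv one does not, the loss is likely happening in the CSV write step rather than in the data.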

This is a screenshot of Question.xlsx; the emoji were simply copied and pasted into it.

Text characters lost when saving as CSV and loading into R again

0 answers: