I have a Unicode CSV file:
LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species
التسمية 1,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 2,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 3,Group 1,Subgroup 1,Species 1,1,1,1
I want to read it into R, and I used this command:
Data = read.csv("Data.csv", encoding="UTF-8", fileEncoding = "UTF-8")
But I got this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  empty beginning of file
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
  invalid input found on input connection 'Data.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 1 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'Data.csv'
How can I read a Unicode CSV file (containing Arabic letters) in R?
Thanks!
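A warning about embedded nulls often means the file is not UTF-8 at all but UTF-16, which some Windows tools save under the name "Unicode". That is only a guess about this file. A minimal sketch for checking it is to look at the first bytes for a byte order mark (BOM). The helper name guess_bom is hypothetical, not part of R, and it ignores UTF-32, whose little-endian BOM also begins with FF FE.

```r
# Hypothetical helper: guess the encoding from the byte order mark (BOM), if any.
guess_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3)
  if (length(b) >= 3 && all(b[1:3] == as.raw(c(0xEF, 0xBB, 0xBF)))) return("UTF-8-BOM")
  if (length(b) >= 2 && all(b[1:2] == as.raw(c(0xFF, 0xFE)))) return("UTF-16LE")
  if (length(b) >= 2 && all(b[1:2] == as.raw(c(0xFE, 0xFF)))) return("UTF-16BE")
  NA_character_  # no BOM found; the encoding has to be determined another way
}

# Demonstration on a throwaway file: a UTF-16LE BOM followed by the letter "A".
demo_path <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xFF, 0xFE, 0x41, 0x00)), demo_path)
bom_guess <- guess_bom(demo_path)
print(bom_guess)
```

If the real file reports a UTF-16 BOM, passing a matching fileEncoding to read.csv is worth trying instead of "UTF-8".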
Answer 0 (score: 0)
You can read the file with readLines, passing warn = FALSE, and then hand the resulting lines to read.csv through its text argument, like this:
arabic <- readLines("arabic.csv", warn = FALSE, encoding = "UTF-8")
Data = read.csv(text = arabic)
str(Data)
Output:
'data.frame': 3 obs. of 7 variables:
$ X.U.FEFF.LabelName: Factor w/ 3 levels "التسمية 1","التسمية 2",..: 1 2 3
$ Label1 : Factor w/ 1 level "Group 1": 1 1 1
$ Label2 : Factor w/ 1 level "Subgroup 1": 1 1 1
$ SpeciesLabel : Factor w/ 1 level "Species 1": 1 1 1
$ Group : int 1 1 1
$ Subgroup : int 1 1 1
$ Species : int 1 1 1
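The first column name in that output, X.U.FEFF.LabelName, suggests the file starts with a UTF-8 byte order mark that ended up in the header. One possible way to avoid it, assuming the file really is UTF-8 with a BOM, is the "UTF-8-BOM" value that R accepts for fileEncoding when reading. The sketch below builds its own small test file, so the column names and values are illustrative only.

```r
# Build a small UTF-8 CSV that begins with a BOM (EF BB BF).
# "\u0627\u0644\u062a\u0633\u0645\u064a\u0629" spells the Arabic label from the question.
csv_text <- "LabelName,Group\n\u0627\u0644\u062a\u0633\u0645\u064a\u0629 1,1\n"
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(enc2utf8(csv_text))), path)

# "UTF-8-BOM" tells the connection to drop the BOM while decoding UTF-8.
Data <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(names(Data))
```

With this, the header should come through as LabelName rather than X.U.FEFF.LabelName. Whether the Arabic values display correctly still depends on the locale of the R session.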