How to import a Unicode CSV file?

Time: 2014-12-18 18:37:13

Tags: r csv unicode csv-import

I have a Unicode CSV file:

LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species
التسمية 1,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 2,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 3,Group 1,Subgroup 1,Species 1,1,1,1

I want to read it into R, and I used this command:

Data = read.csv("Data.csv", encoding="UTF-8", fileEncoding = "UTF-8")

But I got this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  empty beginning of file
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'Data.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'Data.csv'

How can I read a Unicode CSV file (with Arabic letters) in R?

Thanks!

1 Answer:

Answer 0 (score: 0)

You can read the file with readLines and the argument warn = FALSE, then pass the resulting lines to read.csv through its text argument, like this:

arabic <- readLines("arabic.csv", warn = FALSE, encoding = "UTF-8")
Data = read.csv(text = arabic)
str(Data)

Output:

'data.frame':   3 obs. of  7 variables:
 $ X.U.FEFF.LabelName: Factor w/ 3 levels "التسمية 1","التسمية 2",..: 1 2 3
 $ Label1            : Factor w/ 1 level "Group 1": 1 1 1
 $ Label2            : Factor w/ 1 level "Subgroup 1": 1 1 1
 $ SpeciesLabel      : Factor w/ 1 level "Species 1": 1 1 1
 $ Group             : int  1 1 1
 $ Subgroup          : int  1 1 1
 $ Species           : int  1 1 1
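
The X.U.FEFF. prefix on the first column name in the output above is the byte order mark (BOM) at the start of the file leaking into the header. Here is a minimal sketch of one way to avoid it, assuming the file really is UTF-8 with a BOM and that R is running in a UTF-8 locale. The temporary file and the csv_text variable are made up for illustration; only the fileEncoding = "UTF-8-BOM" argument is the point.

```r
# Build a small UTF-8 CSV that starts with a byte order mark (BOM).
# This temporary file is a stand-in for the real Data.csv (assumption).
csv_text <- paste(
  "LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species",
  "التسمية 1,Group 1,Subgroup 1,Species 1,1,1,1",
  "التسمية 2,Group 1,Subgroup 1,Species 1,1,1,1",
  "التسمية 3,Group 1,Subgroup 1,Species 1,1,1,1",
  sep = "\n"
)
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))  # UTF-8 encoding of U+FEFF
writeBin(c(bom, charToRaw(enc2utf8(csv_text)), charToRaw("\n")), path)

# "UTF-8-BOM" tells R to decode as UTF-8 and drop a leading BOM if present,
# so the first column name is not prefixed with X.U.FEFF.
Data <- read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
str(Data)
names(Data)[1]
```

If the "embedded nulls" warning from the question still appears, the file may actually be UTF-16 rather than UTF-8. That is only a guess from the warning text, but in that case trying fileEncoding = "UTF-16LE" in the same read.csv call could be worthwhile.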