Non-English characters in R input

Time: 2017-03-23 02:11:23

Tags: r utf-8 character-encoding read.table

My training data looks like this:

[screenshot of the training data; image not available]

Hello, I am trying to predict the language of a set of names that contain both English and non-English characters.

When I try to read the input with this R command:

 data = read.table("C:\\Users\\Sneha\\Documents\\study materials\\Independent Study\\train.txt",stringsAsFactors=FALSE,fileEncoding = "UTF-8")

I get the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 3 did not have 4 elements
In addition: Warning messages:
1: In read.table("C:\\Users\\Sneha\\Documents\\study materials\\Independent Study\\train.txt",  :
  invalid input found on input connection 'C:\Users\Sneha\Documents\study materials\Independent Study\train.txt'
2: In read.table("C:\\Users\\Sneha\\Documents\\study materials\\Independent Study\\train.txt",  :
  incomplete final line found by readTableHeader on 'C:\Users\Sneha\Documents\study materials\Independent Study\train.txt'

Can anyone suggest a better R command for reading this kind of input?

1 Answer:

Answer 0: (score: 0)

This works for me.

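rm(list = ls())
library(readr)  # provides read_delim()
# Work from the directory of the script currently open in RStudio
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
# `filename` must already hold the path to the tab-delimited input file
entities <- read_delim(filename, "\t", escape_double = FALSE, trim_ws = TRUE)
entities[4, 4]
as.character(entities[22, 4])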
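For comparison, here is a minimal base R sketch of the same idea. It rests on two assumptions that the question does not confirm: that the file is tab-delimited, and that the "invalid input" warning comes from `fileEncoding = "UTF-8"` re-encoding the text into a native code page that cannot represent the characters (common on Windows before R 4.2). The sample rows and column names are hypothetical stand-ins for train.txt.

```r
# Hypothetical sample standing in for train.txt (the real file is not shown).
tmp <- tempfile(fileext = ".txt")
sample_lines <- c(
  "id\tname\tlang",
  "1\tJohn Smith\ten",
  "2\t\u0411\u0438\u043b\u043b \u0413\u043e\u0440\u0442\u043d\u0438\tru"
)

# Write the raw UTF-8 bytes so the result does not depend on the locale.
con <- file(tmp, open = "wb")
writeLines(enc2utf8(sample_lines), con, useBytes = TRUE)
close(con)

# sep = "\t" keeps names that contain spaces in a single field;
# quote = "" and comment.char = "" stop apostrophes or '#' inside names
# from being treated as quoting or comments;
# encoding = "UTF-8" marks the strings as UTF-8 without re-encoding them.
data <- read.table(
  tmp,
  header = TRUE,
  sep = "\t",
  quote = "",
  comment.char = "",
  stringsAsFactors = FALSE,
  encoding = "UTF-8"
)

str(data)
```

If the file turns out to be in another encoding (for example UTF-16), `fileEncoding` would still be needed to convert it on the way in.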

When I look at the data, the encoded value is displayed like this:

entities[4,4]
# A tibble: 1 × 1
                                                                             IGNORE
                                                                              <chr>
1 <U+0411><U+0438><U+043B><U+043B>+<U+0413><U+043E><U+0440><U+0442><U+043D><U+0438>
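The `<U+0411>` escapes are most likely only how the console prints characters that the native code page cannot display, not a sign that the data was damaged. A small sketch for checking the string itself rather than its printed form follows; `x` is a hypothetical stand-in for the value in `entities[4, 4]`, rebuilt from the code points shown above.

```r
# Hypothetical stand-in for the string stored in entities[4, 4].
x <- "\u0411\u0438\u043b\u043b+\u0413\u043e\u0440\u0442\u043d\u0438"

# The declared encoding mark of the string.
enc <- Encoding(x)

# Number of characters (not bytes): 4 + 1 + 6 Cyrillic and ASCII characters.
n_chars <- nchar(x, type = "chars")

# Unicode code points; the first one should be 0x0411 (Cyrillic capital Be).
code_points <- utf8ToInt(x)

print(enc)
print(n_chars)
print(sprintf("U+%04X", code_points))
```

If the character count and code points match what the file contains, the text was read correctly and only the display is affected.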