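Hello, I am trying to predict the language of a set of names made up of English and non-English characters.
My training data is in a text file, train.txt.
When I try to read it with the following R command: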
data = read.table("C:\\Users\\Sneha\\Documents\\study materials\\Independent Study\\train.txt",stringsAsFactors=FALSE,fileEncoding = "UTF-8")
I get the following error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 3 did not have 4 elements
In addition: Warning messages:
1: In read.table("C:\\Users\\Sneha\\Documents\\study materials\\Independent Study\\train.txt", :
invalid input found on input connection 'C:\Users\Sneha\Documents\study materials\Independent Study\train.txt'
2: In read.table("C:\\Users\\Sneha\\Documents\\study materials\\Independent Study\\train.txt", :
incomplete final line found by readTableHeader on 'C:\Users\Sneha\Documents\study materials\Independent Study\train.txt'
Can anyone suggest a better R command for reading this kind of input?
Answer 0 (score: 0)
This worked for me.
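library(readr)  # read_delim() comes from readr

rm(list=ls())
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))  # needs RStudio
filename <- "train.txt"  # assumed name; the original snippet never defines it
entities <- read_delim(filename,
                       "\t", escape_double = FALSE, trim_ws = TRUE)
as.character(entities[22,4])  # plain character value of a single cell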
When I look at the encoded data:
entities[4,4]
# A tibble: 1 × 1
IGNORE
<chr>
1 <U+0411><U+0438><U+043B><U+043B>+<U+0413><U+043E><U+0440><U+0442><U+043D><U+0438>
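If you would rather stay with base R, the "line 3 did not have 4 elements" error usually means read.table is splitting on every space and treating apostrophes as quotes, which breaks on names. Here is a minimal sketch, assuming the file is tab-separated with a header row (the sample rows and column names below are made up for illustration, they are not the asker's data). It also passes encoding = "UTF-8", which marks the strings as UTF-8 instead of re-encoding them the way fileEncoding does.

```r
# Hypothetical stand-in for train.txt: tab-separated, UTF-8, with a header row.
tmp <- tempfile(fileext = ".txt")
sample_lines <- c(
  "id\tname\tlang\tIGNORE",
  "1\tBill Gortney\ten\tx",
  "2\tJos\u00e9 O'Neil\tes\ty",
  "3\t\u0411\u0438\u043b\u043b \u0413\u043e\u0440\u0442\u043d\u0438\tru\tz"
)
con <- file(tmp, open = "wb")
writeLines(enc2utf8(sample_lines), con, useBytes = TRUE)  # write raw UTF-8 bytes
close(con)

data <- read.table(
  tmp,
  header = TRUE,
  sep = "\t",              # split on tabs only, so spaces inside names survive
  quote = "",              # do not treat ' or " as quote characters
  comment.char = "",       # do not treat # as the start of a comment
  stringsAsFactors = FALSE,
  encoding = "UTF-8"       # mark input strings as UTF-8 without re-encoding
)

str(data)
```

For the real file you would replace tmp with the path to train.txt. The <U+0411>-style output shown above is how R on Windows prints characters that the console's native encoding cannot represent; the underlying strings should still hold the Cyrillic text.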