I am reading a text file (with read.table) that contains character values such as "000000", but I get 0 instead. I tried:
X<-read.table(ouvrefic, header=TRUE, row.names=1, sep="",colClasses=c("integer","character","factor"))
I get:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '"1"' (problem comes from row.names, I suppose)
What should I do?
Thanks a lot.
The beginning of my text file:
"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"
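Answer 0 (score: 1)
The problem is with the colClasses argument.
First, even though you use the first column as row.names, the file still has 4 columns, so that vector needs four elements.
Second, if you need all the zeros to be displayed correctly, you have to read that column as character.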
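A minimal sketch of why the column has to be read as character: with the default type conversion, read.table treats "0000000000000" as a number, so the leading zeros are lost. The two-row sample and the names txt, auto and chr are made up for illustration.

```r
# Hypothetical two-row sample in the same layout as the question's file
txt <- '"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"'

# Default conversion: Atscan2 is guessed to be numeric, so the zeros collapse
auto <- read.table(text = txt, header = TRUE, row.names = 1)
class(auto$Atscan2)
auto$Atscan2

# One colClasses entry per column in the file, including the row-name column
chr <- read.table(text = txt, header = TRUE, row.names = 1,
                  colClasses = c("character", "character", "character", "factor"))
chr$Atscan2
```

Comparing auto$Atscan2 with chr$Atscan2 shows the difference: only the second call keeps the values as strings.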
The following works:
df <- read.table(header=TRUE, text='"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"',
row.names=1,
colClasses=c('character', 'character',"character","factor"))
Output:
> df
dates Atscan2 pqrPQR
1 18369 0000000000000 1110
2 18369 0000000000000 1220,0
3 18369 0000000000000 2220
4 18369 0000000000000 1230,0,0
5 18369 0000000000000 1330,0
6 18369 0000000000000 2330,0
7 18369 0000000000000 3330
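As shown above, when a column's values are quoted (as in the dates column), specifying integer in colClasses does not work, so I read that column as character as well. You can always convert it to integer afterwards with as.integer.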
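The conversion step could look like the sketch below. It assumes a one-row sample invented for illustration and reads dates as character, as in the call above.

```r
# Hypothetical one-row sample; every column is read as character or factor
df <- read.table(text = '"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"',
                 header = TRUE, row.names = 1,
                 colClasses = c("character", "character", "character", "factor"))

# Convert only the dates column; Atscan2 stays character and keeps its zeros
df$dates <- as.integer(df$dates)
sapply(df, class)
```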
Akrun gave a more direct solution in the comments: first strip the double quotes from the lines read with readLines, then apply colClasses to the columns:
df <- read.table(text=gsub('[\\"]', '', readLines('ouvrefic.txt')),
row.names=1,
colClasses=c('character', 'integer', 'character', 'factor'))
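For anyone who wants to try this without the original file, here is a self-contained sketch of the same idea. The temporary file, the two-row sample and the names path, clean and df2 are assumptions for illustration. It also passes header = TRUE explicitly instead of relying on read.table detecting the header.

```r
# Write a hypothetical two-row sample to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c('"" "dates" "Atscan2" "pqrPQR"',
             '"1" "18369" "0000000000000" "1110"',
             '"2" "18369" "0000000000000" "1220,0"'), path)

# Strip every double quote; fixed = TRUE matches the literal character
clean <- gsub('"', '', readLines(path), fixed = TRUE)

# With the quotes gone, "integer" is accepted for the dates column
df2 <- read.table(text = clean, header = TRUE, row.names = 1,
                  colClasses = c("character", "integer", "character", "factor"))
str(df2)

# Remove the temporary file
unlink(path)
```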
Answer 1 (score: 1)
Use NA in colClasses for the columns whose type read.table should work out by itself, and keep row.names = 1:
writeLines('"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"', "x.txt")
df <- read.table("x.txt", header = TRUE,
row.names = 1, colClasses = c(NA, NA, "character", NA))
sapply(df, class)
# dates Atscan2 pqrPQR
# "integer" "character" "factor"
df
# dates Atscan2 pqrPQR
# 1 18369 0000000000000 1110
# 2 18369 0000000000000 1220,0
# 3 18369 0000000000000 2220
# 4 18369 0000000000000 1230,0,0
# 5 18369 0000000000000 1330,0
# 6 18369 0000000000000 2330,0
# 7 18369 0000000000000 3330
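One caveat about the "factor" class shown for pqrPQR: it reflects the older default stringsAsFactors = TRUE. Since R 4.0.0 the default is FALSE, so an NA entry for that column gives character instead. A sketch that requests the factor explicitly, using a made-up two-row sample and the hypothetical names txt and df3:

```r
# Hypothetical two-row sample in the same layout as x.txt
txt <- '"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"'

# NA leaves the type to read.table; "factor" is requested explicitly so the
# result does not depend on the stringsAsFactors default
df3 <- read.table(text = txt, header = TRUE, row.names = 1,
                  colClasses = c(NA, NA, "character", "factor"))
sapply(df3, class)
```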
Also, if you are on a Linux-based system, you can use system() to strip all the quotes, which makes things easier:
read.table(
text = system("cat x.txt | tr -d \\\"", intern = TRUE),
colClasses = c(Atscan2 = "character")
)
# dates Atscan2 pqrPQR
# 1 18369 0000000000000 1110
# 2 18369 0000000000000 1220,0
# 3 18369 0000000000000 2220
# 4 18369 0000000000000 1230,0,0
# 5 18369 0000000000000 1330,0
# 6 18369 0000000000000 2330,0
# 7 18369 0000000000000 3330