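I need to process some data that is mostly CSV. The problem is that R ignores a comma when it appears at the end of a line (for example, the one after the 3 in the example below).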
> strsplit("1,2,3,", ",")
[[1]]
[1] "1" "2" "3"
I would like it to be read as [1] "1" "2" "3" NA instead. How can I do that? Thanks.
Answer 0 (score: 9)
Here are a few ideas:
scan(text="1,2,3,", sep=",", quiet=TRUE)
#[1] 1 2 3 NA
unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
#[1] 1 2 3 NA
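Both return numeric vectors (scan returns doubles by default, while read.csv detects integers). You can wrap as.character around either one to get the exact output shown in your question: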
as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
#[1] "1" "2" "3" NA
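Alternatively, you can specify what="character" in scan, or colClasses="character" in read.csv, to get slightly different output in which the trailing field is an empty string rather than NA: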
scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
#[1] "1" "2" "3" ""
unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
#[1] "1" "2" "3" ""
You can also specify na.strings="" along with colClasses="character", so that the empty field is read as NA:
unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""),
use.names=FALSE)
#[1] "1" "2" "3" NA
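If you would rather stay with strsplit, one possible workaround is to append an extra delimiter before splitting, since strsplit only drops the final empty piece. This is a minimal sketch; the helper name split_keep_trailing is made up for illustration and is not part of base R:

```r
# Hypothetical helper: split on a fixed delimiter, keep the trailing empty
# field, and turn empty fields into NA.
split_keep_trailing <- function(x, sep = ",") {
  # strsplit() discards only the last empty piece, so padding each string
  # with one extra delimiter preserves the original trailing field.
  parts <- strsplit(paste0(x, sep), sep, fixed = TRUE)
  lapply(parts, function(p) {
    p[p == ""] <- NA_character_
    p
  })
}

split_keep_trailing("1,2,3,")[[1]]
#[1] "1" "2" "3" NA
```

Note that this sketch does not treat NA inputs specially (paste0 would turn them into the text "NA"), so filter those out first if they can occur.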
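Answer 1 (score: 7)
Hadley Wickham's stringr package (which in current versions is built on top of the stringi package) is a big improvement over the base string functions (fully vectorized, consistent function interface):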
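library(stringr)
# str_split() returns a list with one element per input string
str_split("1,2,3,", ",")
#[[1]]
#[1] "1" "2" "3" ""
# the empty string becomes NA when converted to integer
as.integer(unlist(str_split("1,2,3,", ",")))
#[1]  1  2  3 NA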
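Because str_split is vectorized, the same call handles several lines at once and returns one list element per line. The sketch below uses a made-up input vector named lines to show that leading and trailing empty fields are both kept:

```r
library(stringr)

# Made-up sample input: a trailing comma, no empty field, a leading comma
lines <- c("1,2,3,", "4,5", ",6")

# One character vector of fields per input line
fields <- str_split(lines, ",")

# Number of fields found on each line, empty ones included
lengths(fields)
#[1] 4 2 2

# Convert each line's fields to integer; empty strings become NA
ints <- lapply(fields, as.integer)
```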
Answer 2 (score: 3)
Using the stringi package:
require(stringi)
> stri_split_fixed("1,2,3,",",")
[[1]]
[1] "1" "2" "3" ""
## you can directly specify whether you want to omit these empty elements
> stri_split_fixed("1,2,3,",",",omit_empty = TRUE)
[[1]]
[1] "1" "2" "3"
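As far as I recall from the stringi documentation, omit_empty also accepts NA, in which case empty tokens are replaced with NA instead of being dropped. If that holds for your installed version, it gives the output asked for in the question directly:

```r
library(stringi)

# omit_empty = NA: replace empty tokens with NA rather than removing them
# (behaviour as described in the stri_split documentation; check your version)
res <- stri_split_fixed("1,2,3,", ",", omit_empty = NA)
res[[1]]
```

The result is still a character vector, so wrap it in as.integer if you need numbers.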