I am trying to read a file that uses :: as the column separator:
userID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
Here is my code:
tr = read.table("/home/user/ml-1m/ratings.dat",sep = ":" )
print(tr)
The result is:
V1 V2 V3 V4 V5 V6 V7
1 2 NA 318 NA 5 NA 978298413
2 2 NA 1207 NA 4 NA 978298478
3 2 NA 1968 NA 2 NA 978298881
4 2 NA 3678 NA 3 NA 978299250
5 2 NA 1244 NA 3 NA 978299143
6 2 NA 356 NA 5 NA 978299686
7 2 NA 1245 NA 2 NA 978299200
I don't want the NA values, but if I set sep="::" I get the error invalid 'sep' value: must be one byte.
How can I solve this?
Answer 0 (score: 8)
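The text-file import functions accept only a single one-byte character as the column separator. With sep = ":", every "::" therefore produces an empty column between two real ones. However, you can tell read.table to skip those columns through its colClasses argument (see the help file). The vector c(NA, "NULL") is recycled across the columns, so odd-numbered columns are read with automatic type detection and even-numbered ones are dropped: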
read.table(text = "userID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275",
sep = ":", colClasses = c(NA, "NULL"),
header = TRUE)
# userID MovieID Rating Timestamp
#1 1 1193 5 978300760
#2 1 661 3 978302109
#3 1 914 3 978301968
#4 1 3408 4 978300275
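Another option, sketched here as an alternative rather than as part of the answer above, is to rewrite the two-character separator into a single character before parsing. The sketch assumes no field contains a tab character. The sample lines are copied from the question; for the real file they could come from readLines() instead.

```r
# Sample lines in the same format as ratings.dat (copied from the question).
lines <- c("userID::MovieID::Rating::Timestamp",
           "1::1193::5::978300760",
           "1::661::3::978302109",
           "1::914::3::978301968",
           "1::3408::4::978300275")

# For the real file, the lines could be read like this instead:
# lines <- readLines("/home/user/ml-1m/ratings.dat")

# Replace every literal "::" with a tab (assumes no field contains a tab),
# then parse the result as ordinary tab-separated text.
tr <- read.table(text = gsub("::", "\t", lines, fixed = TRUE),
                 sep = "\t", header = TRUE)
print(tr)
```

This keeps read.table's single-byte sep restriction satisfied without relying on the alternating empty columns, at the cost of holding the raw lines in memory once before parsing.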