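I have a text data file whose fields are separated by commas (","). A sample of the data is shown below (the first row gives the column names):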
userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2
I am using the following syntax:
appsession <- read.table("C:/.../AppSession.txt", sep = ",",
col.names = c("userID","appName","startTime","endTime","endResult"),
fill = FALSE, strip.white = TRUE)
I get this error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 5 elements
Answer 0 (score: 3)
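I think you need to use header=TRUE if you have a blank line and plan to use 'col.names' without using skip = 2. At the moment your code works (after a fashion, anyway) on a simple piece of text, although the header line is read in as the first row of data: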
> txt <- "userID,appName,startTime,endTime,endResult
+ chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
+ chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
+ chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
+ chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
+ chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
+ chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
+ chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
+ chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
+ chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
+ chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2
+ "
> appsession <- read.table(text=txt, sep = ",",
+ col.names = c("userID","appName","startTime","endTime","endResult"),
+ fill = FALSE, strip.white = TRUE)
>
> appsession
userID appName startTime endTime endResult
1 userID appName startTime endTime endResult
2 chhieut gms.mos.test 2012-07-01 02:47:16 2012-07-01 02:47:46 1
3 chhieut gms.mos.test 2012-07-01 03:11:46 2012-07-01 03:12:25 2
4 chhieut gms.mos.test 2012-07-01 03:13:36 2012-07-01 03:14:03 2
5 chhieut gms.mos.test 2012-07-01 03:18:26 2012-07-01 03:18:58 2
6 chhieut gms.mos.test 2012-07-01 04:10:36 2012-07-01 04:10:54 2
7 chhieut gms.mos.test 2012-07-01 04:38:26 2012-07-01 04:38:48 2
8 chhieut gms.mos.test 2012-07-01 04:48:56 2012-07-01 04:49:04 3
9 chhieut gms.mos.test 2012-07-01 05:49:46 2012-07-01 05:50:14 2
10 chhieut gms.mos.test 2012-07-01 06:19:07 2012-07-01 06:19:25 2
11 chhieut gms.mos.test 2012-07-01 07:09:17 2012-07-01 07:09:47 2
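The two ways of keeping the header line out of the data can be sketched as follows. This is only an illustration on a shortened copy of the sample data. The object names with_header and with_skip are made up for the example, and skip = 1 assumes the header is the very first line of the input.

```r
# Shortened copy of the sample data from the question (header + 3 rows).
txt <- "userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
"
cols <- c("userID", "appName", "startTime", "endTime", "endResult")

# Option 1: let read.table consume the first line as the header.
with_header <- read.table(text = txt, sep = ",", header = TRUE,
                          strip.white = TRUE)

# Option 2: skip the header line and supply the column names yourself.
with_skip <- read.table(text = txt, sep = ",", skip = 1,
                        col.names = cols, strip.white = TRUE)

# Both calls should give 3 data rows with the same column names.
nrow(with_header)
nrow(with_skip)
```

Either way, the header text no longer appears as row 1 of the result.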
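You should either use the header or skip the header line (and skip any blank lines as well). One way to check for blank lines is to look at the output of count.fields( ..., sep=","). Another way to see what the read.* and scan functions "see" is to run this code (replacing the ellipsis appropriately):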
appLines <- readLines("C:/.../AppSession.txt")
appLines[1:5] # will display the first 5 lines from that file
# with no attempt to deal with any separators.
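As a rough sketch of both diagnostics, the code below writes a small made-up file to a temporary path, then inspects it. The file contents, the temporary path, and the names app_lines, blank_idx and n_fields are assumptions for illustration, not the asker's actual file. Note that count.fields() ignores blank lines by default (blank.lines.skip = TRUE), so the raw lines are checked separately to locate them.

```r
# Hypothetical file contents: one blank line before the header, one mid-file.
raw <- c("",
         "userID,appName,startTime,endTime,endResult",
         "chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1",
         "",
         "chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2")
path <- tempfile(fileext = ".txt")
writeLines(raw, path)

# Read the file as plain lines, with no parsing of separators.
app_lines <- readLines(path)

# Positions of lines that are empty or contain only whitespace.
blank_idx <- which(!nzchar(trimws(app_lines)))

# Number of comma-separated fields on each non-blank line.
n_fields <- count.fields(path, sep = ",")

unlink(path)  # remove the temporary file
```

If every entry of n_fields equals the expected number of columns and blank_idx is empty, the file layout is probably not the cause of the error.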
Answer 1 (score: 2)
You need to provide a link to your actual data set, because the data you posted loads without problems:
d = read.csv(textConnection("userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2"), header=TRUE)
A quick check:
R> head(d, 1)
userID appName startTime endTime endResult
1 chhieut gms.mos.test 2012-07-01 02:47:16 2012-07-01 02:47:46 1
R> dim(d)
[1] 10 5
Make sure your actual file contains no blank lines, as these really do mess things up.
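If editing the file by hand is inconvenient, one possible approach is to drop the blank lines in R before parsing. The sketch below uses a made-up character vector in place of the result of readLines() on the real file. The names raw and clean are illustrative only.

```r
# Stand-in for readLines("...") on the real file: includes an empty line
# and a whitespace-only line.
raw <- c("userID,appName,startTime,endTime,endResult",
         "",
         "chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1",
         "   ",
         "chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2")

# Keep only lines that contain something other than whitespace.
clean <- raw[nzchar(trimws(raw))]

# Parse the cleaned lines; the first remaining line is the header.
d <- read.csv(text = clean, header = TRUE)
dim(d)
```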
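Answer 2 (score: 2)
With a suitably edited version of the data (i.e. with all blank lines removed!), it can easily be loaded into R via read.csv(). Note that I am using a text connection containing the data to avoid writing the data to a file. Just replace con with your file name in the call to read.csv().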
con <- textConnection("userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2
")
dat <- read.csv(con,
colClasses = c(rep("character", 2), rep("POSIXct", 2),
"numeric"))
close(con) ## closing connection, not needed with a file
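Also note that by specifying the colClasses argument we tell R what the data are before it reads them, which saves some formatting work later, especially for the date-time data. We can do this here because you stored the date-time variables in the correct format.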
R> head(dat)
userID appName startTime endTime endResult
1 chhieut gms.mos.test 2012-07-01 02:47:16 2012-07-01 02:47:46 1
2 chhieut gms.mos.test 2012-07-01 03:11:46 2012-07-01 03:12:25 2
3 chhieut gms.mos.test 2012-07-01 03:13:36 2012-07-01 03:14:03 2
4 chhieut gms.mos.test 2012-07-01 03:18:26 2012-07-01 03:18:58 2
5 chhieut gms.mos.test 2012-07-01 04:10:36 2012-07-01 04:10:54 2
6 chhieut gms.mos.test 2012-07-01 04:38:26 2012-07-01 04:38:48 2
R> str(dat)
'data.frame': 10 obs. of 5 variables:
$ userID : chr "chhieut" "chhieut" "chhieut" "chhieut" ...
$ appName : chr "gms.mos.test" "gms.mos.test" "gms.mos.test" "gms.mos.test" ...
$ startTime: POSIXct, format: "2012-07-01 02:47:16" "2012-07-01 03:11:46" ...
$ endTime : POSIXct, format: "2012-07-01 02:47:46" "2012-07-01 03:12:25" ...
$ endResult: num 1 2 2 2 2 2 3 2 2 2
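For comparison, here is a sketch of the conversion that would otherwise have to be done after reading, had colClasses not been used. The tz = "UTC" setting and the duration column are assumptions added for illustration; the original post does not say which time zone the timestamps are in.

```r
# Read two of the sample rows with the timestamps left as character strings.
dat <- read.csv(text = "userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2",
                stringsAsFactors = FALSE)

# Convert the timestamp columns by hand (time zone is an assumption).
fmt <- "%Y-%m-%d %H:%M:%S"
dat$startTime <- as.POSIXct(dat$startTime, format = fmt, tz = "UTC")
dat$endTime   <- as.POSIXct(dat$endTime,   format = fmt, tz = "UTC")

# Once the columns are POSIXct, arithmetic such as session length works.
dat$duration <- as.numeric(difftime(dat$endTime, dat$startTime,
                                    units = "secs"))
```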