My data is in the following format:
Wed Nov 13 21:32:22 GMT 2013
Unique 1011266
back 471693 46.6438%
edge 82093 8.1178%
Thu Nov 14 13:17:02 GMT 2013
Unique 1030845
back 479623 46.5271%
edge 91870 8.9121%
Fri Nov 15 13:17:01 GMT 2013
Unique 1012254
back 455858 45.0339%
edge 69738 6.8893%
Sat Nov 16 13:17:01 GMT 2013
Unique 1030938
back 473239 45.9037%
edge 107645 10.4414%
Sun Nov 17 13:17:01 GMT 2013
Unique 1012122
back 486244 48.0420%
edge 131616 13.0039%
Mon Nov 18 13:17:01 GMT 2013
Unique 1090236
back 489005 44.8531%
edge 118735 10.8907%
Tue Nov 19 13:17:01 GMT 2013
Unique 1054120
back 477180 45.2680%
edge 89535 8.4938%
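I'm thinking of plotting this as a time series with ggplot, i.e. date vs. back and date vs. edge. The back and edge lines each hold a count followed by its percentage, but I can't get the data into column format, so I can't turn it into a data frame. Any help with this would be great.
The desired output is: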
Date unique back edge
2013-11-13 1011266 471693 82093
2013-11-14 1030845 479623 91870
Answer 0 (score: 2)
You want to use read.fwf here:
dat <- read.fwf(file='file.txt',
widths=list(28,c(6,-2,7),c(4,-4,6,-2,8),c(4,-4,5,-2,8)))
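Basically, you only need to specify the widths argument. When several lines make up one record, widths is a list in which each element gives the field widths for one line. Each record here has four lines, so you pass a list of four vectors. Negative numbers skip that many characters, here the spaces between fields.
The result looks like this. Note that in rows 4 to 6 the six-digit edge counts are cut to five digits (107645 becomes 10764) and the percent sign is lost, because the edge count is read with a width of 5; check the widths against your actual file: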
> dat
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 Wed Nov 13 21:32:22 GMT 2013 Unique 1011266 back 471693 46.6438% edge 82093 8.1178%
2 Thu Nov 14 13:17:02 GMT 2013 Unique 1030845 back 479623 46.5271% edge 91870 8.9121%
3 Fri Nov 15 13:17:01 GMT 2013 Unique 1012254 back 455858 45.0339% edge 69738 6.8893%
4 Sat Nov 16 13:17:01 GMT 2013 Unique 1030938 back 473239 45.9037% edge 10764 10.4414
5 Sun Nov 17 13:17:01 GMT 2013 Unique 1012122 back 486244 48.0420% edge 13161 13.0039
6 Mon Nov 18 13:17:01 GMT 2013 Unique 1090236 back 489005 44.8531% edge 11873 10.8907
7 Tue Nov 19 13:17:01 GMT 2013 Unique 1054120 back 477180 45.2680% edge 89535 8.4938%
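The list form of widths described above can be tried on a small file. This is a minimal sketch, assuming a made-up file with two lines per record; the file contents and the names tmp and demo are illustrative only and are not the asker's data. read.fwf joins the lines of each record into one string before cutting fields, so the widths given for each line have to add up to that line's full length.

```r
# Hypothetical fixed-width file: two lines per record.
tmp <- tempfile()
writeLines(c("2013-11-13",
             "back  471693",
             "2013-11-14",
             "back  479623"), tmp)

# One widths vector per line of a record:
#   line 1: a single 10-character field (the date)
#   line 2: skip 6 characters ("back" plus two spaces), then read 6 characters
demo <- read.fwf(tmp,
                 widths = list(10, c(-6, 6)),
                 col.names = c("Date", "back"))
demo
```

If a line's real length differs from the sum of its widths, the fields that follow it in the same record are cut at the wrong positions.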
You will probably want to subset the columns and assign names afterwards:
setNames(dat[,c(1,3,5,6,8,9)],
c('Date','Unique','back','backpercent','edge','edgepercent'))
You could also specify different widths from the start to skip the variable labels (Unique, back, edge, and so on):
dat <- read.fwf(file='file.txt',
widths=list(28,c(-8,7),c(-8,6,-2,8),c(-8,5,-2,9)),
col.names=c('Date','Unique','back','backpercent','edge','edgepercent'))
dat
Date Unique back backpercent edge edgepercent
1 Wed Nov 13 21:32:22 GMT 2013 1011266 471693 46.6438% 82093 8.1178%
2 Thu Nov 14 13:17:02 GMT 2013 1030845 479623 46.5271% 91870 8.9121%
3 Fri Nov 15 13:17:01 GMT 2013 1012254 455858 45.0339% 69738 6.8893%
4 Sat Nov 16 13:17:01 GMT 2013 1030938 473239 45.9037% 10764 10.4414%
5 Sun Nov 17 13:17:01 GMT 2013 1012122 486244 48.0420% 13161 13.0039%
6 Mon Nov 18 13:17:01 GMT 2013 1090236 489005 44.8531% 11873 10.8907%
7 Tue Nov 19 13:17:01 GMT 2013 1054120 477180 45.2680% 89535 8.4938%
You can then easily convert the Date column to POSIXct and do whatever you like with it:
as.POSIXct(as.character(dat$Date), format='%a %b %d %H:%M:%S GMT %Y', tz='GMT')
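To get from that timestamp to the plain dates shown in the question's desired output, one option is to parse first and then drop the time of day. The sketch below works on a single string copied from the sample data; the names stamp, when and day are illustrative. It assumes an English (or C) locale, because %a and %b match weekday and month names in the current locale and the parse returns NA otherwise.

```r
# One timestamp taken from the sample data.
stamp <- "Wed Nov 13 21:32:22 GMT 2013"

# "GMT" in the format string is matched literally; tz sets the time zone.
when <- as.POSIXct(stamp, format = "%a %b %d %H:%M:%S GMT %Y", tz = "GMT")

# Keep only the calendar date, interpreted in the same time zone.
day <- as.Date(when, tz = "GMT")
format(day)
```

Passing tz to as.Date keeps the calendar day consistent with the time zone used for parsing.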
Answer 1 (score: 1)
I don't know what format your data is in, but let's say it is some kind of text file:
cat('Wed Nov 13 21:32:22 GMT 2013
Unique 1011266
back 471693 46.6438%
edge 82093 8.1178%
Thu Nov 14 13:17:02 GMT 2013
Unique 1030845
back 479623 46.5271%
edge 91870 8.9121%
Fri Nov 15 13:17:01 GMT 2013
Unique 1012254
back 455858 45.0339%
edge 69738 6.8893%
Sat Nov 16 13:17:01 GMT 2013
Unique 1030938
back 473239 45.9037%
edge 107645 10.4414%
Sun Nov 17 13:17:01 GMT 2013
Unique 1012122
back 486244 48.0420%
edge 131616 13.0039%
Mon Nov 18 13:17:01 GMT 2013
Unique 1090236
back 489005 44.8531%
edge 118735 10.8907%
Tue Nov 19 13:17:01 GMT 2013
Unique 1054120
back 477180 45.2680%
edge 89535 8.4938%\n', file='temp.txt')
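raw <- readLines('temp.txt')
# split each matching line on runs of whitespace and keep the second field (the count),
# converted to numeric so that ggplot treats it as a continuous value
unique <- as.numeric(sapply(grep('Unique',raw,value=TRUE),function(x) unlist(strsplit(trimws(x),'\\s+'))[2], USE.NAMES=FALSE))
back <- as.numeric(sapply(grep('back',raw,value=TRUE),function(x) unlist(strsplit(trimws(x),'\\s+'))[2], USE.NAMES=FALSE))
edge <- as.numeric(sapply(grep('edge',raw,value=TRUE),function(x) unlist(strsplit(trimws(x),'\\s+'))[2], USE.NAMES=FALSE))
# parse the date lines directly; 'GMT' is matched literally in the format string
dates <- as.POSIXct(trimws(grep('GMT',raw,value=TRUE)), format='%a %b %d %H:%M:%S GMT %Y', tz='GMT')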
# now make a data frame
dat <- data.frame(unique,back,edge,dates, row.names=NULL)
dat
# unique back edge dates
# 1 1011266 471693 82093 2013-11-13 21:32:22
# 2 1030845 479623 91870 2013-11-14 13:17:02
# 3 1012254 455858 69738 2013-11-15 13:17:01
# 4 1030938 473239 107645 2013-11-16 13:17:01
# 5 1012122 486244 131616 2013-11-17 13:17:01
# 6 1090236 489005 118735 2013-11-18 13:17:01
# 7 1054120 477180 89535 2013-11-19 13:17:01
# now plot (requires the ggplot2 package)
library(ggplot2)
ggplot(dat,aes(x=dates,y=edge)) + geom_point() + scale_x_datetime() + theme_bw()
ggplot(dat,aes(x=dates,y=back)) + geom_point() + scale_x_datetime() + theme_bw()
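Since the question asks for both date vs. back and date vs. edge, the two series can also be drawn in one figure by stacking them into a long data frame first. This is only a sketch under stated assumptions: the two-row dat below is a cut-down stand-in built from the first two records, the names long, series and value are made up for illustration, and the plot is drawn only if ggplot2 is installed.

```r
# Cut-down stand-in for the data frame built above (first two records only).
dat <- data.frame(
  dates = as.POSIXct(c("2013-11-13 21:32:22", "2013-11-14 13:17:02"), tz = "GMT"),
  back  = c(471693, 479623),
  edge  = c(82093, 91870)
)

# Stack back and edge into one value column with a label column.
long <- data.frame(
  dates  = c(dat$dates, dat$dates),
  series = rep(c("back", "edge"), each = nrow(dat)),
  value  = c(dat$back, dat$edge)
)

# One panel per series, each with its own y scale.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = dates, y = value)) +
    geom_line() + geom_point() +
    facet_wrap(~ series, ncol = 1, scales = "free_y") +
    theme_bw()
  print(p)
}
```

Free y scales are used because the back and edge counts differ by roughly a factor of five, so a shared axis would flatten the edge series.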