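I downloaded one year of data from NOAA. The problem is that the downloaded file also contains text, not just data. So I found a pattern that matches the data lines and extracted them. The code I used is as follows: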
url <- "http://tidesandcurrents.noaa.gov/data_menu.shtml?bdate=20080101&edate=20081231&wl_sensor_hist=W2&relative=&datum=6&unit=0&shift=s&stn=8737048+Mobile+State+Docks%2C+AL&type=Historic+Tide+Data&format=View+Data"
download.file(url,destfile="data/mobile-docks-2008.dat")
mob2008 <- readLines("data/mobile-docks-2008.dat")
head(mob2008)
# Find pattern to separate data
pat <- grep(pattern="([0-9]+)\\s[0-9]",mob2008)
jd1 <- data.frame(mob2008[pat])
head(jd1)
> head(jd1)
mob2008.pat.
1 8737048 20080101 00:00 0.125 0.270
2 8737048 20080101 01:00 0.090 0.220
3 8737048 20080101 02:00 0.070 0.167
4 8737048 20080101 03:00 0.061 0.093
5 8737048 20080101 04:00 0.057 0.002
6 8737048 20080101 05:00 0.052 -0.108
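How can I split a single column into multiple columns in R? It seems like a trivial problem, but I am stuck on it.
The problem is that the data frame jd1 has only one column. I need each row to have 5 columns.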
Answer 0 (score: 3)
R> jd1 = readLines(textConnection("1 8737048 20080101 00:00 0.125 0.270
+ 2 8737048 20080101 01:00 0.090 0.220
+ 3 8737048 20080101 02:00 0.070 0.167
+ 4 8737048 20080101 03:00 0.061 0.093
+ 5 8737048 20080101 04:00 0.057 0.002
+ 6 8737048 20080101 05:00 0.052 -0.108"))
R> jd1 = data.frame(mob2008.pat. = jd1, stringsAsFactors = FALSE)
R> jd1
mob2008.pat.
1 1 8737048 20080101 00:00 0.125 0.270
2 2 8737048 20080101 01:00 0.090 0.220
3 3 8737048 20080101 02:00 0.070 0.167
4 4 8737048 20080101 03:00 0.061 0.093
5 5 8737048 20080101 04:00 0.057 0.002
6 6 8737048 20080101 05:00 0.052 -0.108
R> dim(jd1)
[1] 6 1
R> jd2 = strsplit(jd1[[1]], " ")
R> jd2 = lapply(jd2, function(x) x[x != ""] )
R> jd2 = do.call(rbind, jd2)
R> data.frame(jd2)
X1 X2 X3 X4 X5 X6
1 1 8737048 20080101 00:00 0.125 0.270
2 2 8737048 20080101 01:00 0.090 0.220
3 3 8737048 20080101 02:00 0.070 0.167
4 4 8737048 20080101 03:00 0.061 0.093
5 5 8737048 20080101 04:00 0.057 0.002
6 6 8737048 20080101 05:00 0.052 -0.108
This has 6 columns, but dropping the row-number column is easy and gives you the 5 columns you need.
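A minimal sketch of that last step, assuming the same six sample lines as above. The object names `lines`, `parts`, `m`, and `df` are made up for illustration, and the assumption that the last two fields are numeric readings comes only from how the sample looks. `trimws()` needs R 3.2.0 or later.

```r
# The six sample lines, including the leading row number.
lines <- c("1 8737048 20080101 00:00 0.125 0.270",
           "2 8737048 20080101 01:00 0.090 0.220",
           "3 8737048 20080101 02:00 0.070 0.167",
           "4 8737048 20080101 03:00 0.061 0.093",
           "5 8737048 20080101 04:00 0.057 0.002",
           "6 8737048 20080101 05:00 0.052 -0.108")

# Split on runs of whitespace, so no empty strings need filtering afterwards.
parts <- strsplit(trimws(lines), "\\s+")

# Stack the pieces into a character matrix, one row per input line.
m <- do.call(rbind, parts)

# Drop the first column (the row number) to keep the 5 data columns.
df <- data.frame(m[, -1, drop = FALSE], stringsAsFactors = FALSE)

# Everything is still character; convert the last two columns to numeric.
df$X4 <- as.numeric(df$X4)
df$X5 <- as.numeric(df$X5)

str(df)
```

Splitting on `"\\s+"` instead of a single space is a small variation on the code above: it handles repeated spaces directly, so the `lapply` filtering step is not needed.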
Answer 1 (score: 2)
I used the reshape2 package and its colsplit function.
The solution is as follows:
library(reshape2)
> jd1<- colsplit(mob2008[pat],pattern="\\s+" ,names=c("V1","V2","V3","V4","V5"))
> head(jd1)
V1 V2 V3 V4 V5
1 8737048 20080101 00:00 0.125 0.270
2 8737048 20080101 01:00 0.090 0.220
3 8737048 20080101 02:00 0.070 0.167
4 8737048 20080101 03:00 0.061 0.093
5 8737048 20080101 04:00 0.057 0.002
6 8737048 20080101 05:00 0.052 -0.108
Answer 2 (score: 2)
There is no need for reshape2 or strsplit.
Simply:
> jd1 <- read.table(text=mob2008[pat])
> head(jd1)
V1 V2 V3 V4 V5
1 8737048 20080101 00:00 0.125 0.270
2 8737048 20080101 01:00 0.090 0.220
3 8737048 20080101 02:00 0.070 0.167
4 8737048 20080101 03:00 0.061 0.093
5 8737048 20080101 04:00 0.057 0.002
6 8737048 20080101 05:00 0.052 -0.108
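As a possible follow-up, `read.table` can also name and type the columns in the same call. The sketch below uses the six sample lines in place of `mob2008[pat]`. The column names `station`, `date`, `time`, `value1`, and `value2` are assumptions chosen for illustration (the question does not say what the last two fields are), and the `UTC` time zone is likewise only a placeholder.

```r
# Sample lines standing in for mob2008[pat].
lines <- c("8737048 20080101 00:00 0.125 0.270",
           "8737048 20080101 01:00 0.090 0.220",
           "8737048 20080101 02:00 0.070 0.167",
           "8737048 20080101 03:00 0.061 0.093",
           "8737048 20080101 04:00 0.057 0.002",
           "8737048 20080101 05:00 0.052 -0.108")

# Parse the whitespace-separated fields, naming and typing each column.
# Reading date and time as character keeps them intact for parsing below.
jd1 <- read.table(text = lines,
                  col.names = c("station", "date", "time", "value1", "value2"),
                  colClasses = c("character", "character", "character",
                                 "numeric", "numeric"))

# Combine date and time into a single POSIXct column (time zone assumed).
jd1$datetime <- as.POSIXct(paste(jd1$date, jd1$time),
                           format = "%Y%m%d %H:%M", tz = "UTC")

str(jd1)
```

Setting `colClasses` explicitly avoids surprises such as the station ID or date being read as integers.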