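Aligning split-up data in R

Time: 2013-01-05 17:41:13

Tags: r

I have a dataset in Excel that I want to load into R. It has two variables, "weight" and "height", and each variable has its own date column recording when the value was measured. The height variable has skipped/missing values, and so does the weight variable if you go far enough down the data. I am trying to create a merged dataset in which weight and height are combined and lined up by date, with NA placed wherever a value does not exist. Is there a command or function that can help me do this? Thanks!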

obs  date        weight  date        height
1    2010-10-04  52495   2010-10-04  11.6
2    2010-10-01  53000   2010-10-01  15.3
3    2010-09-30  52916   2010-09-30  14.3
4    2010-09-29  52785   2010-09-29  11.3
5    2010-09-28  53348   2010-09-28  18.2
6    2010-09-27  52885   2010-09-24  11.7
7    2010-09-24  52174   2010-09-23  15.0
8    2010-09-23  51461   2010-09-22  18.6
9    2010-09-22  51286   2010-09-20  17.9
10   2010-09-21  50968
11   2010-09-20  49250

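2 answers:

Answer 0 (score: 2)

I am assuming the question is not about reading the data into R but about processing it once it has been read. Even so, it helps to pass check.names = FALSE and fill = TRUE when reading the data: the first keeps both date columns named "date" instead of renaming the duplicate, and the second pads the short rows at the bottom. With the columns named that way you can merge the pieces using Reduce().

First, simulate reading in the data.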

temp <- read.table(header = TRUE, 
text = "obs date weight date height
1   2010-10-04 52495  2010-10-04 11.6
2   2010-10-01 53000  2010-10-01 15.3
3   2010-09-30 52916  2010-09-30 14.3
4   2010-09-29 52785  2010-09-29 11.3
5   2010-09-28 53348  2010-09-28 18.2
6   2010-09-27 52885  2010-09-24 11.7
7   2010-09-24 52174  2010-09-23 15.0
8   2010-09-23 51461  2010-09-22 18.6
9   2010-09-22 51286  2010-09-20 17.9
10  2010-09-21 50968
11  2010-09-20 49250
", fill = TRUE, check.names = FALSE)

Second, use Reduce() and merge(). Because both pieces have a column named "date", merge() joins on it by default.

Reduce(function(x, y) merge(x, y, all.x = TRUE), 
       list(temp[2:3], temp[4:5]))
#          date weight height
# 1  2010-09-20  49250   17.9
# 2  2010-09-21  50968     NA
# 3  2010-09-22  51286   18.6
# 4  2010-09-23  51461   15.0
# 5  2010-09-24  52174   11.7
# 6  2010-09-27  52885     NA
# 7  2010-09-28  53348   18.2
# 8  2010-09-29  52785   11.3
# 9  2010-09-30  52916   14.3
# 10 2010-10-01  53000   15.3
# 11 2010-10-04  52495   11.6
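
With only two pieces, Reduce() does the same job as a single merge() call; it becomes useful once there are more series to line up. The following is a minimal sketch of that case, not part of the original answer. The third data frame and its column name "extra" are hypothetical and exist only for illustration, and all = TRUE is used here (instead of all.x = TRUE) so that a date present in any one series is kept.

```r
# Three small series keyed by date. `extra` is a made-up third variable,
# included only to show Reduce() folding over more than two data frames.
weight <- data.frame(date = c("2010-09-20", "2010-09-21", "2010-09-22"),
                     weight = c(49250, 50968, 51286),
                     stringsAsFactors = FALSE)
height <- data.frame(date = c("2010-09-20", "2010-09-22"),
                     height = c(17.9, 18.6),
                     stringsAsFactors = FALSE)
extra  <- data.frame(date = c("2010-09-21", "2010-09-23"),
                     extra = c(1.5, 2.5),
                     stringsAsFactors = FALSE)

# Full outer join applied pairwise from left to right:
# merge(merge(weight, height), extra), always joining on "date".
merged <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
                 list(weight, height, extra))
merged
```

The result has one row per distinct date across all three inputs, with NA wherever a series has no value for that date.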

Answer 1 (score: 1)

d <- read.table(header=FALSE, fill=TRUE, text="1   2010-10-04 52495  2010-10-04 11.6  
  2   2010-10-01 53000  2010-10-01 15.3
  3   2010-09-30 52916  2010-09-30 14.3
  4   2010-09-29 52785  2010-09-29 11.3
  5   2010-09-28 53348  2010-09-28 18.2
  6   2010-09-27 52885  2010-09-24 11.7
  7   2010-09-24 52174  2010-09-23 15.0
  8   2010-09-23 51461  2010-09-22 18.6
  9   2010-09-22 51286  2010-09-20 17.9
  10  2010-09-21 50968  
  11  2010-09-20 49250  ")

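# Weight series: date and value (columns 2 and 3)
d1 <- d[2:3]
# Height series: drop the padded rows where height is NA, then keep columns 4 and 5
d2 <- d[!is.na(d[,5]),][4:5]

names(d1) <- c('Date', 'val1')
names(d2) <- c('Date', 'val2')
# Full outer join on Date; dates missing from one series get NA
m <- merge(d1, d2, by='Date', all=TRUE)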

> m

##          Date  val1 val2
## 1  2010-09-20 49250 17.9
## 2  2010-09-21 50968   NA
## 3  2010-09-22 51286 18.6
## 4  2010-09-23 51461 15.0
## 5  2010-09-24 52174 11.7
## 6  2010-09-27 52885   NA
## 7  2010-09-28 53348 18.2
## 8  2010-09-29 52785 11.3
## 9  2010-09-30 52916 14.3
## 10 2010-10-01 53000 15.3
## 11 2010-10-04 52495 11.6
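
merge() returns the rows sorted by the key in ascending order, whereas the original sheet lists the newest date first. If that layout matters, one possible follow-up is to convert the key to the Date class and reorder. This is a sketch on a three-row subset of the data, using the same column names as the answer above; it is an optional extra step, not something the original answer does.

```r
# Small subset of the two series, named as in the answer above.
d1 <- data.frame(Date = c("2010-10-04", "2010-10-01", "2010-09-30"),
                 val1 = c(52495, 53000, 52916),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Date = c("2010-10-04", "2010-09-30"),
                 val2 = c(11.6, 14.3),
                 stringsAsFactors = FALSE)

m <- merge(d1, d2, by = "Date", all = TRUE)

# Store the key as a real Date so it sorts and plots as a date, not as text.
m$Date <- as.Date(m$Date)

# Newest first, matching the layout of the original sheet.
m <- m[order(m$Date, decreasing = TRUE), ]
rownames(m) <- NULL
m
```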