Question

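I have 3 files, each containing 3 variables: date, ID, and price. I would like to merge them by date, so if one of my current files looks like this: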

date      ID  Price
01/01/10   A   1
01/02/10   A   1.02
01/03/10   A   0.99
...
...

I would like to end up with a merged file that looks like the following for IDs A, B and C (Pr stands for Price):

date       Pr.A   Pr.B  Pr.C     
01/01/10   1      NA    NA
01/02/10   1.02   1.2   NA
01/03/10   0.99   1.3   1
01/04/10   NA     1.23  2
01/05/10   NA     NA    3

Note that some dates have no price for a given ID, so those cells are NA.

My current approach works, but it feels a bit clumsy.

setwd('~where you put the files')
library(plyr)
listnames = list.files(pattern='\\.csv$') # pattern is a regex: match names ending in .csv
pp1 = ldply(listnames,read.csv,header=TRUE) #put all the files in a data.frame

names(pp1)=c('date','ID','price')
pp1$date = as.Date(pp1$date,format='%m/%d/%Y')

# Reshape data frame so it gets organized by date
pp1=reshape(pp1,timevar='ID',idvar='date',direction='wide')

Can you think of a better way?

Answer 1

This looks like a job for Reduce():

# Read the files in to a single list, removing unwanted second column from each.
dataDir <- "example"
fNames <- dir(dataDir)
dataList <- lapply(file.path(dataDir, fNames),
                   function(X) {read.csv(X, header=TRUE)[-2]})

# Merge them                   
out <- Reduce(function(x,y) merge(x,y, by=1, all=TRUE), dataList)

# Construct column names
names(out)[-1] <- paste("Pr.", toupper(sub("1\\.csv$", "", fNames)), sep="")
out
#       date Pr.A Pr.B Pr.C
# 1 1/1/2010 1.00   NA   NA
# 2 1/2/2010 1.02 1.20   NA
# 3 1/3/2010 0.99 1.30    1
# 4 1/4/2010   NA 1.23    2
# 5 1/5/2010   NA   NA    3

Actually, your approach looks fine to me, but I can see someone preferring the simplicity and transparency of the syntax in a call to Reduce().
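For readers who do not have the files, here is a minimal self-contained sketch of the same Reduce()/merge() idea. The three small data frames are made-up stand-ins for the CSV files, with values mirroring the table in the question; only base R is used.

```r
# Hypothetical stand-ins for the three CSV files (values mirror the question's table).
a  <- data.frame(date = c("1/1/2010", "1/2/2010", "1/3/2010"), Pr.A = c(1, 1.02, 0.99))
b  <- data.frame(date = c("1/2/2010", "1/3/2010", "1/4/2010"), Pr.B = c(1.2, 1.3, 1.23))
c3 <- data.frame(date = c("1/3/2010", "1/4/2010", "1/5/2010"), Pr.C = c(1, 2, 3))

# Full outer join on the date column, folded pairwise over the list:
# merge(merge(a, b), c3), keeping every date that appears in any input.
merged <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
                 list(a, b, c3))
merged
```

Because the dates here are plain strings, merge() sorts the rows as text. Converting the column with as.Date() before merging would give a chronological order for longer date ranges.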

Answer 2

I can't access the files because I'm behind a corporate firewall. Once you have built the data.frame, I would use the cast method.

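    library(reshape)  # cast() comes from the reshape package
    res = cast(pp1, date ~ ID, value = "price", fun.aggregate = mean)  # "price" as renamed in the question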
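If the reshape package is not available, a similar wide table can be sketched in base R with tapply(), which averages the price for each date/ID pair and leaves NA where a pair never occurs. This is a swapped-in alternative rather than the cast() call itself, and the pp1 data below is invented to mirror the question's example.

```r
# Hypothetical long-format data, shaped like pp1 in the question.
pp1 <- data.frame(
  date  = as.Date(c("2010-01-01", "2010-01-02", "2010-01-03",
                    "2010-01-02", "2010-01-03", "2010-01-04",
                    "2010-01-03", "2010-01-04", "2010-01-05")),
  ID    = rep(c("A", "B", "C"), each = 3),
  price = c(1, 1.02, 0.99, 1.2, 1.3, 1.23, 1, 2, 3)
)

# Rows are dates, columns are IDs; date/ID pairs with no observation become NA.
wide <- tapply(pp1$price, list(date = pp1$date, ID = pp1$ID), mean)
wide
```

The result is a numeric matrix with dates as row names rather than a data.frame, so it may need as.data.frame() before being written back out to a file.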
