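I have been trying to do a conditional sum on a data.frame that contains duplicates. I want to sum the rows that share the same permno and date, and store the result in a separate column, filling the remaining rows with NA or, preferably, 0.
My dataset looks like this: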
data.frame(crsp)
permno date PAYDT DISTCD divamt FACPR FACSHR PRC RET
1 10022 19280929 19281001 1272 0.25 0 0 71.00 0.045208
2 10022 19280929 19281001 1232 1.00 0 0 71.00 0.045208
3 10022 19281031 NA NA NA NA NA 73.50 0.035211
4 10022 19281130 NA NA NA NA NA 72.50 -0.013605
5 10022 19281231 19290202 1232 1.00 0 0 68.00 -0.044828
6 10022 19281231 19290202 1272 0.25 0 0 68.00 -0.044828
7 10022 19290131 NA NA NA NA NA 73.75 0.084559
8 10022 19290228 NA NA NA NA NA 69.00 -0.064407
9 10022 19290328 19290401 1232 1.00 0 0 65.00 -0.039855
10 10022 19290328 19290401 1272 0.25 0 0 65.00 -0.039855
11 10022 19290430 NA NA NA NA NA 67.00 0.030769
12 10022 19290531 NA NA NA NA NA 64.75 -0.033582
First, I pasted permno and date together to create a unique key:
crsp$permnodate = paste(as.character(crsp$permno),as.character(crsp$date),sep="")
Second, I tried to sum the duplicates and put the result into a new data frame:
crsp_divsingl <- aggregate(crsp$divamt, by = list(permnodate = crsp$permnodate), FUN = sum, na.rm = TRUE)
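However, I cannot get this information back into the original data.frame (crsp) correctly, because cbind and cbind.fill do not let me combine columns of different lengths, which is the correct behaviour. Specifically, I want the sum of the divamt values to appear on one row (the first) of each unique permnodate, so that the new column has the same length as the rest of the data.frame. I have not been able to get merge or match to work either.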
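I have not tried a loop yet, and I have not managed to write a working if or ifelse expression. Basically, this can be done in Excel with a VLOOKUP or INDEX/MATCH formula, but it is trickier in R than I first thought.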
Any help is much appreciated.
Best regards,
Troels
Answer 0 (score: 0)
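You can achieve this more easily with duplicated and merge. I wrote an example. You will have to adapt it to your purposes, but hopefully it will put you on the right track: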
# Creating a fake sample dataset.
set.seed(9)
permno <- 10022:10071 # Allowing 50 possible permno's.
date <- 19280929:19280978 # Allow 50 possible dates.
value <- c(NA, 1:9) # Allowing NA or a value from 1 through 9.
# Creating fake data frame.
crsp <- data.frame(permno = sample(permno, 1000, TRUE), date = sample(date, 1000, TRUE), value = sample(value, 1000, TRUE))
# Defining a helper that uses duplicated() in both directions, so that it flags
# every row that has a duplicate, including the first occurrence.
fullDup <- function(x) {
  bool <- duplicated(x) | duplicated(x, fromLast = TRUE)
  return(bool)
}
# Getting the duplicated rows. fullDup returns a logical vector marking all rows
# that share permno and date with another row, including the first such row.
crsp.dup <- crsp[fullDup(crsp[, c("permno", "date")]), ]
# Now aggregate.
crsp.dup[is.na(crsp.dup)] <- 0 # Converting NA values to 0.
crsp.dup <- aggregate(value ~ permno + date, crsp.dup, sum)
names(crsp.dup)[3] <- "value.dup" # Changing the name of the value column.
# Now merge back in with the original dataset.
crsp <- merge(crsp, crsp.dup, by = c("permno", "date"), all.x = TRUE)
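If you want the sum to appear only on the first row of each permno/date pair, with 0 everywhere else, here is a minimal sketch of one possible way to do it. Note that it uses base R's ave() rather than the aggregate/merge route above. The four sample rows are copied from the question, and the column name divamt_sum is only a name I made up for illustration:

```r
# Small sample shaped like the question's data (rows 1-4, fewer columns).
crsp <- data.frame(
  permno = c(10022, 10022, 10022, 10022),
  date   = c(19280929, 19280929, 19281031, 19281130),
  divamt = c(0.25, 1.00, NA, NA)
)

# Treat a missing dividend as 0 so that every group has a defined sum.
div0 <- ifelse(is.na(crsp$divamt), 0, crsp$divamt)

# ave() repeats the group sum on every row of the group, so the result
# already has the same length as the data frame (no merge needed).
group_sum <- ave(div0, crsp$permno, crsp$date, FUN = sum)

# Keep the sum only on the first row of each permno/date pair; 0 elsewhere.
# divamt_sum is a hypothetical column name.
is_first <- !duplicated(crsp[, c("permno", "date")])
crsp$divamt_sum <- ifelse(is_first, group_sum, 0)
```

Because ave() keeps the original row order and length, this avoids the length mismatch that cbind runs into. If you would rather have the sum on every row of a group, use group_sum directly.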