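I have two separate datasets, df1 and df2 (each with a different number of rows), that I need to match, and then append columns of specific values from df2. I have a large dataset and am stuck writing a loop that matches each row of df2 to df1 and then appends the desired values.
The data in df1 is matched to df2 based on date. Each iteration returns a different number of rows, depending on how many records in df1 match the current row of df2. I then need to take the relevant columns from df2 that hold the desired values and append them to the date-matched output.
I need to repeat this process for each of the 40 rows of df2, rbind all the outputs together, and then do the same again for 737 separate occurrences. None of my loop attempts (shown further down) has worked. Here is a manual example for the first two rows of df2 (it works and gives me what I need):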
# Step 1: Get the desired values from df2 that need to be appended to the matching output
df2_values = df2[, c("trip.start", "trip.end", "b", "ln_b", "c", "ln_c")]
# Step 2: Find the df1 records inclusive of the dates for the first record in df2
df1_rec = subset(df1, Date >= df2$trip.start[1] & Date <= df2$trip.end[1])
# Step 3: Append the date matching output and the df2 values
output1 = cbind(df1_rec, df2_values[1,])
# Doing this again for df2, second date
df1_rec = subset(df1, Date >= df2$trip.start[2] & Date <= df2$trip.end[2])
output2 = cbind(df1_rec, df2_values[2,])
Here are my attempts so far, with the error each one produces:
# Attempt #1
library(foreach)
output = foreach(j=1:nrow(df2), .combine=rbind) %do% {
  cbind(subset(df1, Date >= df2$trip.start[j] & Date <= df2$trip.end[j]), df2_values[j,])
}
Error in { :
task 7 failed - "arguments imply differing number of rows: 0, 1"
# Attempt #2
for (j in 1:nrow(df2)) {
  df1_rec = subset(df1, Date >= df2$trip.start[j] & Date <= df2$trip.end[j])
  output = cbind(df1_rec, df2_values[j,])
}
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 1
# Attempt #3
library(foreach)
output = foreach(i=1:nrow(df2), .combine=rbind) %do% {
  for (j in 1:nrow(df2)) {
    df1_rec = subset(df1, Date >= df2$trip.start[j] & Date <= df2$trip.end[j])
    df2_rec = df2_values[j,]
    cbind(df1_rec, df2_rec)
  }
}
Error in { :
task 1 failed - "arguments imply differing number of rows: 0, 1"
Thanks for any tips or help!
Answer 0 (score: 0)
You can do this with apply:
# Go through each row of df2 and do the matching.
# Return the cbind-ed data frame if any matches are found, NULL otherwise.
# The 1 in apply(df2, 1, ...) means "apply the function to each row".
df12 <- apply(df2, 1, function(df2Row) {
  matches <- df1$Date >= df2Row["trip.start"] & df1$Date <= df2Row["trip.end"]
  if (sum(matches) > 0) {
    # apply() passes each row as a named vector, so transpose it before
    # calling data.frame(); otherwise you get a single-column data frame
    # with 6 rows instead of a single row with 6 columns.
    cbind(df1[matches, ], data.frame(t(df2Row)))
  } else {
    NULL
  }
})
# Now bind all the data frames in the list returned above into
# a single data.frame.
df12 <- do.call(rbind, df12)
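One thing to watch for with the apply approach: apply() first converts the data frame to a matrix, so when df2 mixes dates and numbers, every value in df2Row arrives as a character string, and the appended columns come out as character (or factor, depending on the R version) rather than numeric. A minimal sketch that keeps the column types by looping over row indices with lapply instead; the toy df1 and df2 below are made up for illustration and only mimic the column names in the question:

```r
# Toy data, invented for illustration only.
df1 <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-03", "2020-01-10", "2020-02-01")),
  x    = c(10, 20, 30, 40)
)
df2 <- data.frame(
  trip.start = as.Date(c("2020-01-01", "2020-01-08", "2020-03-01")),
  trip.end   = as.Date(c("2020-01-05", "2020-01-12", "2020-03-05")),
  b          = c(1.5, 2.5, 3.5),
  ln_b       = log(c(1.5, 2.5, 3.5))
)

# Loop over row indices of df2 rather than over the rows themselves,
# so each df2[j, ] stays a one-row data frame with its original types.
pieces <- lapply(seq_len(nrow(df2)), function(j) {
  matches <- df1$Date >= df2$trip.start[j] & df1$Date <= df2$trip.end[j]
  if (!any(matches)) {
    return(NULL)  # no df1 records fall inside this trip; skip it
  }
  # data.frame() recycles the single df2 row across all matching df1 rows.
  data.frame(df1[matches, , drop = FALSE],
             df2[j, , drop = FALSE],
             row.names = NULL, check.names = FALSE)
})

# rbind drops the NULL entries and stacks the rest.
df12 <- do.call(rbind, pieces)
rownames(df12) <- NULL
```

Returning NULL for trips with no matching records is also what avoids the "arguments imply differing number of rows: 0, 1" error from the question, which comes from cbind-ing a zero-row subset with a one-row data frame.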
There are better ways to do this with the dplyr library (well worth learning!), but if you want to stick to base R, the above is one way to do it.
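Another base-R option, sketched here under the assumption that df2 is small enough (40 rows in the question) for an intermediate table of nrow(df1) * nrow(df2) rows to fit in memory: build every pairing of rows with merge(..., by = NULL), then keep only the pairs whose Date falls inside the trip. The toy data is made up for illustration.

```r
# Toy data, invented for illustration only.
df1 <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-03", "2020-01-10", "2020-02-01")),
  x    = c(10, 20, 30, 40)
)
df2 <- data.frame(
  trip.start = as.Date(c("2020-01-01", "2020-01-08", "2020-03-01")),
  trip.end   = as.Date(c("2020-01-05", "2020-01-12", "2020-03-05")),
  b          = c(1.5, 2.5, 3.5)
)

# by = NULL makes merge() return the Cartesian product:
# every row of df1 paired with every row of df2.
pairs <- merge(df1, df2, by = NULL)

# Keep only the pairs where the df1 date lies within the trip (inclusive).
in_trip <- pairs$Date >= pairs$trip.start & pairs$Date <= pairs$trip.end
df12 <- pairs[in_trip, ]

# Order by trip, then by date, and reset the row names.
df12 <- df12[order(df12$trip.start, df12$Date), ]
rownames(df12) <- NULL
```

This needs no loop and no special case for trips without matches, at the cost of materialising all row pairs before filtering.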