Question

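I have two separate data sets, df1 and df2 (each with a different number of rows), that I need to match, and then append columns holding specific values from df2. My data set is large, and I'm stuck writing a loop that matches each row of df2 against df1 and then appends the desired values...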

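The data in df1 is matched to df2 by date. Each iteration returns a different number of rows, depending on how many df1 records fall within the dates of that df2 record. I then need to take the relevant columns with the desired values from df2 and append them to the date-matched output. Here is a manual example (it works and gives me what I need):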

# Step 1: Get the desired values from df2 that need to be appended to the matching output
  df2_values = df2[, c("trip.start", "trip.end", "b", "ln_b", "c", "ln_c")]

# Step 2: Find the df1 records whose dates fall within the first record in df2 (inclusive)
  df1_rec = subset(df1, Date >= df2$trip.start[1] & Date <= df2$trip.end[1])

# Step 3: Append the df2 values to the date-matched output
  output1 = cbind(df1_rec, df2_values[1,])

# Doing this again for the second record (date range) in df2
  df1_rec = subset(df1, Date >= df2$trip.start[2] & Date <= df2$trip.end[2])
  output2 = cbind(df1_rec, df2_values[2,])

I need to repeat this process for all 40 rows of df2, then rbind all the outputs together, and then do the whole thing again for 737 separate occurrences. Here is what I've tried so far, none of which works:

# Attempt #1
library(foreach)
output = foreach(j = 1:nrow(df2), .combine = rbind) %do% {
  cbind(subset(df1, Date >= df2$trip.start[j] & Date <= df2$trip.end[j]), df2_values[j, ])
}

Error in { :
  task 7 failed - "arguments imply differing number of rows: 0, 1"

# Attempt #2
for (j in 1:nrow(df2)) {
  df1_rec = subset(df1, Date >= df2$trip.start[j] & Date <= df2$trip.end[j])
  output = cbind(df1_rec, df2_values[j, ])
}

Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 1

# Attempt #3
library(foreach)
output = foreach(i = 1:nrow(df2), .combine = rbind) %do% {
  for (j in 1:nrow(df2)) {
    df1_rec = subset(df1, Date >= df2$trip.start[j] & Date <= df2$trip.end[j])
    df2_rec = df2_values[j, ]
    cbind(df1_rec, df2_rec)
  }
}

Error in { :
  task 1 failed - "arguments imply differing number of rows: 0, 1"
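All three attempts stop with the same message. When no df1 date falls inside a trip window (presumably row 7 of df2 in Attempt #1), subset() returns a data frame with 0 rows, and cbind() cannot combine 0 rows with the 1-row df2_values[j, ]. The sketch below reproduces this on small hypothetical data and shows one possible guard: return NULL when nothing matched, since rbind() skips NULL pieces. The toy values and the helper name bind_if_matched() are illustrative assumptions, not from the question. Separately, Attempt #2 reassigns output on every pass, so even without the error it would keep only the last trip's rows.

```r
# Hypothetical toy data: one df1 date and one trip window that contains no df1 date.
df1 <- data.frame(Date = as.Date("2015-01-10"), value = 1)
df2_values <- data.frame(trip.start = as.Date("2015-02-01"),
                         trip.end   = as.Date("2015-02-05"),
                         b          = 0.5)

# No df1 date falls inside the window, so subset() returns 0 rows.
df1_rec <- subset(df1, Date >= df2_values$trip.start[1] &
                        Date <= df2_values$trip.end[1])

# cbind() cannot combine 0 rows with 1 row: this reproduces the error message.
msg <- tryCatch(cbind(df1_rec, df2_values[1, ]),
                error = function(e) conditionMessage(e))

# Hypothetical guard: only bind when something matched; otherwise return NULL,
# which rbind() drops when the pieces are combined later.
bind_if_matched <- function(rec, vals) {
  if (nrow(rec) == 0) NULL else cbind(rec, vals)
}
```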

Thanks for any tips or help!

Answer 1

You can do this with apply:

# Go through each row of df2 and do the matching.
# Return the cbind-ed data frame if any matches are found, otherwise NULL.
# The 1 in apply(df2, 1, ...) means "apply by row".
# Note: apply() first converts df2 to a character matrix, so each df2Row
# is a named character vector and the appended columns come back as character.
df12 <- apply(df2, 1, function(df2Row) {
  matches <- df1$Date >= df2Row["trip.start"] & df1$Date <= df2Row["trip.end"]
  if (any(matches)) {
    # data.frame() on a named vector gives one column with one row per element,
    # so transpose df2Row first to get a single row with one column per value.
    cbind(df1[matches, ], data.frame(t(df2Row)))
  } else {
    NULL
  }
})

# Now bind all the data frames in the list returned above into
# a single data.frame
df12 <- do.call(rbind, df12)
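Because apply() turns df2 into a character matrix, the columns appended above come back as text rather than Dates and numbers. One way to keep the original column types, sketched here on hypothetical toy data, is to loop over row indices with lapply() and take each df2 row from the data frame itself. value_cols is an assumed stand-in for the question's c("trip.start", "trip.end", "b", "ln_b", "c", "ln_c"); adjust it to the real column names.

```r
# Hypothetical toy data standing in for the question's df1 and df2.
df1 <- data.frame(
  Date  = as.Date(c("2015-01-01", "2015-01-03", "2015-01-05", "2015-01-10")),
  value = c(10, 20, 30, 40)
)
df2 <- data.frame(
  trip.start = as.Date(c("2015-01-01", "2015-01-04", "2015-02-01")),
  trip.end   = as.Date(c("2015-01-03", "2015-01-10", "2015-02-05")),
  b = c(0.5, 1.5, 2.5),
  c = c(2, 4, 6)
)

# Columns of df2 to append (assumed names; the question uses more of them).
value_cols <- c("trip.start", "trip.end", "b", "c")

pieces <- lapply(seq_len(nrow(df2)), function(j) {
  # Inclusive date window for trip j.
  in_window <- df1$Date >= df2$trip.start[j] & df1$Date <= df2$trip.end[j]
  n <- sum(in_window)
  if (n == 0) return(NULL)  # no df1 dates in this window: skip it
  # Repeat row j of df2 once per matching df1 row; indexing the data frame
  # (rather than a matrix) keeps Date and numeric columns as they are.
  cbind(df1[in_window, , drop = FALSE],
        df2[rep(j, n), value_cols, drop = FALSE])
})

# rbind() drops the NULL entries, leaving one data frame.
df12 <- do.call(rbind, pieces)
rownames(df12) <- NULL
```

The third trip matches no df1 dates, so it contributes nothing instead of raising the "differing number of rows" error.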

There are better ways to do this with the dplyr library (well worth learning!), but if you want to stick with base R, the above is one way to do it.
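The loop can also be avoided entirely. dplyr 1.1.0 and later can express this match as an inequality join using join_by(between(Date, trip.start, trip.end)); the sketch below stays in base R so it runs without extra packages. It pairs every df1 row with every df2 row via merge(by = NULL), then keeps the pairs whose Date lies inside the trip window. The toy data is hypothetical, and the intermediate table has nrow(df1) * nrow(df2) rows, so this trades memory for simplicity when df1 is large.

```r
# Hypothetical toy data standing in for the question's df1 and df2.
df1 <- data.frame(
  Date  = as.Date(c("2015-01-01", "2015-01-03", "2015-01-05", "2015-01-10")),
  value = c(10, 20, 30, 40)
)
df2 <- data.frame(
  trip.start = as.Date(c("2015-01-01", "2015-01-04", "2015-02-01")),
  trip.end   = as.Date(c("2015-01-03", "2015-01-10", "2015-02-05")),
  b = c(0.5, 1.5, 2.5),
  c = c(2, 4, 6)
)

# Cartesian product: every df1 row paired with every df2 row
# (nrow(df1) * nrow(df2) rows in total).
pairs <- merge(df1, df2, by = NULL)

# Keep only the pairs whose Date lies inside that trip's window (inclusive).
df12 <- pairs[pairs$Date >= pairs$trip.start & pairs$Date <= pairs$trip.end, ]
rownames(df12) <- NULL
```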

R: Loop to subset, append data and output a single data frame
