Question

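I have a large data frame named df that contains some IDs.

I have another data frame (id_list) that contains a matching set of IDs and the features associated with each ID. The IDs are not sorted in the same order in the two data frames.

Effectively, I want to look up each ID from the larger data frame df in id_list and add two columns, Display and Type, to the current data frame df.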

There are many confusing examples out there. What would be the most efficient way to do this? I tried using %in% and match() but failed.

Here is a reproducible example.

[example code not preserved]

The final df should have dimensions [20 x 8].

Thanks for your help.

Answer 1

If you can be sure that the data frames df and id_list contain the same IDs (just in a different order), you can try the following:

# define new data frame
ordered_id_list <- data.frame()

# loop over rows of df (get new ID each round);
# seq_len() also behaves correctly when df has zero rows
for (i in seq_len(nrow(df))) {
  # find the row in id_list where the ID "id_list$ID" is identical to
  # current ID in df for this round "df$ID[i]"
  new_row <- id_list[id_list$ID == df$ID[i], ]
  # add new row to ordered_id_list
  ordered_id_list <- rbind(ordered_id_list, new_row)
}

# merge (add columns) Display and Type columns of new ordered data frame with df
merged_df <- cbind(Display = ordered_id_list$Display, Type = ordered_id_list$Type, df)

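Using a for loop in R is not the best approach, but it is fine if your data frame is not too large.

Basically, you create a new copy of id_list that is ordered to match df$ID, and then combine it with df.

Hope this helps :)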
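The same reordering can be sketched without a loop using match(), which the question mentions trying. This is only a minimal illustration: the two small tables and the column name Feat below are hypothetical stand-ins, not the data from the question.

```r
# Hypothetical small tables standing in for df and id_list
df <- data.frame(ID = c(3, 1, 2), Feat = c(0.3, 0.1, 0.2))
id_list <- data.frame(ID = c(1, 2, 3),
                      Display = c("clear", "blur", "clear"),
                      Type = c("red", "green", "blue"),
                      stringsAsFactors = FALSE)

# For each ID in df, find the position of the same ID in id_list
idx <- match(df$ID, id_list$ID)

# Index the lookup columns by position; an ID missing from id_list
# would give NA in idx and therefore NA in the new columns
df$Display <- id_list$Display[idx]
df$Type <- id_list$Type[idx]
```

Because match() returns one position per row of df, the row order of df is left unchanged and no intermediate copy of id_list has to be grown row by row.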

Answer 2

You can do this easily with merge in base R or left_join in dplyr. (There is also data.table; perhaps someone else can provide an answer using it.) You may want to take steps to make sure you do not lose any data when the data frame contains entries that have no corresponding ID in the lookup table. If you do not need to keep such rows, you can set all.x to FALSE in merge (which is also its default), or switch from left_join to inner_join. To illustrate this, I added a dummy row to the data frame whose ID does not exist in the lookup table.

df <- data.frame(Feats = matrix(rnorm(10), nrow = 5, ncol = 5), ID = sample.int(10, 10))
dummy <- df[1, ]
dummy$ID <- 12
df <- rbind(dummy, df)
id_list <- data.frame(ID = sample.int(10, 10),
                      Display = sample(c('clear', 'blur'), 10, replace = TRUE),
                      Type = sample(c('red', 'green', 'blue', 'indigo', 'yellow'), 10, replace = TRUE))

With merge, you can set by to the name of the column to join on when it is the same in both data frames, or set by.x and by.y when the names differ. all.x = TRUE keeps all observations from the first data frame, even those that have no match in the second data frame.

merged1 <- merge(df, id_list, by = "ID", sort = FALSE, all.x = TRUE)
merged1
#>    ID     Feats.1    Feats.2     Feats.3    Feats.4     Feats.5 Display
#> 1  10 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344   clear
#> 2   5  0.99220217 -0.3125813  0.99220217 -0.3125813  0.99220217   clear
#> 3   2  1.03881289  1.1277627  1.03881289  1.1277627  1.03881289   clear
#> 4   7 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186   clear
#> 5   4  0.07130125  1.1715833  0.07130125  1.1715833  0.07130125   clear
#> 6   6 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344   clear
#> 7   8  0.99220217 -0.3125813  0.99220217 -0.3125813  0.99220217    blur
#> 8   3  1.03881289  1.1277627  1.03881289  1.1277627  1.03881289   clear
#> 9   1 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186   clear
#> 10  9  0.07130125  1.1715833  0.07130125  1.1715833  0.07130125   clear
#> 11 12 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344    <NA>
#>      Type
#> 1  indigo
#> 2  yellow
#> 3    blue
#> 4  indigo
#> 5  yellow
#> 6  indigo
#> 7   green
#> 8     red
#> 9     red
#> 10   blue
#> 11   <NA>

dplyr::left_join keeps all observations from the first data frame and merges in all matching observations from the second data frame.
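As a rough sketch of the dplyr variant, the difference between left_join and inner_join might look like this. The tables and the column name Feat are hypothetical and deliberately tiny (they are not the reprex data from this answer), and the code assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical data: ID 12 has no entry in the lookup table
df <- data.frame(ID = c(3, 1, 12, 2), Feat = c(0.3, 0.1, 1.2, 0.2))
id_list <- data.frame(ID = c(1, 2, 3),
                      Display = c("clear", "blur", "clear"),
                      Type = c("red", "green", "blue"),
                      stringsAsFactors = FALSE)

# left_join keeps every row of df; the unmatched ID gets NA
# in the Display and Type columns
merged_left <- left_join(df, id_list, by = "ID")

# inner_join keeps only the rows whose ID appears in both tables
merged_inner <- inner_join(df, id_list, by = "ID")
```

Here merged_left has the same number of rows as df, while merged_inner drops the row with ID 12.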

Created on 2018-07-13 by the reprex package (v0.2.0).

Look up values from another data frame in R
