Suppose I have two data frames like these:
df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
city = c("Bangor", "Austin", "Sacramento", "New York"),
district = c(4, 7, 19, 21))
df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
city = c("Boston", "Seattle", "Concord", "Tampa"),
population = c(2000000, 4000000, 80000, 2500000))
I want to subset each data frame so that only the columns present in both are kept, like this:
df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
city = c("Bangor", "Austin", "Sacramento", "New York"))
df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
city = c("Boston", "Seattle", "Concord", "Tampa"))
How can I do this? The real datasets obviously contain many more columns, so a general approach would be preferable. Thanks!
Answer 0 (score: 1)
Use intersect to get the column names the two data frames have in common:
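col_extracted <- intersect(colnames(df1), colnames(df2))
# drop = FALSE keeps the result a data frame even if only one column is shared
df1 <- df1[, col_extracted, drop = FALSE]
df2 <- df2[, col_extracted, drop = FALSE]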
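Since the question asks for a general approach, the same idea can be extended to any number of data frames. The following is only a sketch in base R: keep_common_cols is a hypothetical helper name, not a function from any package, and it assumes the data frames are passed as a list.

```r
# Hypothetical helper (not from any package): keep only the columns
# whose names appear in every data frame of the list `dfs`.
keep_common_cols <- function(dfs) {
  # Fold intersect() over the name vectors of all data frames
  common <- Reduce(intersect, lapply(dfs, names))
  # drop = FALSE keeps each result a data frame even with one shared column
  lapply(dfs, function(d) d[, common, drop = FALSE])
}

df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
                  city = c("Bangor", "Austin", "Sacramento", "New York"),
                  district = c(4, 7, 19, 21))
df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
                  city = c("Boston", "Seattle", "Concord", "Tampa"),
                  population = c(2000000, 4000000, 80000, 2500000))

result <- keep_common_cols(list(df1 = df1, df2 = df2))
names(result$df1)
names(result$df2)
```

The result is a named list, so the trimmed data frames are available as result$df1 and result$df2. The columns come back in the order they have in the first data frame.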
Answer 1 (score: 1)
The intersect function is your friend:
suppressPackageStartupMessages(library(tidyverse))
df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
city = c("Bangor", "Austin", "Sacramento", "New York"),
district = c(4, 7, 19, 21))
df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
city = c("Boston", "Seattle", "Concord", "Tampa"),
population = c(2000000, 4000000, 80000, 2500000))
common_names <- intersect(names(df1), names(df2))
select(df1, all_of(common_names))
#>   state       city
#> 1    ME     Bangor
#> 2    TX     Austin
#> 3    CA Sacramento
#> 4    NY   New York
select(df2, all_of(common_names))
#>   state    city
#> 1    MA  Boston
#> 2    WA Seattle
#> 3    NH Concord
#> 4    FL   Tampa