How do I keep only the columns that match between data frames?

Time: 2017-09-10 00:30:00

Tags: r

Suppose I have two data frames like this:

df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
              city = c("Bangor", "Austin", "Sacramento", "New York"),
              district = c(4, 7, 19, 21))
df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
              city = c("Boston", "Seattle", "Concord", "Tampa"),
              population = c(2000000, 4000000, 80000, 2500000))

I would like to subset each data frame so that only the columns whose names appear in both data frames are kept, like this:

df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
              city = c("Bangor", "Austin", "Sacramento", "New York"))
df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
              city = c("Boston", "Seattle", "Concord", "Tampa"))

How can I do this? The real data sets obviously contain many more columns, so a general approach is preferable. Thanks!

2 answers:

Answer 0: (score: 1)

Use intersect to get the intersection of the column names, then subset each data frame by it:

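# Column names present in both data frames
col_extracted <- intersect(colnames(df1), colnames(df2))

# drop = FALSE keeps a data frame even if only one column matches
df1 <- df1[, col_extracted, drop = FALSE]
df2 <- df2[, col_extracted, drop = FALSE]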
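
Since the question asks for a general approach, the same idea can probably be extended to any number of data frames by folding intersect over their column names with Reduce. This is only a sketch: the list `dfs` and its three small data frames are made up for illustration and are not from the question.

```r
# Hypothetical example data: three data frames that share only some columns
dfs <- list(
  a = data.frame(state = "ME", city = "Bangor", district = 4),
  b = data.frame(state = "MA", city = "Boston", population = 2000000),
  c = data.frame(state = "WA", city = "Seattle", area = 84)
)

# Fold intersect() over the column names of all data frames, left to right
common_cols <- Reduce(intersect, lapply(dfs, names))

# Subset each data frame; drop = FALSE keeps the result a data frame
dfs_common <- lapply(dfs, function(d) d[, common_cols, drop = FALSE])
```

Here `common_cols` holds the names found in every element of `dfs`, and `dfs_common` is a list of the same length with only those columns kept.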

Answer 1: (score: 1)

The intersect function is your friend:

suppressPackageStartupMessages(library(tidyverse))

df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
                  city = c("Bangor", "Austin", "Sacramento", "New York"),
                  district = c(4, 7, 19, 21))

df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
                  city = c("Boston", "Seattle", "Concord", "Tampa"),
                  population = c(2000000, 4000000, 80000, 2500000))

common_names <- intersect(names(df1), names(df2))


# all_of() makes it explicit that common_names is an external character vector
select(df1, all_of(common_names))
#>   state       city
#> 1    ME     Bangor
#> 2    TX     Austin
#> 3    CA Sacramento
#> 4    NY   New York

select(df2, all_of(common_names))
#>   state    city
#> 1    MA  Boston
#> 2    WA Seattle
#> 3    NH Concord
#> 4    FL   Tampa
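
As a possible variation, the tidyselect helper any_of() (re-exported by dplyr) selects whichever of the listed columns exist and ignores the rest, so the explicit intersect step can be skipped. A minimal sketch using the data frames from the question; the names `df1_common` and `df2_common` are just illustrative:

```r
library(dplyr)

df1 <- data.frame(state = c("ME", "TX", "CA", "NY"),
                  city = c("Bangor", "Austin", "Sacramento", "New York"),
                  district = c(4, 7, 19, 21))

df2 <- data.frame(state = c("MA", "WA", "NH", "FL"),
                  city = c("Boston", "Seattle", "Concord", "Tampa"),
                  population = c(2000000, 4000000, 80000, 2500000))

# any_of() keeps the listed columns that exist and silently skips the others
df1_common <- select(df1, any_of(names(df2)))
df2_common <- select(df2, any_of(names(df1)))
```

One thing to watch: the resulting column order follows the vector passed to any_of(), so if the two data frames list their shared columns in a different order, the two results will too. Using a single intersect result for both, as above, avoids that.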