I'd like to know the simplest way to merge datasets on key columns whose names do not match.
> head(team_measures)
# A tibble: 6 x 7
team_id geo_entropy job_entropy
<chr> <dbl> <dbl>
1 10012 1.79 1.79
2 10027 0 1.25
3 10044 1.79 0.650
4 10049 1.00 1.46
5 10053 0.811 2.00
> head(p_calc)
# A tibble: 6 x 2
team.id p_average
<int> <dbl>
1 10000 4.75
2 10001 4.98
3 10002 4.17
4 10003 4.32
5 10004 4.22
6 10005 4.44
What I'm currently doing feels very convoluted for such a simple operation:
team_measures <- p_calc %>%
rename(team_id = team.id) %>%
select(team_id, p_average) %>%
left_join(team_measures, by = c('team_id')) %>%
na.omit()
It's actually worse than that, because I get this error:
Error in left_join_impl(x, y, by_x, by_y, aux_x, aux_y, na_matches) : Can't join on 'team_id' x 'team_id' because of incompatible types (character / integer)
So I also have to recast the two key columns to the same type.
Is there a simpler way to achieve this?
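Answer 0: (score: 2)
Is this what you are trying to achieve?
dplyr joins have a by= argument whose syntax is not very intuitive, e.g. by = c("xxx" = "xxxx"), where the left-hand name is the key column in the first table and the right-hand name is the key column in the second.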
full_join(team_measures, p_calc, by = c("team_id" = "team.id"))
team_id geo_entropy job_entropy p_average
1 10012 1.790 1.79 NA
2 10027 0.000 1.25 NA
3 10044 1.790 0.65 NA
4 10049 1.000 1.46 NA
5 10053 0.811 2.00 NA
6 10000 NA NA 4.75
7 10001 NA NA 4.98
8 10002 NA NA 4.17
9 10003 NA NA 4.32
10 10004 NA NA 4.22
11 10005 NA NA 4.44
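With the column types shown in the question (character team_id, integer team.id), the join is likely to stop with the incompatible-types error quoted there, so the key types need to agree first. A minimal sketch, assuming the toy data below (copied from the printed output in the question) and that converting the integer key to character is acceptable:

```r
library(dplyr)

# Toy data rebuilt from the question's printed output (an assumption:
# the real tables have more rows and columns).
team_measures <- data.frame(
  team_id     = c("10012", "10027", "10044", "10049", "10053"),
  geo_entropy = c(1.79, 0, 1.79, 1.00, 0.811),
  job_entropy = c(1.79, 1.25, 0.650, 1.46, 2.00),
  stringsAsFactors = FALSE
)
p_calc <- data.frame(
  team.id   = 10000:10005,
  p_average = c(4.75, 4.98, 4.17, 4.32, 4.22, 4.44)
)

# Make the key types agree, then join on the differently named columns.
# No rename() is needed: the result keeps the first table's key name.
joined <- full_join(
  team_measures,
  mutate(p_calc, team.id = as.character(team.id)),
  by = c("team_id" = "team.id")
)

print(joined)
```

Swapping full_join for left_join or inner_join changes only which unmatched rows are kept; the by= mapping stays the same.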
Answer 1: (score: 0)
Just convert the numeric key to character:
library(tidyverse)
data.frame(team_id = c("10012", '10027', '10044', '10049','10053'),
geo_entropy = c(1.79,0,1.79,1.00,0.811),
job_entropy = c(1.79,1.25,0.650,1.46,2.00)) -> team_measures
data.frame(team.id = 10000:10005,
p_average = c(4.75,4.98,4.17,4.32,4.22,4.44)) -> p_calc
p_calc %>%
mutate(team.id = as.character(team.id)) %>%
rename(team_id = team.id) %>%
left_join(team_measures, by = "team_id")
This gives you NA for geo_entropy and job_entropy, because none of your team_id values match.
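One way to confirm that the keys really do not overlap, before or after joining, is to compare them directly. The following is a small sketch on the same sample data; the names shared_ids and unmatched are illustrative and do not come from the original posts:

```r
library(dplyr)

# Same sample data as in the answer above.
team_measures <- data.frame(
  team_id     = c("10012", "10027", "10044", "10049", "10053"),
  geo_entropy = c(1.79, 0, 1.79, 1.00, 0.811),
  job_entropy = c(1.79, 1.25, 0.650, 1.46, 2.00),
  stringsAsFactors = FALSE
)
p_calc <- data.frame(
  team.id   = 10000:10005,
  p_average = c(4.75, 4.98, 4.17, 4.32, 4.22, 4.44)
)

# Keys present in both tables, compared as character.
shared_ids <- intersect(team_measures$team_id, as.character(p_calc$team.id))

# Rows of p_calc that have no partner in team_measures.
unmatched <- p_calc %>%
  mutate(team.id = as.character(team.id)) %>%
  anti_join(team_measures, by = c("team.id" = "team_id"))

print(length(shared_ids))
print(nrow(unmatched))
```

On this sample the shared set is empty and every p_calc row is unmatched, which is why the left join fills the team_measures columns with NA. On the full data the overlap may of course be different.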