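I have a join problem: the IDs I want to use to join two separate data frames are spread across three possible ID columns. I would like the join to succeed if at least one ID matches. I know the _join and merge functions accept a vector of column names, but is it possible to make this work conditionally?
For example, if I have the following two data frames: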
df_A <- data.frame(dta = c("FOO", "BAR", "GOO"),
                   id1 = c("abc", "", "bcd"),
                   id2 = c("", "", "xyz"),
                   id3 = c("def", "fgh", ""), stringsAsFactors = FALSE)
df_B <- data.frame(dta = c("FUU", "PAR", "KOO"),
                   id1 = c("abc", "", ""),
                   id2 = c("", "xyz", "zzz"),
                   id3 = c("", "", ""), stringsAsFactors = FALSE)
> df_A
  dta id1 id2 id3
1 FOO abc     def
2 BAR         fgh
3 GOO bcd xyz
> df_B
  dta id1 id2 id3
1 FUU abc
2 PAR     xyz
3 KOO     zzz
I would like to end up with something like this:
  dta.x dta.y id1 id2 id3
1 FOO   FUU   abc ""  def  [matched on id1]
2 BAR   ""    ""  ""  fgh  [unmatched]
3 GOO   PAR   bcd xyz ""   [matched on id2]
4 KOO   ""    ""  zzz ""   [unmatched]
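So unmatched dta1 and dta2 values (dta.x and dta.y in the table above) are retained, but where there is a match (rows 1 and 3 above) both dta1 and dta2 end up in the same row of the new table. I have a feeling that none of _join, merge, or match will work as is and that I need to write a function, but I'm not sure where to start. Any help or ideas appreciated. Thanks.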
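Answer 0 (score: 1)
Basically, what you want is to join on the corresponding ID. One way to do that is to pivot the original ID columns into long format, with one column holding the ID column name (id_column) and one holding the ID itself (id_value). Since you don't want to match on "", I dropped those rows.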
library(tidyverse)
df_A_long <- df_A %>%
  pivot_longer(
    cols = -dta,
    names_to = "id_column",
    values_to = "id_value"
  ) %>%
  dplyr::filter(id_value != "")
df_B_long <- df_B %>%
  pivot_longer(
    cols = -dta,
    names_to = "id_column",
    values_to = "id_value"
  ) %>%
  dplyr::filter(id_value != "")
From here on, A and B are always joined on id_column and id_value.
> df_B_long
# A tibble: 3 x 3
  dta   id_column id_value
  <chr> <chr>     <chr>
1 FUU   id1       abc
2 PAR   id2       xyz
3 KOO   id2       zzz
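To check which rows actually pair up before any reshaping, the long-format join can be inspected directly. The following is a minimal sketch, not part of the original answer: `to_long` is a hypothetical helper name introduced here, and the data frames are repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df_A <- data.frame(dta = c("FOO", "BAR", "GOO"),
                   id1 = c("abc", "", "bcd"),
                   id2 = c("", "", "xyz"),
                   id3 = c("def", "fgh", ""), stringsAsFactors = FALSE)
df_B <- data.frame(dta = c("FUU", "PAR", "KOO"),
                   id1 = c("abc", "", ""),
                   id2 = c("", "xyz", "zzz"),
                   id3 = c("", "", ""), stringsAsFactors = FALSE)

# Hypothetical helper: one row per non-empty (id_column, id_value) pair
to_long <- function(df) {
  df %>%
    pivot_longer(cols = -dta, names_to = "id_column", values_to = "id_value") %>%
    filter(id_value != "")
}

# inner_join keeps only the ID pairs present in both data frames;
# the suffix turns the two dta columns into dta1 (from A) and dta2 (from B)
matches <- inner_join(to_long(df_A), to_long(df_B),
                      by = c("id_column", "id_value"),
                      suffix = c("1", "2"))
matches
```

With the example data this should leave two rows, FOO paired with FUU on id1 and GOO paired with PAR on id2. Note that an ID only matches when it sits in the same column in both data frames, which is also how the join in this answer behaves.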
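The join itself is straightforward, but getting to the desired output takes some additional data wrangling so that the result has the same shape.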
df_joined <- df_A_long %>%
  # join using id_column and id_value
  full_join(df_B_long, by = c("id_column", "id_value"), suffix = c("1", "2")) %>%
  # pivot back to wide format
  pivot_wider(
    id_cols = c(dta1, dta2),
    names_from = id_column,
    values_from = id_value
  ) %>%
  # if dta1 is missing, then in the same row, move value from dta2 to dta1
  mutate(
    dta1_has_value = !is.na(dta1), # helper column
    dta1 = ifelse(dta1_has_value, dta1, dta2),
    dta2 = ifelse(!dta1_has_value & !is.na(dta2), NA, dta2)
  ) %>%
  select(-dta1_has_value) %>%
  group_by(dta1) %>%
  # condense multiple rows into one row: first non-NA value, or "" if all are NA
  summarise_all(
    ~ifelse(all(is.na(.x)), "", .x[!is.na(.x)])
  ) %>%
  # reorder columns alphabetically; use `.` because df_joined does not exist yet
  {
    .[sort(colnames(.))]
  }
Result:
> df_joined
# A tibble: 4 x 5
  dta1  dta2  id1   id2   id3
  <chr> <chr> <chr> <chr> <chr>
1 BAR   ""    ""    ""    fgh
2 FOO   FUU   abc   ""    def
3 GOO   PAR   bcd   xyz   ""
4 KOO   ""    ""    zzz   ""
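The question's expected output also annotates each row with how it matched ("matched on id1", "unmatched"). A possible way to derive such labels from the same long-format join is sketched below. This is an assumption about what is wanted rather than part of the original answer; `to_long`, `labels`, `matched`, and `status` are names made up for the illustration, and the data frames are repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df_A <- data.frame(dta = c("FOO", "BAR", "GOO"),
                   id1 = c("abc", "", "bcd"),
                   id2 = c("", "", "xyz"),
                   id3 = c("def", "fgh", ""), stringsAsFactors = FALSE)
df_B <- data.frame(dta = c("FUU", "PAR", "KOO"),
                   id1 = c("abc", "", ""),
                   id2 = c("", "xyz", "zzz"),
                   id3 = c("", "", ""), stringsAsFactors = FALSE)

# Hypothetical helper: one row per non-empty (id_column, id_value) pair
to_long <- function(df) {
  df %>%
    pivot_longer(cols = -dta, names_to = "id_column", values_to = "id_value") %>%
    filter(id_value != "")
}

labels <- full_join(to_long(df_A), to_long(df_B),
                    by = c("id_column", "id_value"),
                    suffix = c("1", "2")) %>%
  # a joined row is a real match only when both sides contributed a dta value
  mutate(matched = !is.na(dta1) & !is.na(dta2)) %>%
  # group B-only rows under their own dta value, like dta1 in df_joined
  group_by(dta = coalesce(dta1, dta2)) %>%
  summarise(
    status = if (any(matched)) {
      paste("matched on", paste(id_column[matched], collapse = ", "))
    } else {
      "unmatched"
    }
  )
labels
```

The resulting `dta` column lines up with `dta1` in `df_joined`, so the labels could be attached with something like `left_join(df_joined, labels, by = c("dta1" = "dta"))` if the annotation is needed in the final table.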