I have 3 data frames.
Data1 -
Name_description Numbers
ABC 23
DEF 34
GHI 45
XYZ 43
JVK 23
LMN 21
Data 2 contains only a list of names.
Data 2-
Names
ABC
DEF
GHI
XYZ
JVK
LMN
PQR
KJL
Data 3 again has both names and numbers.
Data 3
Name_desc Numbers
ABC 56
DEF 67
GHI 89
XYZ 60
JVK 88
LMN 65
PQR 100
KJL 85
I want to do the following:
Check whether all names from Data 2 are present in Data 1
If any names are missing then
{
get those names
get the numbers for those missing names from data 3
append above two things (missing names & numbers) to data 1
}
else
{data1<-data1
}
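I could simply merge the files, but I also need to make sure that if no name from Data 2 is missing in Data 1, then Data 1 stays unchanged (the same thing the pseudocode above describes).
For the data above, my final output should be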
Data 1-
Name_description Numbers
ABC 23
DEF 34
GHI 45
XYZ 43
JVK 23
LMN 21
PQR 100
KJL 85
Thanks
Answer 0 (score: 1)
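First, merge Data1 and Data2, then find the NA values in this new data.frame, match the corresponding names against Data3, and finally replace the NAs with the Data3 values.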
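> tmp <- merge(Data1, Data2, by.x="Name_description", by.y="Names", all=TRUE)
> # rows of tmp that have no number yet (names that only occur in Data2)
> miss <- is.na(tmp$Numbers)
> # look those names up in Data3 and copy the matching numbers into the same rows
> tmp$Numbers[miss] <- Data3$Numbers[match(tmp$Name_description[miss], Data3$Name_desc)]
> tmp
Name_description Numbers
1 ABC 23
2 DEF 34
3 GHI 45
4 JVK 23
5 LMN 21
6 XYZ 43
7 KJL 85
8 PQR 100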
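The pseudocode in the question can also be followed more literally in base R with setdiff and match. This is only a minimal sketch: it assumes every missing name actually occurs in Data3, and the names missing_names, extra and result are made up for illustration.

```r
# Example data from the question
Data1 <- data.frame(
  Name_description = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN"),
  Numbers = c(23L, 34L, 45L, 43L, 23L, 21L),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Names = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
  stringsAsFactors = FALSE
)
Data3 <- data.frame(
  Name_desc = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
  Numbers = c(56L, 67L, 89L, 60L, 88L, 65L, 100L, 85L),
  stringsAsFactors = FALSE
)

# Names listed in Data2 that Data1 does not have yet
missing_names <- setdiff(Data2$Names, Data1$Name_description)

# Look up their numbers in Data3 (assumption: each missing name is in Data3)
extra <- Data3[match(missing_names, Data3$Name_desc), ]
names(extra) <- names(Data1)

# Append; when nothing is missing, `extra` has zero rows and Data1 is kept as-is
result <- rbind(Data1, extra)
rownames(result) <- NULL
result
```

Because setdiff returns an empty vector when nothing is missing, no explicit if/else branch is needed for the "leave Data1 unchanged" case.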
Answer 1 (score: 1)
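I find dplyr::coalesce very handy in the situation the OP describes. After joining the 3 data frames, coalesce can be applied to the two Numbers columns (one of which contains NA) to pick the first non-missing value:
library(dplyr)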
Data1 %>% full_join(Data2, by=c("Name_description" = "Names")) %>%
inner_join(Data3, by=c("Name_description" = "Name_desc")) %>%
mutate(Numbers = coalesce( Numbers.x, Numbers.y)) %>%
select(Name_description, Numbers)
# Name_description Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 7 PQR 100
# 8 KJL 85
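To see what coalesce contributes on its own, here is a small sketch on two plain vectors. The names from_data1 and from_data3 are hypothetical stand-ins for the Numbers.x and Numbers.y columns that the joins produce.

```r
library(dplyr)

# Stand-in for Numbers.x: NA wherever Data1 had no row for that name
from_data1 <- c(23L, NA, 21L, NA)
# Stand-in for Numbers.y: the value from Data3, always present
from_data3 <- c(56L, 100L, 65L, 85L)

# Element-wise: take from_data1 unless it is NA, then fall back to from_data3
coalesce(from_data1, from_data3)
# [1]  23 100  21  85
```

The order of the arguments sets the priority, which is why Numbers.x (from Data1) comes first in the pipeline.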
Data:
Data1 <- read.table(text =
"Name_description Numbers
ABC 23
DEF 34
GHI 45
XYZ 43
JVK 23
LMN 21",
header = TRUE, stringsAsFactors = FALSE)
Data2 <- read.table(text =
"Names
ABC
DEF
GHI
XYZ
JVK
LMN
PQR
KJL",
header = TRUE, stringsAsFactors = FALSE)
Data3 <- read.table(text =
"Name_desc Numbers
ABC 56
DEF 67
GHI 89
XYZ 60
JVK 88
LMN 65
PQR 100
KJL 85",
header = TRUE, stringsAsFactors = FALSE)
Answer 2 (score: 0)
With dplyr, it should be something like:
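library(dplyr)
Data1 %>%
bind_rows(
Data2 %>%
# keep only the names that Data1 does not have yet
anti_join(Data1, by = c("Names" = "Name_description")) %>%
# fetch their numbers from Data3
left_join(Data3, by = c("Names" = "Name_desc")) %>%
rename(Name_description = Names)
)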
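The question also requires that Data1 stay unchanged when no names are missing. The sketch below checks that case for the anti_join approach on a cut-down example. The inputs are hypothetical: Data2_subset lists only names that Data1 already has, and the key columns are passed explicitly through by because they are named differently in each data frame.

```r
library(dplyr)

Data1 <- data.frame(Name_description = c("ABC", "DEF"),
                    Numbers = c(23L, 34L),
                    stringsAsFactors = FALSE)
# Hypothetical input: every name here is already present in Data1
Data2_subset <- data.frame(Names = c("ABC", "DEF"),
                           stringsAsFactors = FALSE)
Data3 <- data.frame(Name_desc = c("ABC", "DEF", "PQR"),
                    Numbers = c(56L, 67L, 100L),
                    stringsAsFactors = FALSE)

# Rows that would have to be appended to Data1
to_add <- Data2_subset %>%
  anti_join(Data1, by = c("Names" = "Name_description")) %>%
  left_join(Data3, by = c("Names" = "Name_desc")) %>%
  rename(Name_description = Names)

nrow(to_add)
# [1] 0

# Appending zero rows leaves the contents of Data1 as they were
result <- bind_rows(Data1, to_add)
```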
Answer 3 (score: 0)
We can achieve this with left_join from dplyr combined with ifelse.
library(dplyr)
Data4 <- Data2 %>%
left_join(Data1, by = c("Names" = "Name_description")) %>%
left_join(Data3, by = c("Names" = "Name_desc")) %>%
mutate(Numbers = ifelse(is.na(Numbers.x), Numbers.y, Numbers.x)) %>%
select(Names, Numbers)
Data4
# Names Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 7 PQR 100
# 8 KJL 85
Data
Data1 <- read.table(text = "Name_description Numbers
ABC 23
DEF 34
GHI 45
XYZ 43
JVK 23
LMN 21",
header = TRUE, stringsAsFactors = FALSE)
Data2 <- read.table(text = "Names
ABC
DEF
GHI
XYZ
JVK
LMN
PQR
KJL",
header = TRUE, stringsAsFactors = FALSE)
Data3 <- read.table(text = "Name_desc Numbers
ABC 56
DEF 67
GHI 89
XYZ 60
JVK 88
LMN 65
PQR 100
KJL 85",
header = TRUE, stringsAsFactors = FALSE)
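Answer 4 (score: 0)
We actually don't need merge at all. What you want is the first available value of Numbers for each name, looking in Data1 first and then in Data3, and NA when a name is in Data2 but in neither of the others.
The fastest way to do this is with data.table, but I'll provide other options as well.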
data.table
data.table::rbindlist binds by column position rather than by column name when use.names=FALSE (historically the default), which is very handy in this case.
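library(data.table)
# cbind(Data2, NA) pads Data2 to two columns; bind by position, then keep the first row per name
rbindlist(list(Data1, Data3, cbind(Data2, NA)), use.names = FALSE)[, .SD[1, ], by = "Name_description"]
# Name_description Numbers
# 1: ABC 23
# 2: DEF 34
# 3: GHI 45
# 4: XYZ 43
# 5: JVK 23
# 6: LMN 21
# 7: PQR 100
# 8: KJL 85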
tidyverse solution
The .keep_all argument of dplyr::distinct is very useful for avoiding the less readable %>% filter(!duplicated(Names)) or %>% group_by(Names) %>% slice(1).
library(tidyverse)
lst(Data1,Data3,cbind(Data2,NA)) %>%
map(setNames,c("Names","Numbers")) %>%
bind_rows %>%
distinct(Names,.keep_all = TRUE)
# Names Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 7 PQR 100
# 8 KJL 85
Base R solution
x <- do.call(rbind,lapply(list(Data1,Data3,cbind(Data2,NA)),setNames,c("Names","Numbers")))
x[!duplicated(x[[1]]),]
# Names Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 13 PQR 100
# 14 KJL 85