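Please ignore this section and start from the question update further down.
I am trying to merge the following two rows:
into a single row like this:
Here is the code to create the dataset: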
dataset <- data.frame(Environment=c("PRODUCTION","PRODUCTION"),
Green=c("Yes","No"),
Red=c("No","Yes"),
Completed=c("Yes","Yes"))
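If the Environment column has the same value, in this case PRODUCTION, merge the two rows and return "Yes". I have not included my attempts because none of the code I tried worked. For example, this code only takes care of the duplicates: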
dataset[!duplicated(dataset$Environment),]
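Any help would be greatly appreciated.
Start from here - question update
I realized that my question did not reflect the problem I want to solve. Let me try again. This is the dataset:
I want it to look like this:
There may be many other columns. What I want is this: if the same ID has the same Environment, combine those rows and return Yes if any of them has Yes; otherwise return the default value. I hope my wording is much better this time.
Here is the new dataset: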
dataset <- data.frame(ID=c(15,15,15,16,16,16,16),Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
"PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
Green=c("Yes","No", "Yes","Yes","No", "Yes", "Yes"),
Red=c("No","Yes", "No","No","Yes", "No", "No"),
Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))
Based on @P.Routh's code, I think we are one step closer. I modified the dataset to show that the static signature breaks the code:
dataset <- data.frame(ID=c(15,15,15,16,16,16,16),
Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
"PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
Green=c("Yes","No", "Yes","Yes","No", "No", "Yes"),
Red=c("No","Yes", "No","No","Yes", "No", "No"),
White=c("No","No", "No","No","No", "No", "No"),
Black=c("No","No", "No","No","No", "No", "No"),
Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))
The modified code from @P.Routh gives erroneous output:
df <- dataset%>%group_by(ID,Environment)%>%
mutate(total = n())%>% #this counter acts as the condition you need
unite(signature,Green,Red,White,Black,Completed,sep = ":")%>% #combines the columns into one column
mutate(dummy = "Yes:Yes:Yes:Yes:Yes")%>% #just a dummy column to facilitate specifying the condition
mutate(new_val = ifelse(total>1,dummy,signature))%>% #this is the condition
select(-signature:-dummy)%>%
separate(new_val, c("Green","Red","White","Black","Completed"),":") #restores original output
unique(df)
Answer 0 (score: 4)
Using dplyr and zoo
First approach
library(dplyr)
library(zoo)
dataset[dataset=='No']=NA
dataset%>%group_by(Environment)%>%mutate_each(funs(na.locf))%>%filter(row_number()==n())
Environment Green Red Completed
<fctr> <fctr> <fctr> <fctr>
1 PRODUCTION Yes Yes Yes
Second approach, from @eipi10
dataset %>% group_by(Environment) %>% summarise_all(funs(max(as.character(.))))
#For the detail
#'Yes'>'No'
#[1] TRUE
#max('Yes','No')
#[1] "Yes"
Answer 1 (score: 3)
In base R, you can use aggregate like this.
aggregate(dataset[-1], dataset["Environment"], function(x) max(as.character(x)))
which returns
Environment Green Red Completed
1 PRODUCTION Yes Yes Yes
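The question seems to have changed after I answered. However, a small change to my original code produces the desired output (with the rows in a slightly different order)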
aggregate(dataset[-(1:2)], dataset[c("Environment", "ID")],
function(x) max(as.character(x)))
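Note that this assumes the strings are ordered lexicographically so that "Yes" beats "No". If it were the other way around, you could take the minimum instead. Secondly, in situations like this it is easier to work with numeric codes than with text. A second solution would be to convert the text to numbers and then perform an operation like the one above.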
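The idea of working with codes instead of text could be sketched roughly as follows. This is only an illustration in base R, assuming every flag column contains nothing but "Yes" and "No"; the helper name any_yes is made up for this example and is not part of the original answer. It compares each value against "Yes" to get logical values, so the result no longer depends on how the strings sort.

```r
# Example data from the updated question
dataset <- data.frame(
  ID = c(15, 15, 15, 16, 16, 16, 16),
  Environment = c("PRODUCTION", "PRODUCTION", "TRAINING",
                  "PRODUCTION", "PRODUCTION", "TRAINING", "STAGING"),
  Green = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"),
  Red = c("No", "Yes", "No", "No", "Yes", "No", "No"),
  Completed = c("Yes", "Yes", "No", "Yes", "Yes", "No", "No")
)

# Hypothetical helper: "Yes" if any value in the group is "Yes", otherwise "No"
any_yes <- function(x) if (any(as.character(x) == "Yes")) "Yes" else "No"

# Collapse every flag column within each Environment/ID combination
result <- aggregate(dataset[-(1:2)], dataset[c("Environment", "ID")], any_yes)
result
```

Because the comparison is explicit, the same helper would still work if the flag values were, say, "Y" and "N", after changing the string it compares against.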
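Answer 2 (score: 0)
A solution using dplyr. The key is to specify the factor levels for all columns except Environment. After that, summarise the columns with min. mutate_at and summarise_at can accomplish this task efficiently.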
# Load package
library(dplyr)
# Process the data
dataset2 <- dataset %>%
# Set factor level to all columns except Environment
mutate_at(vars(-Environment), factor, levels = c("Yes", "No"), ordered = TRUE) %>%
group_by(Environment) %>%
summarise_all(funs(min(.)))
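To see why the level order matters here, a tiny base R sketch may help. It assumes nothing beyond the level order c("Yes", "No") used above; the variable names are only for illustration.

```r
# Levels are listed so that "Yes" is the lowest level
flags <- factor(c("No", "Yes", "No"), levels = c("Yes", "No"), ordered = TRUE)

# min() is defined for ordered factors and returns the lowest level present
collapsed <- as.character(min(flags))

# With no "Yes" in the group, the result falls back to "No"
all_no <- as.character(min(factor(c("No", "No"), levels = c("Yes", "No"), ordered = TRUE)))

collapsed
all_no
```

Listing the levels as c("No", "Yes") instead would call for max() rather than min().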
Answer 3 (score: 0)
I hope it is not too late. My solution uses dplyr and tidyr
library(dplyr)
library(tidyr)
df <- dataset%>%group_by(ID,Environment)%>%
mutate(total = n())%>% #this counter acts as the condition you need
unite(signature,Green,Red,Completed,sep = ":")%>% #combines the columns into one column
mutate(dummy = "Yes:Yes:Yes")%>% #just a dummy column to facilitate specifying the condition
mutate(new_val = ifelse(total>1,dummy,signature))%>% #this is the condition
select(-signature:-dummy)%>%
separate(new_val, c("Green","Red","Completed"),":") #restores original output
unique(df)
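Answer 4 (score: 0)
Thanks to @P.Routh, @Win and @eipi10. I took all of your ideas and came up with working code that actually works on my large dataset. Below are the dataset posted above and the code that works: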
#load library
library(dplyr)
#create dataframe
dataset <- data.frame(ID=c(15,15,15,16,16,16,16),
Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
"PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
Green=c("Yes","No", "Yes","Yes","No", "No", "Yes"),
Red=c("No","Yes", "No","No","Yes", "No", "No"),
White=c("No","No", "No","No","No", "No", "No"),
Black=c("No","No", "No","No","No", "No", "No"),
Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))
df <- dataset%>%group_by(ID,Environment)%>% mutate(total = n())#add column total for counter of duplicates
ddc<-df[df$total==1,]#subsets those without duplicates
ddd<-df[df$total==2,]#subsets those with duplicates
ddd<- ddd %>% group_by(ID,Environment) %>% summarise_all(funs(max(as.character(.))))
merge(ddc, ddd, all=TRUE)
Thank you all.
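Update
I thought about this some more and realized that I do not need all of the intermediate steps to collapse the rows. If you supply a unique identifier, for example group_by(ID, Environment), the integrity of your data is preserved. I modified the dataset further to test this. See the new solution below: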
dataset <- data.frame(ID=c(15,15,15,15,16,16,16,16),
Environment=c("PRODUCTION","PRODUCTION","PRODUCTION", "TRAINING",
"PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
Green=c("Yes","No", "Yes", "Yes","Yes","No", "No", "Yes"),
Red=c("No","Yes", "No", "No","No","Yes", "No", "No"),
White=c("No","No", "Yes","Yes","No","No", "No", "No"),
Black=c("No","No", "No","No","No","No", "No", "No"),
Completed=c("Yes","Yes", "No","No","Yes","Yes", "No", "No"))
dataset%>% group_by(ID,Environment) %>% summarise_all(funs(max(as.character(.))))
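For readers on a recent dplyr, the last call above could probably be written without funs(), which newer releases deprecate. The following is a sketch, assuming dplyr 1.0 or later is installed; it is not the code from the original answer, and the name collapsed is only for illustration.

```r
library(dplyr)

# Same test data as in the update above
dataset <- data.frame(
  ID = c(15, 15, 15, 15, 16, 16, 16, 16),
  Environment = c("PRODUCTION", "PRODUCTION", "PRODUCTION", "TRAINING",
                  "PRODUCTION", "PRODUCTION", "TRAINING", "STAGING"),
  Green = c("Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes"),
  Red = c("No", "Yes", "No", "No", "No", "Yes", "No", "No"),
  White = c("No", "No", "Yes", "Yes", "No", "No", "No", "No"),
  Black = c("No", "No", "No", "No", "No", "No", "No", "No"),
  Completed = c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No")
)

# One row per ID/Environment pair; each remaining column keeps "Yes"
# if any row in the group has it, because "Yes" sorts after "No"
collapsed <- dataset %>%
  group_by(ID, Environment) %>%
  summarise(across(everything(), ~ max(as.character(.x))), .groups = "drop")

collapsed
```

Since across(everything()) skips the grouping columns, new flag columns are picked up automatically without listing them by name.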