I have a data frame like the following:
S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20
For each row, I want to select the state (from S1State and S2State) that has the larger value (from S1Value and S2Value). The result should look like this:
SState SValue
VIC 30
WA 30
I am new to R and have been trying to do this with dplyr.
Answer 0 (score: 2)
My suggested answer is as follows:
library(dplyr)
dt <- read.table(text = "S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20",
header = TRUE, stringsAsFactors = FALSE)
answer <- dt %>%
  mutate(SState = ifelse(S1Value > S2Value, S1State, S2State),
         SValue = ifelse(S1Value > S2Value, S1Value, S2Value)) %>%
  select(SState, SValue)
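One detail worth spelling out: with a strict `>`, a tie between S1Value and S2Value falls through to S2State. The following is a minimal sketch, not part of the original answer, of a variant that sends ties to S1State instead and uses `pmax` for the value. The name `answer_ties` and the inline example data are illustrative assumptions.

```r
library(dplyr)

# Example data from the question, built inline so the block runs on its own
dt <- data.frame(
  S1State = c("NSW", "WA"), S1Value = c(20, 30),
  S2State = c("VIC", "NSW"), S2Value = c(30, 20),
  stringsAsFactors = FALSE
)

# answer_ties is a hypothetical name used only for this sketch
answer_ties <- dt %>%
  mutate(
    # >= sends ties to S1State; the strict > in the answer sends them to S2State
    SState = if_else(S1Value >= S2Value, S1State, S2State),
    # pmax() takes the element-wise (row-by-row) maximum of the two columns
    SValue = pmax(S1Value, S2Value)
  ) %>%
  select(SState, SValue)
```

Which side should win a tie depends on the data, so treat the choice of `>=` here as an assumption rather than a recommendation.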
Answer 1 (score: 2)
Just to show that this is far from impossible with standard R tools:
nams <- c("State","Value")
tmp <- reshape(dt, direction="long", varying=lapply(nams, grep, x=names(dt)),
               v.names=nams, timevar=NULL)
tmp[with(tmp, Value == ave(Value, id, FUN=max)),]
# State Value id
#2.1 WA 30 2
#1.2 VIC 30 1
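If the rows should stay in their original order and only the two result columns are wanted, another base R route is to work on matrices with `max.col`. This is a rough sketch rather than the answerer's method. It assumes the State and Value columns appear in matching order (S1State with S1Value, S2State with S2Value, and so on), and the names `vals`, `states`, `pick`, and `result` are made up for the example.

```r
# Example data from the question, built inline so the block runs on its own
dt <- read.table(text = "S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20",
                 header = TRUE, stringsAsFactors = FALSE)

# Split the columns into a value matrix and a state matrix.
# Assumption: the k-th State column pairs with the k-th Value column.
vals   <- as.matrix(dt[grep("Value$", names(dt))])
states <- as.matrix(dt[grep("State$", names(dt))])

# For each row, the position of the largest value (first one wins on ties)
idx <- max.col(vals, ties.method = "first")

# Two-column index matrix: (row, column) pairs to pull out of each matrix
pick <- cbind(seq_len(nrow(dt)), idx)

result <- data.frame(SState = states[pick], SValue = vals[pick],
                     stringsAsFactors = FALSE)
```

Because the column positions are found with `grep`, the same code should carry over to S3State, S4State, and further pairs, as long as the pairing assumption holds.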
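Answer 2 (score: 1)
I assume the OP may have more states in the data frame, such as S3State, S4State, ...
The solutions below are based on this assumption and try to handle any number of states. If there are only two states, the approach proposed by @lebelinoz is simple and straightforward.
A solution using functions from dplyr and tidyr. dt2 is the final output.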
# Load packages
library(dplyr)
library(tidyr)
# Process the data
dt2 <- dt %>%
  gather(Num, Value, contains("Value")) %>%
  gather(State, Name, contains("State")) %>%
  # Only keep records with the same state number
  # (comparing the first two characters assumes single-digit numbers, S1 to S9)
  filter(substring(Num, 1, 2) == substring(State, 1, 2)) %>%
  mutate(Group = substring(Num, 1, 2)) %>%
  # Take the maximum within each state number (S1, S2, ...)
  group_by(Group) %>%
  filter(Value == max(Value)) %>%
  ungroup() %>%
  select(SState = Name, SValue = Value)
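Note that the pipeline above groups by state number, so it returns the maximum within each S1/S2 pair across rows. On the example data this happens to give the same two records as a row-by-row comparison. If the goal is the maximum within each row, as the question's expected output suggests, one possible sketch uses `pivot_longer` with the `.value` sentinel. It assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, and column names of the form S<number>State / S<number>Value; `row_id` and `dt_rowwise` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Example data from the question, built inline so the block runs on its own
dt <- read.table(text = "S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20",
                 header = TRUE, stringsAsFactors = FALSE)

dt_rowwise <- dt %>%
  # Remember which original row each record came from
  mutate(row_id = row_number()) %>%
  # S1State/S1Value -> one record with Num = "1", State, Value (and so on)
  pivot_longer(-row_id,
               names_to = c("Num", ".value"),
               names_pattern = "^S(\\d+)(State|Value)$") %>%
  # Keep the single largest Value per original row (first one wins on ties)
  group_by(row_id) %>%
  slice_max(Value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(SState = State, SValue = Value)
```

Because the number is captured with `\\d+`, this form is not limited to single-digit state numbers.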
A solution using functions from dplyr, purrr, and stringr. I loaded the tidyverse package for the first two. Again, dt2 is the final output.
# Load packages
library(tidyverse)
library(stringr)
# Extract the column names
Col <- colnames(dt)
# Extract state numbers
ColNum <- Col %>%
  str_extract(pattern = "[0-9]") %>%
  unique()
# Design a function to process the data
dt_process <- function(pattern, dt){
  dt2 <- dt %>%
    # Extract columns based on a pattern (numbers)
    select(dplyr::contains(pattern)) %>%
    # Rename the columns
    rename_all(~sub(pattern, "", .)) %>%
    # Filter the maximum row
    filter(SValue == max(SValue))
  return(dt2)
}
# Apply the dt_process function
dt_list <- map(.x = ColNum, .f = dt_process, dt = dt)
# Bind all data frames
dt2 <- bind_rows(dt_list) %>% arrange(SState)
# Create example data frame (run this before the solutions above)
dt <- read.table(text = "S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20",
header = TRUE, stringsAsFactors = FALSE)