Adding factors to a df by subsetting rows that contain certain letters

Time: 2019-10-04 16:06:25

Tags: r

Here is my data:

  Year variable value
 1951     MF12 1.441
 1952     MF12 2.068
 1953     RF12 2.008  
 1954     RF12 2.044
 1955     MW12 2.288
 1956     RW12 1.800

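where MF = Managed Frame, RF = Reserve Frame, MW = Managed Wind, RW = Reserve Wind. So there are 4 different levels: Managed, Reserve, Frame, Wind.

I want to create two factors based on these levels and add them to the data frame as columns. Factor 1 would be management.type (Managed, Reserve) and factor 2 would be object.type (Frame, Wind).

Something like this: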

Year variable value Management Object
1951   MF12 1.37845 Managed      Frame 
1952   MF12 1.38950 Managed      Frame
1953   MW12 1.55510 Managed      Wind
1954   RF12 1.66125 Reserve      Frame
1955   RW12 1.62600 Reserve      Wind
1956   RW13 1.58760 Reserve      Wind

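How can I do this in R (rather than going back and sorting in Excel)? For the management type, I think I could perhaps use some kind of starts-with command to sort on whether the code begins with 'M' or 'R', but I am not sure how to do that. For the object, is there a way to sort on codes that contain the letter "F" or "W"?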

3 Answers:

Answer 0: (score: 1)

Try grepl() together with ifelse().
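A minimal sketch of that idea, assuming the data from the question and the labels the question asks for (Managed/Reserve, Frame/Wind). The regular expressions "^M" and "^.F" are my own choice for testing the first and second character; anything that is not matched falls into the other category, so this only suits codes that really have just two options per position.

```r
# Reproducible copy of the question's data
df <- data.frame(
  Year = 1951:1956,
  variable = c("MF12", "MF12", "RF12", "RF12", "MW12", "RW12"),
  value = c(1.441, 2.068, 2.008, 2.044, 2.288, 1.8),
  stringsAsFactors = FALSE
)

# First character "M" -> Managed, anything else -> Reserve
df$Management <- factor(ifelse(grepl("^M", df$variable), "Managed", "Reserve"))

# Second character "F" -> Frame, anything else -> Wind
df$Object <- factor(ifelse(grepl("^.F", df$variable), "Frame", "Wind"))

str(df)
```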

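Answer 1: (score: 1)

Use case_when() from dplyr:

  • Its advantage over ifelse() is that more than two cases stay easy to manage.
  • substr() extracts the first letter and then the second letter; for more complicated checks, grepl() with a regular expression might be necessary.
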
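# The first character encodes the management type;
# codes that match no condition become NA
df$Management <- dplyr::case_when(
  substr(df$variable, 1, 1) == "M" ~ "Managed",
  substr(df$variable, 1, 1) == "R" ~ "Reserved"
)

# The second character encodes the object type
df$Object <- dplyr::case_when(
  substr(df$variable, 2, 2) == "F" ~ "Frame",
  substr(df$variable, 2, 2) == "W" ~ "Wind"
)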

df
  Year variable value Management Object
1 1951     MF12 1.441    Managed  Frame
2 1952     MF12 2.068    Managed  Frame
3 1953     RF12 2.008   Reserved  Frame
4 1954     RF12 2.044   Reserved  Frame
5 1955     MW12 2.288    Managed   Wind
6 1956     RW12 1.800   Reserved   Wind
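
Note that case_when() with string results returns character vectors, while the question asks for factors. A minimal sketch of the conversion, assuming the two character columns shown in the output above (they are typed in by hand here so the snippet runs on its own, and the level order is my own choice):

```r
# Character columns as they appear in the output above
df <- data.frame(
  Year = 1951:1956,
  variable = c("MF12", "MF12", "RF12", "RF12", "MW12", "RW12"),
  Management = c("Managed", "Managed", "Reserved", "Reserved", "Managed", "Reserved"),
  Object = c("Frame", "Frame", "Frame", "Frame", "Wind", "Wind"),
  stringsAsFactors = FALSE
)

# Convert to factors with an explicit level order
df$Management <- factor(df$Management, levels = c("Managed", "Reserved"))
df$Object <- factor(df$Object, levels = c("Frame", "Wind"))

levels(df$Management)
levels(df$Object)
```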

Reproducible data:

df <- data.frame(
  Year = 1951:1956, 
  variable = c("MF12", "MF12", "RF12", "RF12", "MW12", "RW12"), 
  value = c(1.441, 2.068, 2.008, 2.044, 2.288, 1.8),
  stringsAsFactors = FALSE
)

Answer 2: (score: 1)

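We can use str_extract() from stringr to pull out the first and second characters and turn them into factors:

library(dplyr)
library(stringr)  # str_extract() comes from stringr, not dplyr

df %>%
  mutate(
    # "^." matches the first character of the code
    Management = factor(str_extract(variable, "^."),
                        levels = c("M", "R"), labels = c("Managed", "Reserved")),
    # "(?<=^.)." matches the character right after the first one
    Object = factor(str_extract(variable, "(?<=^.)."),
                    levels = c("F", "W"), labels = c("Frame", "Wind"))
  )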
#   Year variable value Management Object
#1 1951     MF12 1.441    Managed  Frame
#2 1952     MF12 2.068    Managed  Frame
#3 1953     RF12 2.008   Reserved  Frame
#4 1954     RF12 2.044   Reserved  Frame
#5 1955     MW12 2.288    Managed   Wind
#6 1956     RW12 1.800   Reserved   Wind
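
If loading packages is not desirable, the same levels/labels idea can probably be written in base R, swapping str_extract() for substr(). This is only a sketch under the assumption that the management letter is always in position 1 and the object letter in position 2; any letter outside the listed levels becomes NA.

```r
# Reproducible copy of the question's data
df <- data.frame(
  Year = 1951:1956,
  variable = c("MF12", "MF12", "RF12", "RF12", "MW12", "RW12"),
  value = c(1.441, 2.068, 2.008, 2.044, 2.288, 1.8),
  stringsAsFactors = FALSE
)

# Position 1: management letter, mapped to readable labels
df$Management <- factor(substr(df$variable, 1, 1),
                        levels = c("M", "R"), labels = c("Managed", "Reserved"))

# Position 2: object letter, mapped to readable labels
df$Object <- factor(substr(df$variable, 2, 2),
                    levels = c("F", "W"), labels = c("Frame", "Wind"))

str(df)
```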