Here is my data:
Year variable value
1951 MF12 1.441
1952 MF12 2.068
1953 RF12 2.008
1954 RF12 2.044
1955 MW12 2.288
1956 RW12 1.800
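Here MF = Managed Frame, RF = Reserve Frame, MW = Managed Wind, and RW = Reserve Wind. So there are four distinct levels: Managed, Reserve, Frame, and Wind.
I want to create two factors based on these levels and add them as columns to the data frame. Factor 1 would be management.type (Managed, Reserve) and factor 2 would be object.type (Frame, Wind).
Something like this: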
Year variable value Management Object
1951 MF12 1.37845 Managed Frame
1952 MF12 1.38950 Managed Frame
1953 MW12 1.55510 Managed Wind
1954 RF12 1.66125 Reserve Frame
1955 RW12 1.62600 Reserve Wind
1956 RW13 1.58760 Reserve Wind
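How can I do this in R (rather than going back and sorting in Excel)? For the management type, I think I could perhaps use a start.with command to sort by whether the value begins with 'M' or 'R', but I am not sure how to do that. For the object, is there a way to sort on values that contain the letter "F" or "W"?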
Answer 0 (score: 1)
Try ifelse() together with grepl().
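A minimal sketch of how ifelse() and grepl() could be combined for this task. It assumes the column names from the question and the labels "Managed"/"Reserve" and "Frame"/"Wind"; the data frame is rebuilt here only so the snippet runs on its own, and any code that is not M or F falls through to the second label.

```r
# Sample data mirroring the question (rebuilt here so the sketch is self-contained)
df <- data.frame(
  Year = 1951:1956,
  variable = c("MF12", "MF12", "RF12", "RF12", "MW12", "RW12"),
  value = c(1.441, 2.068, 2.008, 2.044, 2.288, 1.8),
  stringsAsFactors = FALSE
)

# "^M" matches codes that start with M; everything else is labelled Reserve
df$Management <- factor(ifelse(grepl("^M", df$variable), "Managed", "Reserve"))

# "^.F" matches codes whose second character is F; everything else is labelled Wind
df$Object <- factor(ifelse(grepl("^.F", df$variable), "Frame", "Wind"))

df
```

Because ifelse() has only a yes and a no branch, unexpected codes are silently given the second label, so this form fits best when the two prefixes are known to be exhaustive.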
Answer 1 (score: 1)
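The advantage of dplyr's case_when() over ifelse() is that more than two cases stay easy to manage. substr() extracts the first letter and then the second; for more complex checks, grepl() with a regular expression may be necessary.
# First character of the code decides the management type
df$Management <- dplyr::case_when(
  substr(df$variable, 1, 1) == "M" ~ "Managed",
  substr(df$variable, 1, 1) == "R" ~ "Reserved"
)
# Second character of the code decides the object type
df$Object <- dplyr::case_when(
  substr(df$variable, 2, 2) == "F" ~ "Frame",
  substr(df$variable, 2, 2) == "W" ~ "Wind"
)
df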
Year variable value Management Object
1 1951 MF12 1.441 Managed Frame
2 1952 MF12 2.068 Managed Frame
3 1953 RF12 2.008 Reserved Frame
4 1954 RF12 2.044 Reserved Frame
5 1955 MW12 2.288 Managed Wind
6 1956 RW12 1.800 Reserved Wind
Reproducible data:
df <- data.frame(
  Year = 1951:1956,
  variable = c("MF12", "MF12", "RF12", "RF12", "MW12", "RW12"),
  value = c(1.441, 2.068, 2.008, 2.044, 2.288, 1.8),
  stringsAsFactors = FALSE
)
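As a rough illustration of the remark about more complex checks, the sketch below swaps substr() for grepl() inside case_when(). The codes vector is hypothetical (it is not from the question) and was chosen to include a lowercase code, a multi-letter prefix, and an unknown code; the patterns are assumptions about what such codes might look like.

```r
library(dplyr)

# Hypothetical codes, not from the question: lowercase input,
# a multi-letter prefix, and a code that matches nothing
codes <- c("MF12", "rw07", "MGW3", "XX99")

# Management type: decided by the first letter, ignoring case
management <- case_when(
  grepl("^M", codes, ignore.case = TRUE) ~ "Managed",
  grepl("^R", codes, ignore.case = TRUE) ~ "Reserved",
  TRUE ~ NA_character_  # unknown codes stay missing
)

# Object type: decided by the letter directly before the trailing digits
object <- case_when(
  grepl("F[0-9]+$", codes, ignore.case = TRUE) ~ "Frame",
  grepl("W[0-9]+$", codes, ignore.case = TRUE) ~ "Wind",
  TRUE ~ NA_character_
)

data.frame(codes, management, object)
```

The explicit fallback branch keeps unmatched codes as NA, which makes unexpected values easy to spot instead of being folded into one of the labels.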
Answer 2 (score: 1)
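We can use str_extract() from stringr to pull out each letter and convert it to a labelled factor:
library(dplyr)
library(stringr)  # provides str_extract()
df %>%
  mutate(
    # "^." is the first character of the code
    Management = factor(str_extract(variable, "^."),
                        levels = c("M", "R"), labels = c("Managed", "Reserved")),
    # "(?<=^.)." is the character right after the first one
    Object = factor(str_extract(variable, "(?<=^.)."),
                    levels = c("F", "W"), labels = c("Frame", "Wind"))
  )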
# Year variable value Management Object
#1 1951 MF12 1.441 Managed Frame
#2 1952 MF12 2.068 Managed Frame
#3 1953 RF12 2.008 Reserved Frame
#4 1954 RF12 2.044 Reserved Frame
#5 1955 MW12 2.288 Managed Wind
#6 1956 RW12 1.800 Reserved Wind