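I am building a dataset for students to practice hypothesis testing. The data should contain fictitious processing times for the production of construction-equipment vehicles. The vehicles come in different types and with different options, which may affect the processing time. Based on the processing times and the machine specifications, students will investigate which factors have a significant effect on processing time and predict how long it takes to produce a specific machine with a specific configuration.
The end goal of the dataset is a total processing time for each machine. Essentially, the (total) processing time should be the sum of base time + option 1 time + option 2 time + option 3 time, and so on. Each option time should be sampled randomly from a distribution so that the effect is not too obvious. Only the total time is given to the students, but I need the option times to build the total time.
I know how to sample randomly with rnorm() and other distributions. What I do not know is how to generate data conditionally, based on the contents of a column.
The dataset looks like this.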
Machine <- c(1,2,3,4,5,6,7,8,9,10)
Pump.Option <- c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter")
Piping.Option <- c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping")
Lights.Option <- c("Std light", "Std & Additional", "Std & Additional","Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional")
Valve.Option <- c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")
Pump.Time <- NA
Piping.Time <- NA
Lights.Time <- NA
Valve.Time <- NA
Total.Time <- NA
DF.Sample <- data.frame(Machine, Pump.Option, Piping.Option, Lights.Option, Valve.Option, Pump.Time, Piping.Time, Lights.Time, Valve.Time, Total.Time)
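Based on the contents of the columns Pump.Option, Piping.Option and Lights.Option, the times to be generated are Pump.Time, Piping.Time and Lights.Time. These times are then used to calculate the total time for that machine.
The times for the options are as follows.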
Answer 0 (score: 0)
You can use dplyr's case_when for this, which offers a relatively clean syntax compared with a set of nested ifelse statements:
library(dplyr)
DF.Sample %>%
  mutate(
    Pump.Time = case_when(
      Pump.Option == "30 Liter" ~ 0,
      Pump.Option == "40 Liter" ~ rnorm(n(), mean = 10, sd = 4),
      Pump.Option == "50 Liter" ~ rnorm(n(), mean = 20, sd = 10)
    ),
    Piping.Time = case_when(
      Piping.Option == "No special piping" ~ 0,
      Piping.Option == "special piping" ~ rnorm(n(), mean = 10, sd = 4)
    ),
    Lights.Time = case_when(
      Lights.Option == "Std light" ~ 0,
      Lights.Option == "Std & Additional" ~ rnorm(n(), mean = 10, sd = 4)
    )
  )
#> Machine Pump.Option Piping.Option Lights.Option Valve.Option
#> 1 1 30 Liter No special piping Std light Safety valve
#> 2 2 40 Liter No special piping Std & Additional Safety valve
#> 3 3 30 Liter special piping Std & Additional Normal valve
#> 4 4 30 Liter No special piping Std & Additional Normal valve
#> 5 5 30 Liter special piping Std & Additional Safety valve
#> 6 6 30 Liter No special piping Std & Additional Normal valve
#> 7 7 50 Liter No special piping Std light Safety valve
#> 8 8 30 Liter special piping Std & Additional Safety valve
#> 9 9 30 Liter special piping Std & Additional Normal valve
#> 10 10 40 Liter No special piping Std & Additional Safety valve
#> Pump.Time Piping.Time Lights.Time
#> 1 0.000000 0.000000 0.000000
#> 2 4.956528 0.000000 17.716970
#> 3 0.000000 11.051394 10.142101
#> 4 0.000000 0.000000 11.886158
#> 5 0.000000 15.291671 6.745524
#> 6 0.000000 0.000000 5.228694
#> 7 21.520437 0.000000 0.000000
#> 8 0.000000 9.777887 9.222347
#> 9 0.000000 11.219067 14.726647
#> 10 12.761031 0.000000 6.111458
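The question also asks for a total time built from a base time plus the option times, and for only that total to be handed to students. A minimal sketch of that last step follows, assuming a normally distributed base time. The column name Base.Time, its mean and sd, the three-row machines data frame, and the student_view name are all hypothetical and chosen for illustration only; the option parameters are the ones used in the case_when calls above.

```r
library(dplyr)

set.seed(42)  # fix the seed so the simulated times are reproducible

# Small illustrative data frame (three machines only, not the full dataset)
machines <- data.frame(
  Machine       = 1:3,
  Pump.Option   = c("30 Liter", "40 Liter", "50 Liter"),
  Piping.Option = c("No special piping", "special piping", "No special piping"),
  Lights.Option = c("Std light", "Std & Additional", "Std & Additional")
)

machines <- machines %>%
  mutate(
    # ASSUMPTION: base time parameters are placeholders, not from the question
    Base.Time = rnorm(n(), mean = 100, sd = 5),
    Pump.Time = case_when(
      Pump.Option == "30 Liter" ~ 0,
      Pump.Option == "40 Liter" ~ rnorm(n(), mean = 10, sd = 4),
      Pump.Option == "50 Liter" ~ rnorm(n(), mean = 20, sd = 10)
    ),
    Piping.Time = case_when(
      Piping.Option == "No special piping" ~ 0,
      Piping.Option == "special piping" ~ rnorm(n(), mean = 10, sd = 4)
    ),
    Lights.Time = case_when(
      Lights.Option == "Std light" ~ 0,
      Lights.Option == "Std & Additional" ~ rnorm(n(), mean = 10, sd = 4)
    ),
    # Total is the sum of the base time and every option time
    Total.Time = Base.Time + Pump.Time + Piping.Time + Lights.Time
  )

# Keep only the option columns and the total for the version given to students
student_view <- machines %>%
  select(Machine, ends_with(".Option"), Total.Time)
```

Note that case_when returns NA for any row whose label matches none of the conditions, so a misspelled option would propagate NA into Total.Time; checking anyNA(machines$Total.Time) before distributing the data is a cheap safeguard.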
Data
DF.Sample <- data.frame(
  Machine = c(1,2,3,4,5,6,7,8,9,10),
  Pump.Option = c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter"),
  Piping.Option = c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping"),
  Lights.Option = c("Std light", "Std & Additional", "Std & Additional","Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional"),
  Valve.Option = c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")
)
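As an alternative to case_when, and not part of the answer above, the per-option parameters can be kept in named lookup vectors and passed straight to rnorm, which is vectorised over mean and sd. The sketch below uses base R only and reuses the pump parameters from the answer; the names pump_mean, pump_sd, pump_option and key are hypothetical. It relies on rnorm returning the mean itself when sd is 0, which gives the fixed time of 0 for the "30 Liter" option.

```r
set.seed(1)  # reproducible draws

# Per-option parameters; sd = 0 makes rnorm() return the mean exactly
pump_mean <- c("30 Liter" = 0, "40 Liter" = 10, "50 Liter" = 20)
pump_sd   <- c("30 Liter" = 0, "40 Liter" = 4,  "50 Liter" = 10)

# Illustrative option column (hypothetical values)
pump_option <- c("30 Liter", "40 Liter", "30 Liter", "50 Liter")

# as.character() guards against factor columns, which would otherwise
# index the lookup vectors by integer code instead of by label
key <- as.character(pump_option)

# Fail early on misspelled or unknown option labels
stopifnot(all(key %in% names(pump_mean)))

# One draw per row, each with the mean and sd of that row's option
pump_time <- rnorm(length(key), mean = pump_mean[key], sd = pump_sd[key])
```

This keeps the parameters in one place, which may be easier to maintain when there are many options, at the cost of being limited to a single distribution family per column.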