I have data like this:
  investor_name funding_round_type count
  <chr>         <chr>              <int>
1 .406 Ventures angel                  1
2 .406 Ventures other                  2
3 .406 Ventures private-equity         1
4 .406 Ventures series-a               5
5 .406 Ventures series-b               2
6 .406 Ventures series-c+              7
7 .406 Ventures venture                1
8 500 Startups  angel                 40
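I want to replace every instance where funding_round_type equals venture with series-a, series-b, or series-c+. I want to pick one of them at random, with a probability of 40% for each of the first two and 20% for the last.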
my_df %>%
  mutate(funding_round_type = ifelse(funding_round_type == "venture",
    sample(c("series-a", "series-b", "series-c+"), 1, replace = TRUE, prob = c(.4, .4, .2)),
    funding_round_type))
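Strangely, sample() seems to draw only once and then reuse the chosen value for every row. I have run this several times, and each time it replaces venture with just one value from my list of options, never any of the others.
How can I make sample() re-run for each row?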
Answer 0 (score: 2)
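This happens because sample() is called only once here, with size 1, and ifelse recycles that single value across every matching row. Try drawing one value per row instead, so that each position has its own draw:
library(dplyr)
my_df %>%
  mutate(funding_round_type = ifelse(funding_round_type == "venture",
    # n() is the number of rows; ifelse recycles `yes` by position,
    # so a vector shorter than the column could repeat draws
    sample(c("series-a", "series-b", "series-c+"), n(), replace = TRUE, prob = c(.4, .4, .2)),
    funding_round_type))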
Or with replace, which only needs as many draws as there are matches:
my_df %>%
  mutate(funding_round_type = replace(funding_round_type,
    funding_round_type == "venture",
    sample(c("series-a", "series-b", "series-c+"),
      sum(funding_round_type == "venture"), replace = TRUE, prob = c(.4, .4, .2))))
You can also replace the values directly, without ifelse or any package:
my_df$funding_round_type[my_df$funding_round_type == "venture"] <-
  with(my_df, sample(c("series-a", "series-b", "series-c+"),
    sum(funding_round_type == "venture"), replace = TRUE, prob = c(.4, .4, .2)))
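The recycling behaviour described in this answer can be seen in isolation with a small base R sketch. The vector `flag` and the letter values are made up for illustration and are not part of the original data; the point is only that ifelse stretches `yes` to the length of `test` and then picks by position, whereas index assignment uses the replacement values in order.

```r
# Hypothetical logical vector standing in for funding_round_type == "venture"
flag <- c(TRUE, FALSE, TRUE, FALSE)

# A length-1 `yes` is recycled: every TRUE position gets the same value
one_value <- ifelse(flag, "a", "z")
print(one_value)

# A shorter `yes` is recycled by position, not consumed in order:
# positions 1 and 3 both read the first element of c("a", "b")
by_position <- ifelse(flag, c("a", "b"), "z")
print(by_position)

# Index assignment consumes the replacement values in order instead
in_order <- rep("z", length(flag))
in_order[flag] <- c("a", "b")
print(in_order)
```

With random draws in place of the fixed letters, only the last pattern (or a `yes` vector as long as the column) gives every replaced row its own independent draw.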
Answer 1 (score: 0)
Using rowwise() re-samples for each row:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(funding_round_type = if_else(
    funding_round_type == "venture",
    sample(c("series-a", "series-b", "series-c+"), 1, prob = c(.4, .4, .2)),
    funding_round_type)) %>%
  ungroup()  # drop the row-wise grouping again
Also, a minor point: you don't need replace = TRUE here, because each call to sample() draws only one value.
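To illustrate the remark about replace, here is a small base R sketch. The object names `options_vec` and `probs` are illustrative assumptions. With size 1 the replace argument does not change what can be drawn, but it becomes necessary as soon as more values are requested than there are options.

```r
options_vec <- c("series-a", "series-b", "series-c+")
probs <- c(.4, .4, .2)

# One draw per call: replace makes no difference here
single <- sample(options_vec, 1, prob = probs)

# Ten draws from three options: only possible with replacement
many <- sample(options_vec, 10, replace = TRUE, prob = probs)

# Without replacement the same request fails, because the requested
# size exceeds length(options_vec)
failed <- inherits(
  try(sample(options_vec, 10, prob = probs), silent = TRUE),
  "try-error"
)
print(failed)
```

So replace = TRUE can be dropped in the row-wise version, but it has to stay in any version that draws all replacement values in a single call.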
Answer 2 (score: 0)
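We can use a data.table approach. Within j, .N is the number of rows selected by i, so one value is drawn per matching row:
library(data.table)
setDT(df)[funding_round_type == "venture", funding_round_type :=
  sample(c("series-a", "series-b", "series-c+"), .N, replace = TRUE, prob = c(.4, .4, .2))][]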
#   investor_name funding_round_type count
#1: .406 Ventures              angel     1
#2: .406 Ventures              other     2
#3: .406 Ventures     private-equity     1
#4: .406 Ventures           series-a     5
#5: .406 Ventures           series-b     2
#6: .406 Ventures          series-c+     7
#7: .406 Ventures           series-b     1
#8:  500 Startups              angel    40
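Or using case_when from the tidyverse:
library(tidyverse)
df %>%
  mutate(funding_round_type = case_when(
    funding_round_type == "venture" ~
      # n() draws one value per row; a single draw would be recycled to every match
      sample(c("series-a", "series-b", "series-c+"), n(), replace = TRUE, prob = c(.4, .4, .2)),
    TRUE ~ funding_round_type))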
#  investor_name funding_round_type count
#1 .406 Ventures              angel     1
#2 .406 Ventures              other     2
#3 .406 Ventures     private-equity     1
#4 .406 Ventures           series-a     5
#5 .406 Ventures           series-b     2
#6 .406 Ventures          series-c+     7
#7 .406 Ventures           series-a     1
#8  500 Startups              angel    40
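Because the example data has only one venture row, the output above cannot show whether draws differ between rows. One way to check a replacement strategy is to run it on a larger made-up column. The sketch below uses base R only; the data frame `big`, the seed, and the row counts are arbitrary assumptions for the check, not part of the original data.

```r
set.seed(1)  # arbitrary seed, only to make the check repeatable

# Hypothetical column with many "venture" rows
big <- data.frame(
  funding_round_type = rep(c("angel", "venture"), times = c(2000, 8000)),
  stringsAsFactors = FALSE
)

is_venture <- big$funding_round_type == "venture"

# One independent draw per matching row
big$funding_round_type[is_venture] <- sample(
  c("series-a", "series-b", "series-c+"),
  sum(is_venture), replace = TRUE, prob = c(.4, .4, .2)
)

# Shares among the replaced rows should be close to 0.4 / 0.4 / 0.2
shares <- prop.table(table(big$funding_round_type[is_venture]))
print(round(shares, 2))
```

If all three labels appear with roughly the intended shares, the draws are being made per row rather than once for the whole column.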
df <- structure(list(investor_name = c(".406 Ventures", ".406 Ventures",
".406 Ventures", ".406 Ventures", ".406 Ventures", ".406 Ventures",
".406 Ventures", "500 Startups"), funding_round_type = c("angel",
"other", "private-equity", "series-a", "series-b", "series-c+",
"venture", "angel"), count = c(1L, 2L, 1L, 5L, 2L, 7L, 1L, 40L
)), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8"))