I have a huge data frame like this:
df <- read.table(text="
id date
1 1 2016-12-01
2 2 2016-12-02
3 4 2017-01-03
4 6 2016-11-04
5 7 2017-11-05
6 9 2017-12-06", header=TRUE)
I generate a random 1 or 0 for each id, using this code:
library(dplyr)

set.seed(5)
df %>%
  arrange(id) %>%
  mutate(
    rn = runif(id),  # given a vector, runif() draws length(id) values
    discount = if_else(rn < 0.5, 0, 1)
  )
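It works perfectly until I add new rows to the data frame. After that, my random numbers are different.
But I need more than a random number for each id: the number must also stay the same when new rows are added.
That means: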
id date discount
1 1 2016-12-01 1
2 2 2016-12-02 0
3 4 2017-01-03 0
4 6 2016-11-04 1
5 7 2017-11-05 1
6 9 2017-12-06 1
After adding new rows:
id date discount
1 1 2016-12-01 1
2 2 2016-12-02 0
3 4 2017-01-03 0
4 6 2016-11-04 1
5 7 2017-11-05 1
6 9 2017-12-06 1
7 12 2017-12-06 0
8 13 2017-12-06 1
Answer 0 (score: 1)
You need to reset the same seed before each call on the "new" data.frame:
library(dplyr)

set.seed(5) # first call
df %>%
  arrange(id) %>%
  mutate(
    rn = runif(id),
    discount = if_else(rn < 0.5, 0, 1)
  )
# id date rn discount
# 1 1 2016-12-01 0.2002145 0
# 2 2 2016-12-02 0.6852186 1
# 3 4 2017-01-03 0.9168758 1
# 4 6 2016-11-04 0.2843995 0
# 5 7 2017-11-05 0.1046501 0
# 6 9 2017-12-06 0.7010575 1
set.seed(5) # added two rows, reset the seed
df2 %>%
  arrange(id) %>%
  mutate(
    rn = runif(id),
    discount = if_else(rn < 0.5, 0, 1)
  )
# id date rn discount
# 1 1 2016-12-01 0.2002145 0
# 2 2 2016-12-02 0.6852186 1
# 3 4 2017-01-03 0.9168758 1
# 4 6 2016-11-04 0.2843995 0
# 5 7 2017-11-05 0.1046501 0
# 6 9 2017-12-06 0.7010575 1
# 7 12 2017-12-06 0.5279600 1
# 8 13 2017-12-06 0.8079352 1
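Why resetting the seed is enough here can be checked directly: after the same seed, a longer draw begins with the same values as a shorter one. A minimal base R sketch (the names `first_draw` and `second_draw` are only illustrative, not part of the code above):

```r
# Draw six values, then reset the seed and draw eight.
set.seed(5)
first_draw <- runif(6)

set.seed(5)
second_draw <- runif(8)

# Compare the shorter draw with the first six values of the longer draw.
identical(first_draw, second_draw[1:6])
```

Note that this only holds while the new rows sort after the existing ones under `arrange(id)`. If a new id lands in the middle (say, id 3), every later row moves one position down the stream and gets a different number.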
Data:
df <- read.table(text="
id date
1 1 2016-12-01
2 2 2016-12-02
3 4 2017-01-03
4 6 2016-11-04
5 7 2017-11-05
6 9 2017-12-06", header=TRUE)
df2 <- read.table(text="
id date
1 1 2016-12-01
2 2 2016-12-02
3 4 2017-01-03
4 6 2016-11-04
5 7 2017-11-05
6 9 2017-12-06
7 12 2017-12-06
8 13 2017-12-06", header=TRUE)
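If new ids might be inserted between existing ones, one possible alternative is to derive each draw from the id itself instead of from the row position. The sketch below is an assumption layered on top of the answer, not part of it: `stable_runif` and `base_seed` are hypothetical names, the ids are assumed to be integers, and the statistical quality of seeding once per id is not examined here.

```r
# Hypothetical helper: one seed per id, so the value for an id does not
# depend on row order or on which other ids are present.
stable_runif <- function(ids, base_seed = 5L) {
  vapply(ids, function(i) {
    set.seed(base_seed + i)  # assumes integer-valued ids
    runif(1)
  }, numeric(1))
}

ids_old <- c(1, 2, 4, 6, 7, 9)
ids_new <- c(1, 2, 3, 4, 6, 7, 9, 12, 13)  # id 3 lands in the middle

rn_old <- stable_runif(ids_old)
rn_new <- stable_runif(ids_new)

# Existing ids keep their values even though id 3 was inserted before id 4.
all.equal(rn_old, rn_new[match(ids_old, ids_new)])
```

With dplyr this could be used as `mutate(rn = stable_runif(id), discount = if_else(rn < 0.5, 0, 1))`. Be aware that the helper resets the global RNG state on every call, and that looping over ids may be slow on a very large data frame.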