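How to randomize a dataset groupwise in R?

Time: 2017-04-27 17:18:40

Tags: r

I have replicates of the same clones. How can I randomize my dataset so that the clones are shuffled but the replicates of each clone stay together? In other words, so that the left-hand table below becomes, for example, the right-hand one?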

      Clone  V2  V3  V4               Clone  V2  V3  V4
1    1201K_1 GS1  1 167        4   12419S_13 GS1  1 279
2    1201K_1 GS1  1 355        5   12419S_13 GS1  1 287
3    1201K_1 GS1  1 515        9    12468S_6 GS1  1 167
4  12419S_13 GS1  1 279        10   12468S_6 GS1  1 260
5  12419S_13 GS1  1 287        6   12468S_18 GS1  1 320
6  12468S_18 GS1  1 320        7   12468S_18 GS1  1 338
7  12468S_18 GS1  1 338        8   12468S_18 GS1  1 594
8  12468S_18 GS1  1 594        1     1201K_1 GS1  1 167 
9   12468S_6 GS1  1 167        2     1201K_1 GS1  1 355 
10  12468S_6 GS1  1 260        3     1201K_1 GS1  1 515

2 answers:

Answer 0 (score: 0)

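Answer 1 (score: 0)

Judging from the sample data provided, the OP wants to change the order of the Clone groups in the dataset, not the order of the rows within each Clone group.

This can be achieved by converting Clone to a factor (if it is not one already) and using the fct_shuffle() function from the forcats package to randomly permute the factor levels: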

dt <- readr::read_csv(
"Clone, V2, V3, V4
1201K_1, GS1, 1, 167
1201K_1, GS1, 1, 355
1201K_1, GS1, 1, 515
12419S_13, GS1, 1, 279
12419S_13, GS1, 1, 287
12468S_18, GS1, 1, 320
12468S_18, GS1, 1, 338
12468S_18, GS1, 1, 594 
12468S_6, GS1, 1, 167
12468S_6, GS1, 1, 260")

# convert Clone to a factor (levels are sorted alphabetically by default)
dt$Clone <- factor(dt$Clone)
# order rows by Clone: before any shuffling this is alphabetical order
dt[order(dt$Clone), ]
# A tibble: 10 x 4
       Clone    V2    V3    V4
      <fctr> <chr> <int> <int>
 1   1201K_1   GS1     1   167
 2   1201K_1   GS1     1   355
 3   1201K_1   GS1     1   515
 4 12419S_13   GS1     1   279
 5 12419S_13   GS1     1   287
 6 12468S_18   GS1     1   320
 7 12468S_18   GS1     1   338
 8 12468S_18   GS1     1   594
 9  12468S_6   GS1     1   167
10  12468S_6   GS1     1   260
# randomly permute factor levels (the output below is one possible result)
dt$Clone <- forcats::fct_shuffle(dt$Clone)
dt[order(dt$Clone), ]
# A tibble: 10 x 4
       Clone    V2    V3    V4
      <fctr> <chr> <int> <int>
 1   1201K_1   GS1     1   167
 2   1201K_1   GS1     1   355
 3   1201K_1   GS1     1   515
 4  12468S_6   GS1     1   167
 5  12468S_6   GS1     1   260
 6 12468S_18   GS1     1   320
 7 12468S_18   GS1     1   338
 8 12468S_18   GS1     1   594
 9 12419S_13   GS1     1   279
10 12419S_13   GS1     1   287
# repeat: randomly permute factor levels (again, one possible result)
dt$Clone <- forcats::fct_shuffle(dt$Clone)
dt[order(dt$Clone), ]
# A tibble: 10 x 4
       Clone    V2    V3    V4
      <fctr> <chr> <int> <int>
 1 12468S_18   GS1     1   320
 2 12468S_18   GS1     1   338
 3 12468S_18   GS1     1   594
 4  12468S_6   GS1     1   167
 5  12468S_6   GS1     1   260
 6   1201K_1   GS1     1   167
 7   1201K_1   GS1     1   355
 8   1201K_1   GS1     1   515
 9 12419S_13   GS1     1   279
10 12419S_13   GS1     1   287
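
For comparison, the same group-wise shuffle can be sketched in base R without forcats. This is only an illustration under stated assumptions: shuffle_groups() is a hypothetical helper name (it is not part of any package), the data frame is rebuilt by hand from the sample data in the question, and set.seed() is there only to make the run reproducible.

```r
# Sample data mirroring the question (base R only, no extra packages).
dt <- data.frame(
  Clone = c("1201K_1", "1201K_1", "1201K_1",
            "12419S_13", "12419S_13",
            "12468S_18", "12468S_18", "12468S_18",
            "12468S_6", "12468S_6"),
  V2 = "GS1",
  V3 = 1L,
  V4 = c(167L, 355L, 515L, 279L, 287L, 320L, 338L, 594L, 167L, 260L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: put the groups in a random order while keeping
# the rows of each group together and in their original relative order.
shuffle_groups <- function(df, group_col) {
  groups <- unique(df[[group_col]])
  # Random permutation of the group labels; indexing with sample.int()
  # avoids the sample(x) surprise when x is a single number.
  shuffled <- groups[sample.int(length(groups))]
  # Position of each row's group in the shuffled order.
  group_rank <- match(df[[group_col]], shuffled)
  # Sort by group position, then by original row number within a group.
  df[order(group_rank, seq_len(nrow(df))), , drop = FALSE]
}

set.seed(42)  # optional, only to make the shuffle reproducible
shuffled_dt <- shuffle_groups(dt, "Clone")
shuffled_dt
```

Unlike fct_shuffle(), this sketch leaves the Clone column itself untouched (it stays a character vector) and only reorders the rows, which may be preferable when the factor levels are needed elsewhere in their original order.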