Randomly delete one of two rows that are identical in specific columns

Time: 2018-02-14 10:37:51

Tags: r

I ran into a problem while organizing my samples. Before the experiment, I had a data frame.

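After the experiment, I found that there is no difference between a and b. So I want to randomly delete one of each pair of rows that are identical in columns R, S and T (treating a and b as equivalent), ignoring O (O is the id of each sample, and this factor does not affect the result). How can I do this?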

R <- c("a", "b", "b", "a")
S <- rep(c(25, 37), length.out = 4)
T <- c(1:4, c(3, 4, 2, 1))
O <- c(100:107)
my_data <- data.frame(R, S, T, O)

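1 answer:

Answer 0 (score: 0)

One solution uses dplyr. The approach is to group by S and T, and then keep either the first row of each group or one row picked at random from it.

Edit: Added another option using sample_n(), as suggested by @docendodiscimus.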

# Data
R <- c("a", "b", "b", "a")
S <- rep(c(25, 37), length.out = 4)
T <- c(1:4, c(3, 4, 2, 1))
O <- c(100:107)
my_data <- data.frame(R, S, T, O)

library(dplyr)

# Keep the first row of each (S, T) group
my_data %>%
  group_by(S, T) %>%
  filter(row_number() == 1)

# OR keep one randomly chosen row of each group
my_data %>%
  group_by(S, T) %>%
  filter(row_number() == sample(1:n(), 1))

# OR use sample_n() to draw one row per group
my_data %>%
  group_by(S, T) %>%
  sample_n(1)

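# Result of the first variant
# (the random variants may keep O = 104 instead of 102, or 105 instead of 103)

#   R          S     T     O
#   <fctr> <dbl> <dbl> <int>
# 1 a       25.0  1.00   100
# 2 b       37.0  2.00   101
# 3 b       25.0  3.00   102
# 4 a       37.0  4.00   103
# 5 b       25.0  2.00   106
# 6 a       37.0  1.00   107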
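
If you would rather not depend on dplyr, the same idea can probably be sketched in base R: shuffle the rows first, then drop every row whose (S, T) pair has already been seen. This is only a minimal sketch on the example data from the question; the set.seed() call and the names shuffled and deduped are my own additions for illustration and are not part of the original answer.

```r
# Same example data as in the question
my_data <- data.frame(
  R = c("a", "b", "b", "a"),
  S = rep(c(25, 37), length.out = 4),
  T = c(1:4, c(3, 4, 2, 1)),
  O = 100:107
)

set.seed(42)  # assumption: fixed seed only so the random choice is repeatable

# Shuffle the rows; after shuffling, the "first" row of each (S, T) pair
# is a randomly chosen member of that pair
shuffled <- my_data[sample(nrow(my_data)), ]

# duplicated() flags every occurrence of an (S, T) pair after the first one
deduped <- shuffled[!duplicated(shuffled[, c("S", "T")]), ]

# Put the surviving rows back in the original sample order
deduped <- deduped[order(deduped$O), ]
deduped
```

On this data the result should have six rows, one per distinct (S, T) combination. Which of the rows with O = 102/104 and O = 103/105 survives depends on the shuffle.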