Partitioning a data.frame based on a condition

Time: 2015-06-23 17:00:37

Tags: r partitioning

I have a data.frame shaped like this:

c <- data.frame(name=c("a", "a", "b", "b", "c", "c","d","d"), value=c(1,3,2,4,5,3,4,5), address=c("rrrr","rrrr","zzzz","aaaa","ssss","jjjj","qqqq","qqqq"))
> c
  name value address
1    a     1    rrrr
2    a     3    rrrr
3    b     2    zzzz 
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
7    d     4    qqqq
8    d     5    qqqq 

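I am trying to split this data frame into two separate data frames based on a simple rule: group together the people who did not change their address, and group together the people who did. Any hints on how to accomplish this?

So far I have been playing around with the following, to no avail: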

for(i in seq(1,8, by=2)){
    print(i)
    print(unlist(c[which(c[i,3]==c[(i+1),3]),]))    
}

3 answers:

Answer 0: (score: 2)

This counts the number of distinct addresses per name and splits on that basis. There was one hurdle to overcome: ave kept returning <NA> until as.character was used. There is a warning message, and I am copying its beginning here so that searchers might find this:

Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = c(1L, 1L)) :

Successful version (using a data object named cc):

split(cc, ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) )
$`1`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`2`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj

If you really want a two-way split, convert to logical with > 1:

split(cc, ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) >1)
$`FALSE`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`TRUE`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj

I don't understand the comment. This is what I get from str(dat):

List of 2
 $ FALSE:'data.frame': 4 obs. of 3 variables:
  ..$ name   : Factor w/ 4 levels "a","b","c","d": 1 1 4 4
  ..$ value  : num [1:4] 1 3 4 5
  ..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 4 4 3 3
 $ TRUE :'data.frame': 4 obs. of 3 variables:
  ..$ name   : Factor w/ 4 levels "a","b","c","d": 2 2 3 3
  ..$ value  : num [1:4] 2 4 5 3
  ..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 6 1 5 2
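For readers who want to reproduce this end to end, here is a minimal self-contained sketch of the ave-based approach. It assumes the address column is a factor (the default for data.frame() when this was written, forced here with stringsAsFactors = TRUE); the names n_addr and parts are illustrative and not from the answer.

```r
# Rebuild the example data; stringsAsFactors = TRUE mimics the factor columns
# that data.frame() produced by default at the time (an assumption made here).
cc <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq"),
  stringsAsFactors = TRUE
)

# ave() writes the per-group result back into a copy of its first argument,
# so a factor input cannot hold the counts; converting to character first
# lets the counts be stored (as strings such as "1" and "2").
n_addr <- ave(as.character(cc$address), cc$name,
              FUN = function(x) length(unique(x)))

# Convert the counts back to integers and split on "more than one address".
parts <- split(cc, as.integer(n_addr) > 1)

parts[["FALSE"]]  # names whose address never changed
parts[["TRUE"]]   # names with more than one address
```

Converting the counts with as.integer before comparing avoids relying on string comparison between "2" and 1.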

Answer 1: (score: 1)

Using dplyr:

library(dplyr)
z<-c %>% group_by(name) %>% 
         mutate(changed = n_distinct(address))
split(z, z$changed)

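Thanks to @akrun for reminding me of n_distinct.

Answer 2: (score: 0)

@jeremycg's answer is great, and I am trying to learn dplyr myself, but here is a non-dplyr version as well.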
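If exactly two groups are wanted rather than one group per address count, the same dplyr idea can be sketched with a logical flag. This is only a sketch under assumptions: the data frame is named df here (to avoid shadowing base c()), and the column name moved is hypothetical.

```r
library(dplyr)

df <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# Flag every row of a name that has more than one distinct address,
# then drop the grouping before splitting.
z <- df %>%
  group_by(name) %>%
  mutate(moved = n_distinct(address) > 1) %>%
  ungroup()

parts <- split(z, z$moved)  # list with elements named "FALSE" and "TRUE"
```

Splitting on a logical column keeps the result at two pieces even if some name had three or more addresses.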

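# Number of distinct addresses for each name (result is named by name)
numAddresses <- sapply(split(c, c$name), function(x)
    length(unique(x$address)))
# Look the counts up by name (not by address) to get one value per row
split(c, numAddresses[as.character(c$name)])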
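Since the question asks for two separate data frames, a base R sketch that ends with two named objects might look like the following. The names df, n_addr, moved_flag, stayed and moved are all illustrative assumptions rather than part of any answer.

```r
df <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# Count distinct addresses per name; the result is a named integer vector.
n_addr <- vapply(split(as.character(df$address), df$name),
                 function(x) length(unique(x)), integer(1))

# Map the per-name counts back onto the rows, keyed by name.
moved_flag <- unname(n_addr[as.character(df$name)] > 1)

stayed <- df[!moved_flag, ]  # rows for names with a single address
moved  <- df[moved_flag, ]   # rows for names with more than one address
```

Indexing with as.character(df$name) looks the counts up by name, so the result does not depend on whether name is stored as a factor or as a character vector.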