计算多行和列模式的出现次数

时间:2016-04-02 20:33:36

标签: r

我想要一个特定的语音模式。第1列中有演员,第2列中有句子类型。我希望以编程方式识别名为IRF/IRE的会话模式。模式是:

  • I - >老师问了一个问题
  • R - >学生回应(通常有答案)
  • F / E - >教师评估或提供学生答案的反馈

所以我在第1栏中寻找对应于? - [。!] - [?。!]的师生。

因此,在下面的假数据中,以下行符合此模式并计算:

33  Teacher    ?
34  Student    .
35  Teacher    .

我这样做是在视觉上检查数据。我怎么能找到基本上是以下矩阵模式的东西:

| Teacher    |    ?  |    
| Student    | [.!]  |    
| Teacher    | [?!.] | 

如果它让事情更快/更容易,我对任何外部包都开放。

n <- 100
set.seed(10)
dat <- data.frame(
    actor = sample(c("Teacher", "Student"), n, TRUE, c(.6, .4)),
    type = c(sample(c('?', '.', '!'), n, TRUE, c(.3, .5, .1)))
)

head(dat)

##     actor type
## 1 Teacher    .
## 2 Teacher    .
## 3 Teacher    .
## 4 Student    .
## 5 Teacher    !
## 6 Teacher    ?
## . 
## . 
## . 

2 个答案:

答案 0 :(得分:1)

这是一种仅使用基本R索引,比较和逻辑运算的方法:

hits <- which(
      dat$actor[-seq(nrow(dat),by=-1L,len=2L)]=='Teacher'
    & dat$type [-seq(nrow(dat),by=-1L,len=2L)]=='?'
    & dat$actor[-c(1L,nrow(dat))]=='Student'
    & dat$type [-c(1L,nrow(dat))]%in%c('.','!')
    & dat$actor[-1:-2]=='Teacher'
    & dat$type [-1:-2]%in%c('?','!','.')
);
hits;
## [1] 33 51 95
dat[rep(hits,each=3L)+0:2,];
##      actor type
## 33 Teacher    ?
## 34 Student    .
## 35 Teacher    .
## 51 Teacher    ?
## 52 Student    .
## 53 Teacher    .
## 95 Teacher    ?
## 96 Student    .
## 97 Teacher    ?

我将解决方案概括为参数化比较运算符作为函数列表,操作数作为列表列的data.frame,其列名标识目标列:

dfmatch <- function(df,operands,preds=rep(list(`%in%`),length(operands))) {
    preds <- as.list(preds);
    operands <- as.data.frame(operands);
    if (length(preds)!=ncol(operands)) stop('length(preds)!=ncol(operands).');
    predLen <- length(preds);
    rowLen <- nrow(operands);
    if (rowLen>nrow(df)) return(integer());
    which(Reduce(`&`,lapply(seq_len(predLen),function(opi) {
        pred <- preds[[opi]];
        Reduce(`&`,lapply(seq_len(rowLen),function(ri) {
            operand <- operands[[opi]][[ri]];
            pred(df[[names(operands[opi])]][-c(seq(1L,len=ri-1L),seq(nrow(df),by=-1L,len=rowLen-ri))],operand);
        }));
    })));
}; ## end dfmatch()

operands <- data.frame(actor=I(list('Teacher','Student','Teacher')),type=I(list('?',c('.','!'),c('?','!','.'))));
operands;
##     actor    type
## 1 Teacher       ?
## 2 Student    ., !
## 3 Teacher ?, !, .
dfmatch(dat,operands);
## [1] 33 51 95

答案 1 :(得分:0)

这是一种方法:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, zoo)

dat2 <- dat %>%
    mutate(
        combo = paste0(actor, type)

    )


patterns <- expand.grid(
    paste0('Teacher', '?'),    
    paste0('Student', c('.', '!')),  
    paste0('Teacher', c('.', '!', '?'))   
)

locs <- apply(patterns, 1, function(x){
    with(dat2, which(rollapply(combo, 3, identical, unname(unlist(x, use.names=FALSE)))))
})

lapply(unlist(locs[sapply(locs, length) > 0]), function(i) {
    dat2[i:(i+2),]
})

## [[1]]
##      actor type    combo
## 33 Teacher    ? Teacher?
## 34 Student    . Student.
## 35 Teacher    . Teacher.
## 
## [[2]]
##      actor type    combo
## 51 Teacher    ? Teacher?
## 52 Student    . Student.
## 53 Teacher    . Teacher.
## 
## [[3]]
##      actor type    combo
## 95 Teacher    ? Teacher?
## 96 Student    . Student.
## 97 Teacher    ? Teacher?

length(unlist(locs[sapply(locs, length) > 0]))

## 3