Question

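I have "Y-maze" sequence data made up of the characters A, B and C. I am trying to quantify how many times these three values are found together. The data look like this: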

Animal=c(1,2,3,4,5)
VisitedZones=c(1,2,3,4,5)
data=data.frame(Animal, VisitedZones)
data[1,2]=("A,C,B,A,C,A,B,A,C,A,C,A,C,B,B,C,A,C,C,C")
data[2,2]=("A,C,B,A,C,A,B,A,C,A,C,A,C,B")
data[3,2]=("A,C,B,A,C,A,B,A,C,A")
data[4,2]=("A,C,B,A,C,A,A,A,B,A,C,A,C,A,C,B")
data[5,2]=("A,C,B,A,C,A,A,A,B,")

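The tricky part is that I also have to account for the reading frame, so that every instance of an ABC combination is found. There are three reading frames, for example: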

Here is my working example so far.

library(data.table)  # for data.table()
library(dplyr)       # for mutate() and lead()
Split <- strsplit(data$VisitedZones, ",", fixed = TRUE)
## How long is each list element?
Ncol <- vapply(Split, length, 1L)
## Create an empty character matrix to store the results
M <- matrix(NA_character_, nrow = nrow(data),ncol = max(Ncol),
        dimnames = list(NULL, paste0("V", sequence(max(Ncol)))))
## Use matrix indexing to figure out where to put the results
M[cbind(rep(1:nrow(data), Ncol),sequence(Ncol))] <- unlist(Split, 
         use.names = FALSE)
# Bind the values back together, here as a "data.table" (faster)
v2=data.table(Animal = data$Animal, M)
# I get error here
df=mutate(as.data.frame(v2),trio=paste0(v2,lead(v2),lead(v2,2)))
table(df$trio[1:(length(v2)-2)])

It would be great if I could get something like this:

Animal   VisitedZones   ABC  ACB  BCA  BAC  CAB  CBA
  1      A,B,C,A,B.C...  2    0    1    0    1    0
  2      A,B,C,C...      1    0    0    0    0    0
  3      A,C,B,A...      0    1    0    0    0    1

Answer 1

Your revised question is essentially a completely different one, so I am answering it here.

First, your data structure does not make much sense to me, so I start by reshaping it into something I can work with:

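# M is the letter matrix built in the question; transposing v2 itself would
# carry the Animal column into each sequence as its first "letter"
v2<-as.data.frame(t(M))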

This flips it so that each animal's letters run down a column instead of across a row;

v2<-tidyr::gather(v2,"v","letter",na.rm=T)

This melts the table into long format (so that I can use lead() and friends).

v2<-group_by(v2,v)
df=mutate(v2,trio=paste0(letter,lead(letter),lead(letter,2)))

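This basically brings us back to where we were at the end of the previous question, except that the data are now grouped by the "animal" variable (called "v" here, with values V1 to V5).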
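To make the sliding "reading frame" concrete, here is a minimal base R sketch of what the `paste0(letter, lead(letter), lead(letter, 2))` step does for a single animal. The helper `shift_left()` is a hypothetical stand-in for `dplyr::lead()`, written only for this illustration; the input is animal 3's sequence from the question.

```r
# Animal 3's visited zones from the question
zones <- c("A", "C", "B", "A", "C", "A", "B", "A", "C", "A")
n <- length(zones)

# Hypothetical base R stand-in for dplyr::lead(): shift left by k, pad with NA
shift_left <- function(x, k) c(x[-seq_len(k)], rep(NA, k))

# Each element is glued to the one and two positions after it
trio <- paste0(zones, shift_left(zones, 1), shift_left(zones, 2))

# The last two windows run off the end of the sequence, so drop them
trio <- trio[seq_len(n - 2)]
print(trio)
```

Because the window advances one position at a time, every reading frame is covered without having to handle the three frames separately.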

df<-df[!grepl("NA",df$trio),]

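Even though we dropped the unnecessary NAs earlier, lead() still leaves those pesky ABNA, ANANA and similar strings at the end of each group, so this line removes every trio that contains NA.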
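A small sketch of why those strings appear and how the filter behaves, using made-up trio values. Note the assumption that no zone label itself contains the substring "NA"; that holds for A, B and C but would not hold in general.

```r
# paste0() turns a missing value into the literal string "NA"
tail1 <- paste0("A", "B", NA)   # second-to-last window of a group
tail2 <- paste0("A", NA, NA)    # last window of a group
print(c(tail1, tail2))

# Made-up trios, two complete and two running off the end
trio <- c("ACB", "CBA", tail1, tail2)

# Keep only windows with no "NA" substring (safe only for labels A, B, C)
complete <- trio[!grepl("NA", trio, fixed = TRUE)]
print(complete)
```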

tt<-table(df$v,df$trio)

Finally, we build the table, but also break it down by "v". The result looks like this:

     AAA AAB ABA ACA ACB ACC BAC BBC BCA CAA CAB CAC CBA CBB CCC
  V1   0   0   1   3   2   1   2   1   1   0   1   3   1   1   1
  V2   0   0   1   3   2   0   2   0   0   0   1   2   1   0   0
  V3   0   0   1   2   1   0   2   0   0   0   1   0   1   0   0
  V4   1   1   1   3   2   0   2   0   0   1   0   2   1   0   0
  V5   1   1   0   1   1   0   1   0   0   1   0   0   1   0   0
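For readers unfamiliar with two-way `table()` output, here is a toy sketch (made-up data, not the question's) of how the cross-tabulation is built and how it unrolls when converted to a data frame. The vectors are passed as named variables here, so the long form gets columns `v`, `trio` and `Freq`; with `table(df$v, df$trio)` as above, the first two are named `Var1` and `Var2` instead.

```r
# Toy long-format data: one row per (animal, trio) window
v    <- c("V1", "V1", "V1", "V2", "V2")
trio <- c("ACB", "ACB", "CBA", "ACB", "BAC")

# Rows are animals, columns are trios; unseen pairs are counted as 0
tt <- table(v, trio)
print(tt)

# as.data.frame() on a table yields one row per cell (long form)
long <- as.data.frame(tt)
print(long)

# as.data.frame.matrix() keeps the wide layout instead
wide <- as.data.frame.matrix(tt)
print(wide)
```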

You can now bind this to the original data to get what you described, but because of the way table stores its results, it takes one additional step:

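# as.data.frame(tt) is long (Var1, Var2, Freq); spread() widens it again,
# and [,-3] drops the redundant Var1 column after binding
data<-cbind(data,tidyr::spread(as.data.frame(tt),Var2,Freq))[,-3]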

The end result looks like this:

  Animal                            VisitedZones AAA AAB ABA ACA ACB ACC BAC BBC BCA CAA CAB CAC CBA CBB CCC
1      1 A,C,B,A,C,A,B,A,C,A,C,A,C,B,B,C,A,C,C,C   0   0   1   3   2   1   2   1   1   0   1   3   1   1   1
2      2             A,C,B,A,C,A,B,A,C,A,C,A,C,B   0   0   1   3   2   0   2   0   0   0   1   2   1   0   0
3      3                     A,C,B,A,C,A,B,A,C,A   0   0   1   2   1   0   2   0   0   0   1   0   1   0   0
4      4         A,C,B,A,C,A,A,A,B,A,C,A,C,A,C,B   1   1   1   3   2   0   2   0   0   1   0   2   1   0   0
5      5                      A,C,B,A,C,A,A,A,B,   1   1   0   1   1   0   1   0   0   1   0   0   1   0   0
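The table above counts every trio that occurs, whereas the question asked only for the six orderings of A, B and C. As a hedged alternative, here is a base R sketch that counts just those six per animal; `count_perms()` is a hypothetical helper written for this illustration, and it is shown on the sequences of animals 1 and 3 only.

```r
# The six orderings of A, B, C that the question asks about
perms <- c("ABC", "ACB", "BCA", "BAC", "CAB", "CBA")

# Hypothetical helper: count each ordering over all overlapping windows
count_perms <- function(zone_string) {
  z <- strsplit(zone_string, ",", fixed = TRUE)[[1]]
  n <- length(z)
  if (n < 3) return(setNames(integer(length(perms)), perms))
  trios <- paste0(z[1:(n - 2)], z[2:(n - 1)], z[3:n])
  # factor() with fixed levels keeps zero counts and discards other trios
  counts <- table(factor(trios, levels = perms))
  setNames(as.integer(counts), perms)
}

zones <- c("A,C,B,A,C,A,B,A,C,A,C,A,C,B,B,C,A,C,C,C",  # animal 1
           "A,C,B,A,C,A,B,A,C,A")                      # animal 3
result <- data.frame(Animal = c(1, 3), VisitedZones = zones,
                     t(sapply(zones, count_perms, USE.NAMES = FALSE)))
print(result)
```

Fixing the factor levels up front means orderings that never occur still appear as zero columns, which matches the layout requested in the question.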

Comparing vector values by shifting the reading frame
