Suppose our sequences contain five distinct events/states (A to E):
library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal, 13:24, alphabet=c("A","B","C","D","E"))
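Is it possible to create a subset of actcal.seq that contains only the events A, C and E? If so, how?
Clarification: I want to extract every sequence that contains A, C or E. If such a sequence also contains B or D, those events should be removed from the returned sequence. For example, the sequence A-A-B-C-C-D-E-E should be returned as A-A-C-C-E-E.
Clarification 2: the input sequences use alphabet=c("A","B","C","D","E"), whereas the modified sequence object I am looking for should use alphabet=c("A","C","E"). Some more examples: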
"A-B-C-D-E" => "A-C-E"
"A-C-A-E" => "A-C-A-E"
"B-D" => NA or ""
"B-D-B-A-D" => "A"
I would appreciate any solution to this problem that does not require re-reading a subset of the data from the database.
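To make the expected mapping concrete, here is a minimal base R sketch that applies it to plain "-"-separated strings. It only illustrates the desired result and does not operate on a TraMineR sequence object; the helper name `drop_states` and its `keep` argument are hypothetical.

```r
# Hypothetical helper: keep only the listed states in "-"-separated strings.
drop_states <- function(x, keep = c("A", "C", "E")) {
  vapply(strsplit(x, "-", fixed = TRUE), function(states) {
    kept <- states[states %in% keep]
    # A sequence with no remaining state becomes NA.
    if (length(kept) == 0) NA_character_ else paste(kept, collapse = "-")
  }, character(1))
}

examples <- c("A-B-C-D-E", "A-C-A-E", "B-D", "B-D-B-A-D")
result <- drop_states(examples)
print(result)
```

The open question is how to obtain the same effect on a state sequence object created with seqdef.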
Answer 0 (score: 1)
You can do this by recoding states B and D as missing states with the seqrecode function. The default symbol for a missing state is *. I illustrate using only the first ten sequences of actcal.
library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal[1:10,13:24], alphabet=c("A","B","C","D","E"))
## Recode B and D as *, the default missing symbol
actcal.rec.seq <- seqrecode(actcal.seq,
recodes = list("*"=c("B","D")), otherwise=NULL)
actcal.seq
# Sequence
# 2848 B-B-B-B-B-B-B-B-B-B-B-B
# 1230 D-D-D-D-A-A-A-A-A-A-A-D
# 2468 B-B-B-B-B-B-B-B-B-B-B-B
# 654 C-C-C-C-C-C-C-C-C-B-B-B
# 6946 A-A-A-A-A-A-A-A-A-A-A-A
# 1872 D-B-B-B-B-B-B-B-B-B-B-B
# 2905 D-D-D-D-D-D-D-D-D-D-D-D
# 106 A-A-A-A-A-A-A-A-A-A-A-A
# 5113 A-A-A-A-A-A-A-A-A-A-A-A
# 4503 A-A-A-A-A-A-A-A-A-A-A-A
actcal.rec.seq
# Sequence
# 2848 *-*-*-*-*-*-*-*-*-*-*-*
# 1230 *-*-*-*-A-A-A-A-A-A-A-*
# 2468 *-*-*-*-*-*-*-*-*-*-*-*
# 654 C-C-C-C-C-C-C-C-C-*-*-*
# 6946 A-A-A-A-A-A-A-A-A-A-A-A
# 1872 *-*-*-*-*-*-*-*-*-*-*-*
# 2905 *-*-*-*-*-*-*-*-*-*-*-*
# 106 A-A-A-A-A-A-A-A-A-A-A-A
# 5113 A-A-A-A-A-A-A-A-A-A-A-A
# 4503 A-A-A-A-A-A-A-A-A-A-A-A
Delete the missing states:
actcal.rec.comp.seq <- seqdef(actcal.rec.seq,
    left="DEL", gap="DEL", right="DEL",
    missing="*", alphabet=c("A","C","E"))
Remove the sequences that contain only missing states:
(rec.seq <- actcal.rec.comp.seq[!is.na(seqdur(actcal.rec.comp.seq)[,1]),])
# Sequence
# 2103 A-A-A-A-A-A-A-A-A-A-A-A
# 3972 C-C-C-C-C-C-C-C-C
# 5238 C
# 4977 C-C-C-C-C-C-C-C-C-C-C-C
# 528 A-A-A-A-A-A-A-A-A-A-A-A
If you only want the sequences of distinct successive states, TraMineR's seqdss function extracts them from rec.seq.
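The steps above can be wrapped into one function, sketched below. The name `keep_states` and its arguments are hypothetical and not part of TraMineR; the body only chains the seqrecode, seqdef and seqdur calls used in this answer, and it assumes that "*" is not itself a state of the input alphabet. The last lines apply seqdss to the result to get the distinct successive states.

```r
library(TraMineR)

# Hypothetical helper (not part of TraMineR): keep only the states in `keep`.
keep_states <- function(seqdata, keep) {
  drop <- setdiff(alphabet(seqdata), keep)
  if (length(drop) > 0) {
    # Recode every unwanted state as "*", the default missing symbol.
    seqdata <- seqrecode(seqdata, recodes = list("*" = drop), otherwise = NULL)
  }
  # Delete the missing states and restrict the alphabet to `keep`.
  comp <- seqdef(seqdata, left = "DEL", gap = "DEL", right = "DEL",
                 missing = "*", alphabet = keep)
  # Drop the sequences that are empty after the deletion.
  comp[!is.na(seqdur(comp)[, 1]), ]
}

data(actcal)
actcal.seq <- seqdef(actcal[1:10, 13:24], alphabet = c("A", "B", "C", "D", "E"))
ace.seq <- keep_states(actcal.seq, keep = c("A", "C", "E"))

# Distinct successive states of the reduced sequences.
ace.dss <- seqdss(ace.seq)
print(ace.dss)
```

Computing the states to drop with setdiff means the same function works for any subset of the alphabet, not only A, C and E.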