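I have a data frame with 8 columns and many rows. I want to remove the rows where column 6 or column 7 contains more than one character, and output a data frame in which columns 6 and 7 each contain only a single character.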
DF:
ID Content_ID Chromosome Start Stop Reference Alternate Length
1299675221 backbone 12 99675221 99675221 GG T 0
1298583685 backbone 12 98583685 98583685 C T 0
129833474 backbone 12 9833474 9833474 C T 0
1297722695 backbone 12 97722695 97722695 A G 0
1297381269 backbone 12 97381269 97381269 T C 0
1297081605 backbone 12 97081605 97081605 G AA 0
1297058068 backbone 12 97058068 97058068 T C 0
1295891848 backbone 12 95891848 95891848 CCTT ATA 0
1294164312 backbone 12 94164312 94164312 T C 0
12940191 backbone 12 940191 940191 T C 0
Expected output:
ID Content_ID Chromosome Start Stop Reference Alternate Length
1298583685 backbone 12 98583685 98583685 C T 0
129833474 backbone 12 9833474 9833474 C T 0
1297722695 backbone 12 97722695 97722695 A G 0
1297381269 backbone 12 97381269 97381269 T C 0
1297058068 backbone 12 97058068 97058068 T C 0
1294164312 backbone 12 94164312 94164312 T C 0
12940191 backbone 12 940191 940191 T C 0
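Answer 0 (score: 3)
We can use lapply to loop over columns 6 and 7 and check whether the number of characters is 1. Reduce with & then compares the corresponding elements of the resulting list and returns a single logical vector, which we use to subset the rows of 'df'.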
df[Reduce(`&`, lapply(df[6:7], function(x) nchar(x)==1)),]
# ID Content_ID Chromosome Start Stop Reference Alternate Length
#2 1298583685 backbone 12 98583685 98583685 C T 0
#3 129833474 backbone 12 9833474 9833474 C T 0
#4 1297722695 backbone 12 97722695 97722695 A G 0
#5 1297381269 backbone 12 97381269 97381269 T C 0
#7 1297058068 backbone 12 97058068 97058068 T C 0
#9 1294164312 backbone 12 94164312 94164312 T C 0
#10 12940191 backbone 12 940191 940191 T C 0
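To see what each step contributes, here is a minimal sketch on a cut-down version of the data. The three-row df_small is made up for illustration and holds only the Reference and Alternate columns, so the index 1:2 stands in for 6:7.

```r
# Hypothetical cut-down data: only the two allele columns from the question
df_small <- data.frame(
  Reference = c("GG", "C", "G"),
  Alternate = c("T", "T", "AA"),
  stringsAsFactors = FALSE
)

# lapply() returns a list with one logical vector per column
checks <- lapply(df_small[1:2], function(x) nchar(x) == 1)

# Reduce() folds the list with `&`, so a row passes only if every column passes
keep <- Reduce(`&`, checks)

# Logical row subsetting keeps the rows where keep is TRUE
result <- df_small[keep, ]
```

Note that nchar() does not accept factors. If the columns were read in as factors (the default for data.frame() and read.table() before R 4.0.0), convert them with as.character() first.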
Another option is rowSums:
df[!rowSums(nchar(as.matrix(df[6:7]))!=1),]
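The rowSums one-liner can be unpacked into its intermediate values. This is only a sketch on a made-up three-row df_small with just the two allele columns, not the full data from the question.

```r
# Hypothetical cut-down data: only the two allele columns from the question
df_small <- data.frame(
  Reference = c("GG", "C", "G"),
  Alternate = c("T", "T", "AA"),
  stringsAsFactors = FALSE
)

# TRUE marks a cell whose length is not exactly one character
bad_cells <- nchar(as.matrix(df_small)) != 1

# Count the offending cells in each row; 0 means both columns passed
n_bad <- rowSums(bad_cells)

# ! turns a count of 0 into TRUE and any positive count into FALSE
result <- df_small[!n_bad, ]
```

Writing the condition as rowSums(...) == 0 instead of !rowSums(...) gives the same rows and may read more clearly.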
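Answer 1 (score: 2)
Similarly, you can paste the two columns together and keep the rows where the number of characters equals 3: one character from each column plus the single space that paste puts between them.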
df[nchar(paste(df$Reference, df$Alternate)) == 3,]
ID Content_ID Chromosome Start Stop Reference Alternate Length
2 1298583685 backbone 12 98583685 98583685 C T 0
3 129833474 backbone 12 9833474 9833474 C T 0
4 1297722695 backbone 12 97722695 97722695 A G 0
5 1297381269 backbone 12 97381269 97381269 T C 0
7 1297058068 backbone 12 97058068 97058068 T C 0
9 1294164312 backbone 12 94164312 94164312 T C 0
10 12940191 backbone 12 940191 940191 T C 0
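One caveat with the pasted-length check, shown as a small sketch: it only looks at the combined length, so an empty string in one column plus two characters in the other also adds up to 3. The vectors ref and alt below are made up for illustration. The data in the question has no empty values, so the result there is unaffected.

```r
# Hypothetical values, including one empty Reference
ref <- c("C", "", "GG")
alt <- c("T", "GG", "T")

# paste() joins with a single space by default: "C T", " GG", "GG T"
pasted_len <- nchar(paste(ref, alt))

# Checking each column on its own rejects the empty-string row
both_single <- nchar(ref) == 1 & nchar(alt) == 1

pasted_len == 3   # TRUE TRUE FALSE
both_single       # TRUE FALSE FALSE
```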
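Answer 2 (score: 1)
Using data.table:
library(data.table)
# Convert df to a data.table by reference (no copy is made)
setDT(df)
# Keep only the rows where both columns hold exactly one character
df <- df[nchar(Reference) == 1 & nchar(Alternate) == 1]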