I have a data frame, pedigrees, whose samples are arranged in families:
pedigrees %>%
  filter(Family_ID %in% sample(pedigrees$Family_ID, 5))
Family_ID Sample_ID fatherID motherID sex status
<chr> <chr> <chr> <chr> <int> <int>
1 MtS.MIPS.61 UCSF_AGG0092_8005439845 0 0 2 0
2 MtS.MIPS.61 UCSF_AGG0093_8005439857 0 0 1 0
3 MtS.MIPS.61 UCSF_AGG0094_8005439869 AGG0093 AGG0092 2 0
4 MtS.MIPS.61 UCSF_AGG0095_8005439881 AGG0093 AGG0092 2 2
5 MtS.MIPS.61 UCSF_AGG0091_8005439928 AGG0093 AGG0092 1 2
6 FAM048 UCSF_G01-GEA-259-HI_8005440194 G01-GEA-259-PA G01-GEA-259-MA 1 2
7 FAM048 UCSF_G01-GEA-259-MA_8005440206 0 0 2 0
8 FAM048 UCSF_G01-GEA-259-PA_8005440218 0 0 1 0
9 F1543 UCSF_F1543-1_8005116638 F1543-3 F1543-2 2 2
10 F1543 UCSF_F1543-2_8005116649 0 0 2 0
11 F1543 UCSF_F1543-3_8005116661 0 0 1 0
12 AU0045 UCSF_AU0045201_04C32032A 0 0 1 0
13 AU0045 UCSF_AU0045202_04C32033A 0 0 2 0
14 AU0045 UCSF_AU0045301_04C32034A AU0045201 AU0045202 2 2
15 AU0045 UCSF_AU0045302_04C32035A AU0045201 AU0045202 1 2
16 1232 UCSF_1232002_8004805191 1232011 1232012 2 2
17 1232 UCSF_1232011_8004805203 0 0 1 1
18 1232 UCSF_1232012_8004805215 0 0 2 1
The fatherID and motherID columns should have the same format as the Sample_ID column. For example, the last family, 1232, should actually look like this:
16 1232 UCSF_1232002_8004805191 UCSF_1232011_8004805203 UCSF_1232012_8004805215 2 2
17 1232 UCSF_1232011_8004805203 0 0 1 1
18 1232 UCSF_1232012_8004805215 0 0 2 1
I know I should use str_match or grep, but how do I apply this to all the samples in pedigrees?
Answer 0 (score: 2)
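If I understand correctly, you can do this with dplyr: group_by Family_ID, then inside mutate replace fatherID and motherID depending on whether they equal "0". I use grepl to find the Sample_ID that matches the current mother/father ID. Note that this takes only the first non-"0" parent ID in each family, so it assumes every family has a single mother and a single father.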
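library(dplyr)

pedigrees %>%
  group_by(Family_ID) %>%
  # Within each family, look up the full Sample_ID that contains the first
  # non-"0" parent ID; rows whose parent ID is "0" (founders) stay "0".
  mutate(motherID = ifelse(motherID != "0",
                           Sample_ID[grepl(motherID[motherID != "0"][1], Sample_ID)],
                           "0"),
         fatherID = ifelse(fatherID != "0",
                           Sample_ID[grepl(fatherID[fatherID != "0"][1], Sample_ID)],
                           "0")
  )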
# A tibble: 18 x 7
# Groups: Family_ID [5]
# r Family_ID Sample_ID fatherID motherID sex status
# <int> <fct> <chr> <chr> <chr> <int> <int>
# 1 1 MtS.MIPS.61 UCSF_AGG0092_8005439845 0 0 2 0
# 2 2 MtS.MIPS.61 UCSF_AGG0093_8005439857 0 0 1 0
# 3 3 MtS.MIPS.61 UCSF_AGG0094_8005439869 UCSF_AGG0093_8005439857 UCSF_AGG0092_8005439845 2 0
# 4 4 MtS.MIPS.61 UCSF_AGG0095_8005439881 UCSF_AGG0093_8005439857 UCSF_AGG0092_8005439845 2 2
# 5 5 MtS.MIPS.61 UCSF_AGG0091_8005439928 UCSF_AGG0093_8005439857 UCSF_AGG0092_8005439845 1 2
# 6 6 FAM048 UCSF_G01-GEA-259-HI_8005440194 UCSF_G01-GEA-259-PA_8005440218 UCSF_G01-GEA-259-MA_8005~ 1 2
# 7 7 FAM048 UCSF_G01-GEA-259-MA_8005440206 0 0 2 0
# 8 8 FAM048 UCSF_G01-GEA-259-PA_8005440218 0 0 1 0
# 9 9 F1543 UCSF_F1543-1_8005116638 UCSF_F1543-3_8005116661 UCSF_F1543-2_8005116649 2 2
#10 10 F1543 UCSF_F1543-2_8005116649 0 0 2 0
#11 11 F1543 UCSF_F1543-3_8005116661 0 0 1 0
#12 12 AU0045 UCSF_AU0045201_04C32032A 0 0 1 0
#13 13 AU0045 UCSF_AU0045202_04C32033A 0 0 2 0
#14 14 AU0045 UCSF_AU0045301_04C32034A UCSF_AU0045201_04C32032A UCSF_AU0045202_04C32033A 2 2
#15 15 AU0045 UCSF_AU0045302_04C32035A UCSF_AU0045201_04C32032A UCSF_AU0045202_04C32033A 1 2
#16 16 1232 UCSF_1232002_8004805191 UCSF_1232011_8004805203 UCSF_1232012_8004805215 2 2
#17 17 1232 UCSF_1232011_8004805203 0 0 1 1
#18 18 1232 UCSF_1232012_8004805215 0 0 2 1
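
If some families contain more than one set of parents, a per-row lookup may be safer than taking the first non-"0" ID. The following is only a sketch of that idea, not part of the original answer: expand_parent is a hypothetical helper, and it assumes every Sample_ID has the form <prefix>_<id>_<suffix>, where the prefix and suffix contain no underscore. It uses exact matching with match instead of a regular-expression search with grepl, so an ID that is a substring of another ID is not picked up by mistake.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# map bare parent IDs to full Sample_IDs within one family.
expand_parent <- function(parent, sample_id) {
  # Extract the part between the first and the last underscore,
  # e.g. "UCSF_1232011_8004805203" -> "1232011".
  core <- sub("^[^_]*_(.*)_[^_]*$", "\\1", sample_id)
  # Exact lookup of each parent ID among the extracted IDs.
  full <- sample_id[match(parent, core)]
  # Keep "0" (founders) and any unmatched ID unchanged.
  ifelse(is.na(full), parent, full)
}

# Small example built from family 1232 in the question.
ped <- data.frame(
  Family_ID = "1232",
  Sample_ID = c("UCSF_1232002_8004805191",
                "UCSF_1232011_8004805203",
                "UCSF_1232012_8004805215"),
  fatherID  = c("1232011", "0", "0"),
  motherID  = c("1232012", "0", "0"),
  stringsAsFactors = FALSE
)

result <- ped %>%
  group_by(Family_ID) %>%
  mutate(fatherID = expand_parent(fatherID, Sample_ID),
         motherID = expand_parent(motherID, Sample_ID)) %>%
  ungroup()
```

Because the lookup runs inside group_by(Family_ID), a parent ID is only matched against samples from the same family. Parent IDs with no matching sample are left as they were, which makes them easy to spot afterwards.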