Replace values and add rows using if / for loops

Time: 2016-10-11 11:21:14

Tags: r bioinformatics genetics

I want to replace a 0 in the Sire column with a new Id when Dam is not 0, and then add a new row each time with the new Id and Sex.

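For example, I need to replace the 0 in the first row with s1073 and add a new row to the data as 1 s1073 0 0 2. Similarly, if Dam is 0 and Sire is not 0 (e.g. row 7), I need to replace the Dam 0 with d900 and add a new row to the data frame as 1 d900 0 0 2.

Can anyone help me with this?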

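2 Answers:

Answer 0: (score: 2)

I am guessing this is a plink FAM format in which some individuals are missing a father or a mother, and we want to add the missing parent for individuals who have at least one parent recorded. If both parents are missing, we do not add any parents.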
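For reference, a PLINK .fam file has six whitespace-separated columns and no header row: family ID, within-family ID, father ID, mother ID, sex (1 = male, 2 = female, 0 = unknown) and phenotype. A 0 in a parent column means that parent is not in the dataset. The sketch below reads such a file as character columns; the inline text, the sample rows, and the column names are assumptions made for illustration, not the asker's data.

```r
# Hypothetical .fam content, typed inline so the example runs on its own
fam_text <- "1 1 0 2 1 -9
1 2 0 0 2 -9
2 1 3 0 1 -9"

fam <- read.table(
  text = fam_text,
  header = FALSE,                 # .fam files have no header line
  col.names = c("FID", "IID", "Father", "Mother", "Sex", "Pheno"),
  colClasses = "character")       # keep IDs such as "0" or "2f1" as strings

# parents coded "0" are not present in the data
fam$Father == "0"
```

Reading everything as character avoids numeric IDs being silently converted, which matters once suffixed IDs are introduced.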

# dummy fam data with missing parents
df1 <- read.table(text = "FID IID Father Mother Sex
1 1 0 2 1
1 2 0 0 2
1 3 0 2 1
1 4 0 2 2
2 1 3 0 1
2 2 3 0 2
2 3 0 0 1
3 1 0 0 1
4 1 0 0 1
4 2 0 0 2
4 3 1 2 2
4 4 1 2 2
", header = TRUE, colClasses = "character")

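Notes on the dummy data:
   - FID == 1 is missing fathers
   - FID == 2 is missing mothers
   - FID == 3 is a single-person family with no parents
   - FID == 4 is not missing any parents

Task: add the missing father or mother only when exactly one of them is missing. That is, if both are missing (Father == 0 and Mother == 0), do not add parents.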
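The rule can be written as two logical flags, one per parent. This is a minimal base-R sketch on a four-row toy data frame (the data and the column names `addFather` / `addMother` are made up for illustration), with one row for each possible case:

```r
# Toy data (assumed): father missing, mother missing, both missing, none missing
toy <- data.frame(
  IID    = c("a", "b", "c", "d"),
  Father = c("0", "3", "0", "1"),
  Mother = c("2", "0", "0", "2"),
  stringsAsFactors = FALSE)

# a parent is added only when it is missing AND the other parent is known
toy$addFather <- toy$Father == "0" & toy$Mother != "0"
toy$addMother <- toy$Mother == "0" & toy$Father != "0"

toy$addFather  # TRUE FALSE FALSE FALSE
toy$addMother  # FALSE TRUE FALSE FALSE
```

Row "c", where both parents are 0, gets neither flag, which is the "do not add parents" case.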

library(dplyr) # using dplyr to make each step explicit

# replace 0 for a missing Father or Mother with a new ID built from
# the known parent's ID, the suffix "f" or "m", and the child's IID
df1 <- 
  df1 %>% 
  mutate(
    FatherNew = if_else(Father == "0" & Mother != "0", paste0(Mother, "f", IID), Father),
    MotherNew = if_else(Mother == "0" & Father != "0", paste0(Father, "m", IID), Mother))

# add missing Fathers
missingFather <- df1 %>% 
  filter(
    FatherNew != "0" &
      MotherNew != "0" &
      !FatherNew %in% df1$IID) %>% 
  transmute(
    FID = FID,
    IID = FatherNew,
    Father = "0",
    Mother = "0",
    Sex = "1") %>%
  unique


# add missing Mothers
missingMother <- df1 %>% 
  filter(
    FatherNew != "0" &
      MotherNew != "0" &
      !MotherNew %in% df1$IID) %>% 
  transmute(
    FID = FID,
    IID = MotherNew,
    Father = "0",
    Mother = "0",
    Sex = "2") %>%
  unique

# update new Father/Mother IDs
res <- df1 %>% 
  transmute(
    FID = FID,
    IID = IID,
    Father = FatherNew,
    Mother = MotherNew,
    Sex = Sex)

# add missing Fathers/Mothers as new rows, and sort
res <- rbind(
  res,
  missingFather,
  missingMother) %>%
  arrange(FID, IID)

Result, check the output:

res
#    FID IID Father Mother Sex
# 1    1   1    2f1      2   1
# 2    1   2      0      0   2
# 3    1 2f1      0      0   1
# 4    1 2f3      0      0   1
# 5    1 2f4      0      0   1
# 6    1   3    2f3      2   1
# 7    1   4    2f4      2   2
# 8    2   1      3    3m1   1
# 9    2   2      3    3m2   2
# 10   2   3      0      0   1
# 11   2 3m1      0      0   2
# 12   2 3m2      0      0   2
# 13   3   1      0      0   1
# 14   4   1      0      0   1
# 15   4   2      0      0   2
# 16   4   3      1      2   2
# 17   4   4      1      2   2
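
One way to sanity-check a result like this is to confirm that every non-zero parent ID now appears as an individual in the same family. The base-R sketch below uses a small hand-typed excerpt of the output above instead of `res` itself, so that it runs on its own; the helper name `parentsPresent` is hypothetical.

```r
# hand-copied subset of the result (parts of families 1 and 2 only)
chk <- data.frame(
  FID    = c("1", "1", "1", "2", "2", "2"),
  IID    = c("1", "2", "2f1", "1", "3", "3m1"),
  Father = c("2f1", "0", "0", "3", "0", "0"),
  Mother = c("2", "0", "0", "3m1", "0", "0"),
  stringsAsFactors = FALSE)

# TRUE for a row when its parent ID is "0" or matches an FID/IID pair in the data
parentsPresent <- function(d, parentCol) {
  known <- paste(d$FID, d$IID)
  d[[parentCol]] == "0" | paste(d$FID, d[[parentCol]]) %in% known
}

all(parentsPresent(chk, "Father"))
all(parentsPresent(chk, "Mother"))
```

On the full result, the same helper could be called as `parentsPresent(res, "Father")` and `parentsPresent(res, "Mother")`.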

Answer 1: (score: -1)

I think this answer is very useful for me for finding the missing offspring. Thanks!