Merging data frames with merge() when there are blanks - formatting problem

Time: 2018-08-10 15:32:40

Tags: r merge

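I want to add information from one data frame to another, from dfadd to dfmaster, while keeping the rows in the same order as in dfmaster.

I tried merge(), but it changes the order of the rows in dfmaster. The order is essential here. Is there a data.table or tidyverse way to do this?

Thanks!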

# Data
dfmaster <- data.frame(variable_name=c("Blood_sugar","","","Blood_pressure","","Pulse",""),variable_level=c("high","medium","low","high","low","high","low"),variable_defin=c("baseline, lab","","","baseline, measured","","baseline, measured",""))
dfadd <- data.frame(variable_name=c("Blood_sugar","Blood_pressure","Pulse","Breakfast","Rest"),centre_names1=c("ST","FD","","QW",""),centre_names2=c("","HF","","",""),centre_names3=c("","LD","","",""),one_or_more=c("one","more","","one",""))


# Goal 
dfgoal <- data.frame(variable_name=c("Blood_sugar","","","Blood_pressure","","Pulse",""),variable_level=c("high","medium","low","high","low","high","low"),variable_defin=c("baseline, lab","","","baseline, measured","","baseline, measured",""),centre_names1=c("ST","","","FD","","",""),centre_names2=c("","","","FD","","",""),centre_names3=c("","","","LD","","",""),one_or_more=c("more","","","more","","",""))


# Attempt 
dfmaster <- merge(dfmaster, dfadd, by = "variable_name", all.x = TRUE)

1 Answer:

Answer 0: (score: 0)

There seems to be a small error in your dfgoal: for Blood_sugar, "one_or_more" should be "one" (as in dfadd), not "more".

Please check whether the code below does what you need.

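library(dplyr)
dfmaster %>%
  # left join: keeps every row of dfmaster, in dfmaster's original order
  left_join(dfadd, by = "variable_name") %>%
  # turn empty factor values into NA (remove if not desired);
  # note: in R >= 4.0 data.frame() creates character columns by default,
  # so this step only applies if the data were built with stringsAsFactors = TRUE
  mutate_if(is.factor, ~ factor(replace(., . == "", NA)))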

#    variable_name variable_level     variable_defin centre_names1 centre_names2 centre_names3 one_or_more
# 1    Blood_sugar           high      baseline, lab            ST          <NA>          <NA>         one
# 2                        medium               <NA>          <NA>          <NA>          <NA>        <NA>
# 3                           low               <NA>          <NA>          <NA>          <NA>        <NA>
# 4 Blood_pressure           high baseline, measured            FD            HF            LD        more
# 5                           low               <NA>          <NA>          <NA>          <NA>        <NA>
# 6          Pulse           high baseline, measured          <NA>          <NA>          <NA>        <NA>
# 7                           low               <NA>          <NA>          <NA>          <NA>        <NA>
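
For comparison, the same order-preserving lookup can be sketched in base R with match(), without any extra packages. This is only a minimal sketch, not part of the original answer: it rebuilds the question's data with stringsAsFactors = FALSE (an assumption, so it behaves the same on old and new R versions), and the names idx, add_cols and result are made up for illustration.

```r
# Rebuild the question's data as character columns (assumption: no factors).
dfmaster <- data.frame(
  variable_name  = c("Blood_sugar", "", "", "Blood_pressure", "", "Pulse", ""),
  variable_level = c("high", "medium", "low", "high", "low", "high", "low"),
  variable_defin = c("baseline, lab", "", "", "baseline, measured", "",
                     "baseline, measured", ""),
  stringsAsFactors = FALSE
)
dfadd <- data.frame(
  variable_name = c("Blood_sugar", "Blood_pressure", "Pulse", "Breakfast", "Rest"),
  centre_names1 = c("ST", "FD", "", "QW", ""),
  centre_names2 = c("", "HF", "", "", ""),
  centre_names3 = c("", "LD", "", "", ""),
  one_or_more   = c("one", "more", "", "one", ""),
  stringsAsFactors = FALSE
)

# For each dfmaster row, find the row of dfadd with the same key (NA if none).
idx <- match(dfmaster$variable_name, dfadd$variable_name)

# Columns to bring over: everything in dfadd except the key column.
add_cols <- setdiff(names(dfadd), "variable_name")

# Index dfadd by those positions and bind column-wise; row order is dfmaster's.
result <- cbind(dfmaster, dfadd[idx, add_cols, drop = FALSE])
rownames(result) <- NULL

# Unmatched rows come back as NA; show them as empty strings, as in dfgoal.
result[add_cols] <- lapply(result[add_cols],
                           function(x) replace(x, is.na(x), ""))

print(result)
```

Because match() returns one position per row of dfmaster, the result always has exactly as many rows as dfmaster and in the same order. This relies on variable_name being unique in dfadd; with duplicate keys, match() would silently use the first one.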