How do I create a row that combines certain values from the previous row in R?

Time: 2017-09-07 21:47:44

Tags: r dataframe

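In my data, some rows represent the results of repeat tests. Only certain values are captured in the repeat. What I want to do is create a new row using the repeat values, but where a repeat value is NA or blank, pull it from the initial test.

For example, given: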

Patient ID   Initial/Repeat   Value    Value 2   Accept/Reject
A1                   Initial      95        NA          Reject
A1                    Repeat      NA        80          Accept    
A2                   Initial      80        70          Accept

I would like to transform it into:

Patient ID   Initial/Repeat   Value    Value 2   Accept/Reject
A1                    Repeat      95        80          Accept    
A2                   Initial      80        70          Accept

Thanks.

4 answers:

Answer 0 (score: 2)

Try this:

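library(zoo)    # provides na.locf()
library(dplyr)

# Note: funs() is deprecated as of dplyr 0.8.0; there the same call can be
# written as mutate_all(~ na.locf(., na.rm = FALSE)).
df %>%
  group_by(Patient_ID) %>%
  # carry the last non-NA value forward within each patient
  mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>% 
  # keep only the last row of each patient
  filter(row_number()==n())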

Output:

# A tibble: 2 x 5
# Groups:   Patient_ID [2]
  Patient_ID Initial_Repeat Value Value2 Accept_Reject
       <chr>          <chr> <int>  <int>         <chr>
1         A1         Repeat    95     80        Accept
2         A2        Initial    80     70        Accept
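
To make the carry-forward step less opaque, here is a minimal base-R sketch of the same idea on a single vector. The helper name `carry_forward` is hypothetical and only illustrates what "last observation carried forward" means for plain (non-factor) vectors; it is not how zoo implements `na.locf()`.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value
# before it; leading NAs stay NA. Intended for plain atomic vectors.
carry_forward <- function(x) {
  seen <- cumsum(!is.na(x))        # how many non-NA values seen so far
  # Prepend an NA so that position 1 means "nothing seen yet".
  c(NA, x[!is.na(x)])[seen + 1]
}

carry_forward(c(95, NA))   # the NA is filled from the value before it
carry_forward(c(NA, 80))   # a leading NA has nothing to copy and stays NA
```

Applied per patient and per column, followed by keeping the last row, this gives the behaviour the answer relies on.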

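Answer 1 (score: 1)

Is it always a series of NAs with a single valid value? If so, you can take the mean across the rows of each group, discarding any NAs. I do this with dplyr's grouping and summarising functions: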

# Sample data:
df = read.table(text="PatientID   Initial_Repeat   Value    Value2   Accept_Reject
A1                   Initial      95        NA          Reject
A1                    Repeat      NA        80          Accept    
A2                   Initial      80        70          Accept", header = TRUE)

# My solution uses the dplyr package:
library(dplyr)
answer = df %>% 
     group_by(PatientID) %>% 
     summarise(Value = mean(Value, na.rm = TRUE), Value2 = mean(Value2, na.rm = TRUE))

Result:

# A tibble: 2 x 3
  PatientID Value Value2
     <fctr> <dbl>  <dbl>
1        A1    95     80
2        A2    80     70
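
The mean gives the right answer only because each patient has at most one non-NA value per column. If both the initial and the repeat test could carry a value, one alternative is to take the last non-NA value per patient, which also keeps the text columns. This is a sketch rather than part of the original answer: `last_non_na` is a hypothetical helper, `across()` assumes dplyr 1.0 or later, and the rows are assumed to be ordered with the initial test before the repeat.

```r
library(dplyr)

# Sample data from the question, with text columns kept as character.
df <- data.frame(
  PatientID      = c("A1", "A1", "A2"),
  Initial_Repeat = c("Initial", "Repeat", "Initial"),
  Value          = c(95, NA, 80),
  Value2         = c(NA, 80, 70),
  Accept_Reject  = c("Reject", "Accept", "Accept"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: last non-NA element of x, or a typed NA if none exists.
last_non_na <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) x[NA_integer_] else keep[length(keep)]
}

# One row per patient, preferring the latest available value in every column.
answer <- df %>%
  group_by(PatientID) %>%
  summarise(across(everything(), last_non_na))
```

Unlike the mean, this never blends two measurements into a value that was not actually observed.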
  

Answer 2 (score: 1)

Without additional libraries:

df1 <- with(df, data.frame(PatientID=tapply(PatientID, PatientID, 
    function(x) x[length(x)])))
df1$Inital_Repeat <- with(df, tapply(Initial_Repeat, PatientID, 
    function(x) levels(Initial_Repeat)[x[length(x)]]))
for (v in c('Value', 'Value2')) 
    df1[[v]] <- tapply(df[[v]], df$PatientID, function(x) x[!is.na(x)][1])
df1$Accept_Reject <- with(df, tapply(Accept_Reject, PatientID,
    function(x) levels(Accept_Reject)[x[length(x)]]))

Output:

   PatientID Inital_Repeat Value Value2 Accept_Reject
A1         1        Repeat    95     80        Accept
A2         2       Initial    80     70        Accept

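Note that Inital_Repeat and Accept_Reject are factors.

Edit: PatientID is also a factor, which is why the PatientID column shows 1 and 2. To get "A1" and "A2" instead, change x[length(x)] on line 2 to levels(x)[x[length(x)]]. Also, levels(Initial_Repeat) on line 4 can be replaced with levels(x), and levels(Accept_Reject) on line 8 can be replaced the same way.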
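
The changes described in the edit can be sketched as follows. This is an illustration under a few assumptions: `last_label` is a hypothetical helper name, the sample data is read with `stringsAsFactors = TRUE` so the text columns are factors (the default before R 4.0), and the column is spelled `Initial_Repeat` here.

```r
# Sample data from the question, read with text columns as factors.
df <- read.table(text = "PatientID Initial_Repeat Value Value2 Accept_Reject
A1 Initial 95 NA Reject
A1 Repeat NA 80 Accept
A2 Initial 80 70 Accept", header = TRUE, stringsAsFactors = TRUE)

# Hypothetical helper: label (not integer code) of the last element of a factor.
last_label <- function(x) levels(x)[x[length(x)]]

# Last label per patient for the factor columns.
df1 <- data.frame(PatientID = tapply(df$PatientID, df$PatientID, last_label),
                  stringsAsFactors = FALSE)
df1$Initial_Repeat <- tapply(df$Initial_Repeat, df$PatientID, last_label)

# First non-NA value per patient for the numeric columns.
for (v in c("Value", "Value2"))
  df1[[v]] <- tapply(df[[v]], df$PatientID, function(x) x[!is.na(x)][1])

df1$Accept_Reject <- tapply(df$Accept_Reject, df$PatientID, last_label)
```

With `levels(x)` used inside the helper, the PatientID column holds the labels "A1" and "A2" rather than the factor codes.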

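Answer 3 (score: 0)

I also found that the tools within tidyverse can do the job as well. It is a bit slower than zoo, but it is more readable and requires loading fewer packages.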

library(tidyverse)

df <- df %>%
  group_by(Patient_ID) %>%
  fill(names(df), .direction = "down") %>%
  filter(row_number() == n())
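
For a run that can be reproduced end to end, here is a self-contained sketch of the same approach on the question's sample data. It loads only dplyr and tidyr instead of the whole tidyverse, and it uses `everything()` and `slice(n())` in place of `names(df)` and the `filter()` call; those substitutions are my own choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, using the column names of this answer.
df <- data.frame(
  Patient_ID     = c("A1", "A1", "A2"),
  Initial_Repeat = c("Initial", "Repeat", "Initial"),
  Value          = c(95, NA, 80),
  Value2         = c(NA, 80, 70),
  Accept_Reject  = c("Reject", "Accept", "Accept"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Patient_ID) %>%
  # fill NAs downward within each patient, from the initial row into the repeat
  fill(everything(), .direction = "down") %>%
  # keep the last row of each patient, which now holds the combined values
  slice(n()) %>%
  ungroup()
```

The rows must already be ordered with the initial test before the repeat for the downward fill to pull values in the intended direction.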