Concatenate rows if the value of another column in the next row is empty

Time: 2019-01-31 20:57:45

Tags: r concatenation

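I have a dataset as shown in the Input table below. I want to concatenate rows (4,5,6), rows (8,9) and rows (11,12) of the Input table so that each group shares a single ID, that of rows 4, 8 and 11 respectively, as shown in the Output table below.

I tried merge(), but it did not work as expected. The key here is the ID column, which has unique values.

Any suggestions on how to achieve this efficiently?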

Input

Row Name Val1 Val2 Unit ID
1        -0.5 5.5   V   UI-001
2    a   -0.5 2.5   V   UI-002
3    b   -0.5 5.5   V   UI-003
4    c   -0.5 5.5   V   UI-004
5    d              
6    e              
7        -45 125  Ohms  UI-005
8    f     2        kV  UI-006
9    g              
10   h   500        V   UI-007
11   i    15        kV  UI-008
12   j              
13   k                  UI-009

dput() of Input

structure(list(Name = c(NA, "a", "b", "c", "d", "e", NA, "f", 
"g", "h", "i", "j", "k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, NA, 
NA, -45, 2, NA, 500, 15, NA, NA), Val2 = c(5.5, 2.5, 5.5, 5.5, 
NA, NA, 125, NA, NA, NA, NA, NA, NA), Unit = c("V", "V", "V", 
"V", NA, NA, "Ohms", "kV", NA, "V", "kV", NA, NA), ID = c("UI-001", 
"UI-002", "UI-003", "UI-004", NA, NA, "UI-005", "UI-006", NA, 
"UI-007", "UI-008", NA, "UI-009")), row.names = c(NA, -13L), class = 
c("tbl_df", "tbl", "data.frame"))

Output

Row Name Val1 Val2 Unit ID
1        -0.5 5.5   V   UI-001
2    a   -0.5 2.5   V   UI-002
3    b   -0.5 5.5   V   UI-003
4    cde -0.5 5.5   V   UI-004      
5        -45  125 Ohms  UI-005
6    fg    2        kV  UI-006  
7    h   500        V   UI-007
8    ij   15        kV  UI-008
9    k                  UI-009

2 answers:

Answer 0: (score: 2)

We may use

out <- df[!is.na(df$ID), ]
out$Name[!is.na(out$Name)] <- tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")[!is.na(out$Name)]
out
#    Name  Val1  Val2 Unit     ID
# 1  <NA>  -0.5   5.5    V UI-001
# 2     a  -0.5   2.5    V UI-002
# 3     b  -0.5   5.5    V UI-003
# 4   cde  -0.5   5.5    V UI-004
# 7  <NA> -45.0 125.0 Ohms UI-005
# 8    fg   2.0    NA   kV UI-006
# 10    h 500.0    NA    V UI-007
# 11   ij  15.0    NA   kV UI-008
# 13    k    NA    NA <NA> UI-009

The first line drops all rows where ID is NA. Then

tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")
#     1     2     3     4     5     6     7     8     9 
#  "NA"   "a"   "b" "cde"  "NA"  "fg"   "h"  "ij"   "k" 

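constructs the correct values for Name, and !is.na(out$Name) tells us which rows of out should be modified (because the string "NA" produced by paste is not the same as NA).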
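To make the two steps easier to follow, here is a minimal base-R sketch on a cut-down stand-in for the question's data (the seven rows and the names grp, joined and keep are illustrative assumptions, not part of the original answer). It shows the group index that cumsum() produces and why the "NA" strings have to be skipped.

```r
# Small stand-in for the question's data (values are illustrative).
df <- data.frame(
  Name = c(NA, "c", "d", "e", NA, "f", "g"),
  ID   = c("UI-001", "UI-004", NA, NA, "UI-005", "UI-006", NA),
  stringsAsFactors = FALSE
)

# Each non-missing ID starts a new group; rows with a missing ID
# inherit the group of the row above them.
grp <- cumsum(!is.na(df$ID))
# grp is 1 2 2 2 3 4 4

# paste() turns a missing Name into the string "NA", not a real NA.
joined <- unname(tapply(df$Name, grp, paste, collapse = ""))

# Keep one row per ID, then overwrite only the names that were present.
out <- df[!is.na(df$ID), ]
keep <- !is.na(out$Name)
out$Name[keep] <- joined[keep]
out
```

Rows whose Name was missing to begin with keep a real NA, while "c", "d" and "e" collapse into "cde" under UI-004.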

Answer 1: (score: 0)

There is also a dplyr possibility:

library(dplyr)

df %>%
 mutate(grp = ifelse((is.na(lead(ID, default = last(ID))) & !is.na(ID)) | is.na(ID), 1, 0),
        grp = ifelse(grp != 0, cumsum(grp != lag(grp, 1, default = first(grp))), 0)) %>%
 group_by(grp) %>%
 mutate(Name = ifelse(grp != 0, paste(Name, collapse = ""), Name)) %>%
 filter(!is.na(ID)) %>%
 ungroup() %>%
 select(-grp)

  Name      Val1   Val2 Unit  ID    
  <chr>    <dbl>  <dbl> <chr> <chr> 
1 <NA>    -0.500   5.50 V     UI-001
2 a       -0.500   2.50 V     UI-002
3 b       -0.500   5.50 V     UI-003
4 cde     -0.500   5.50 V     UI-004
5 <NA>   -45.0   125.   Ohms  UI-005
6 fg       2.00   NA    kV    UI-006
7 h      500.     NA    V     UI-007
8 ij      15.0    NA    kV    UI-008
9 k       NA      NA    <NA>  UI-009

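First, it creates a grouping variable covering the NA cases in ID together with the last non-NA ID before those NA cases. Then it groups by that variable and combines the values of Name into one. Finally, it filters out the cases where ID is NA and removes the now redundant grouping variable.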
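The grouping variable is the least obvious part, so here is a rough base-R re-creation of just that step on a toy ID vector. The vector and the names next_id, flag and prev_flag are assumptions made for illustration; next_id and prev_flag stand in for dplyr's lead() and lag() with the same defaults as in the answer.

```r
# Toy ID column: "B" is followed by two blanks, "D" by one.
id <- c("A", "B", NA, NA, "C", "D", NA)

# Stand-in for lead(ID, default = last(ID)).
next_id <- c(id[-1], id[length(id)])

# Flag rows with a missing ID, plus the last non-missing row before them.
flag <- as.numeric((is.na(next_id) & !is.na(id)) | is.na(id))
# flag is 0 1 1 1 0 1 1

# Stand-in for lag(grp, 1, default = first(grp)).
prev_flag <- c(flag[1], flag[-length(flag)])

# Number each run of flagged rows; unflagged rows all stay in group 0.
grp <- ifelse(flag != 0, cumsum(flag != prev_flag), 0)
# grp is 0 1 1 1 0 3 3
grp
```

The group labels need not be consecutive. Group 0 lumps together all rows that should stay untouched, which is why the answer only pastes names when grp != 0. Because the numbering is based on runs of flagged rows, two groups that sit directly next to each other (an ID with blanks immediately followed by another ID with blanks) form a single run and are not told apart; that case does not occur in the question's data.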

rleid() from data.table can also be used to create the grouping variable more conveniently:

library(data.table)  # provides rleid()

df %>%
 mutate(grp = ifelse((is.na(lead(ID, default = last(ID))) & !is.na(ID)) | is.na(ID), 1, 0),
        grp = ifelse(grp == 1, rleid(grp), grp)) %>%
 group_by(grp) %>%
 mutate(Name = ifelse(grp != 0, paste(Name, collapse = ""), Name)) %>%
 filter(!is.na(ID)) %>%
 ungroup() %>%
 select(-grp)

Or another possibility using fill() from tidyr:

library(tidyr)  # provides fill()

df %>%
 mutate(ID_temp = ID) %>%
 fill(ID, .direction = "down") %>%
 group_by(ID) %>%
 mutate(Name = paste(Name, collapse = "")) %>%
 filter(!is.na(ID_temp)) %>%
 select(-ID_temp)

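Here, the missing ID values are filled with the previous non-missing value, the data are grouped by ID, and the rows are then combined within each group.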
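One detail worth checking with this variant: paste() on a group whose only Name is missing returns the string "NA" rather than a real NA. The sketch below is a base-R approximation of the same fill-then-group idea that keeps such names missing. It is not the answer's code; the toy data and the helper join_names are hypothetical, and it assumes the first row has a non-missing ID and that IDs are unique, as stated in the question.

```r
# Small stand-in for the question's data (values are illustrative).
df <- data.frame(
  Name = c(NA, "c", "d", "e", NA, "f", "g"),
  ID   = c("UI-001", "UI-004", NA, NA, "UI-005", "UI-006", NA),
  stringsAsFactors = FALSE
)

# Carry the last non-missing ID downward (a base-R stand-in for fill()).
# Assumes the first row has a non-missing ID.
filled_id <- df$ID[!is.na(df$ID)][cumsum(!is.na(df$ID))]

# Hypothetical helper: join the non-missing names, or stay NA if there are none.
join_names <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = "")
}

# Split names by filled ID, keeping the groups in order of first appearance.
groups <- split(df$Name, factor(filled_id, levels = unique(filled_id)))

out <- df[!is.na(df$ID), ]
out$Name <- unname(vapply(groups, join_names, character(1)))
out
```

With the plain paste(Name, collapse = "") call, the first and third rows of this toy result would read "NA" instead of being missing.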