Merging two rows in a data frame in R

Asked: 2014-04-07 04:19:03

Tags: r dataframe

I am trying to merge rows in a data.frame based on <NA> values.

Here is my data frame.

new <- data.frame (
  Location = c(rep("Loc 1", 4), rep("Loc 2", 4)), 
  Place = c("Powder Springs_Original", "Bridge_Other County", "Airport", "County1", "City 4 - Duplicated", "South", "County2", "Formal place"), 
  Val1 = c(109, 123, NA, 117, 143, NA, 151, 142), 
  Val2 = c(102, 115, NA, 45, 135, NA, 144, 125), 
  Val3 = c(99, 112, NA, 26,  127, NA, 140, 132), 
  Val4 = c(90, 103, NA, 57, 125, NA, 135, 201))

I am expecting something like this:

Location Place                      Val1 Val2 Val3 Val4
Loc 1    Powder Springs - Original  109   102   99  90
Loc 1    Bridge _ Other County      123   115  112  103
Loc 1    Airport County1            117   45    26  57
Loc 2    City 4 - Duplicated        143   135  127  125
Loc 2    South County2              151   144  140  135
Loc 2    Formal place               142   125  132  201

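I want to remove the NA rows and merge each one's Place with the Place of the next row. The NA rows always sit directly before the row they belong to. Can someone help me?

Thanks in advance.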

3 Answers:

Answer 0: (score: 1)

First, you shouldn't use new as a variable name, because it is a built-in R function. Second, you can do this:

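# Find which rows have NA in all of Val1..Val4
# (is.na() is needed here: comparing against the string "NA" yields NA, not TRUE)
na_rows <- which(apply(new, 1, function(x) all(is.na(x[paste0('Val', 1:4)]))))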
# Set correct place names
new$Place <- as.character(new$Place)
new$Place[na_rows + 1] <- paste(new$Place[na_rows], new$Place[na_rows + 1])
# Remove NAs
new <- new[-na_rows, ]
#   Location                   Place Val1 Val2 Val3 Val4
# 1    Loc 1 Powder Springs_Original  109  102   99   90
# 2    Loc 1     Bridge_Other County  123  115  112  103
# 4    Loc 1         Airport County1  117   45   26   57
# 5    Loc 2     City 4 - Duplicated  143  135  127  125
# 7    Loc 2           South County2  151  144  140  135
# 8    Loc 2            Formal place  142  125  132  201
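
One thing to watch with the approach above: if no row happens to be all-NA, the index vector is empty, and negative indexing with an empty vector returns zero rows instead of the untouched data frame. A minimal sketch of a guarded version follows; the function name `merge_na_rows`, its `value_cols` argument, and the small `dat` example are hypothetical and not part of the original answer.

```r
# Hypothetical helper: merge each all-NA row's Place into the following row,
# then drop the all-NA rows.
merge_na_rows <- function(df, value_cols = paste0("Val", 1:4)) {
  # Rows where every value column is NA
  na_rows <- which(rowSums(!is.na(df[value_cols])) == 0)
  # Guard: df[-integer(0), ] would return zero rows, so return early
  if (length(na_rows) == 0) return(df)
  # Assumption: an all-NA row is never the last row of the data frame
  stopifnot(all(na_rows < nrow(df)))
  df$Place <- as.character(df$Place)
  df$Place[na_rows + 1] <- paste(df$Place[na_rows], df$Place[na_rows + 1])
  df[-na_rows, ]
}

# Small made-up example with two value columns
dat <- data.frame(
  Location = c("Loc 1", "Loc 1", "Loc 1"),
  Place = c("Bridge", "Airport", "County1"),
  Val1 = c(123, NA, 117),
  Val2 = c(115, NA, 45),
  stringsAsFactors = FALSE
)
res <- merge_na_rows(dat, value_cols = c("Val1", "Val2"))
res$Place
```

Calling the helper a second time on `res` leaves it unchanged, because the guard returns early when there is nothing to merge.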

Answer 1: (score: 0)

(Edited because the initial answer was incomplete)

nu <- data.frame (
  Location = c(rep("Loc 1", 4), rep("Loc 2", 4)), 
  Place = c("Powder Springs_Original", "Bridge_Other County", "Airport", "County1", "City 4 - Duplicated", "South", "County2", "Formal place"), 
  Val1 = c(109, 123, NA, 117, 143, NA, 151, 142), 
  Val2 = c(102, 115, NA, 45, 135, NA, 144, 125), 
  Val3 = c(99, 112, NA, 26,  127, NA, 140, 132), 
  Val4 = c(90, 103, NA, 57, 125, NA, 135, 201), stringsAsFactors=FALSE)
# notice stringsAsFactors = FALSE
# if there was justice in the world, it would be FALSE by default in R
# (it is FALSE by default since R 4.0.0)
# in any case, nu$Place should be character rather than factor so in real data 
# you may need to do nu$Place <- as.character(nu$Place)

ic <- which(!complete.cases(nu))
# merge each incomplete row's Place into the row that follows it
nu$Place[ic+1] <- paste(nu$Place[ic], nu$Place[ic+1])
nu <- nu[-ic,]

Does this do what you need?
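
Note that `complete.cases()` flags a row as soon as any column contains an NA, which matches the data in the question but may be too broad if some rows are only partly missing. A small sketch of the difference, using a made-up data frame `d` and column vector `vals` (both hypothetical):

```r
# Made-up data: row 2 is partly missing, row 3 is entirely missing
d <- data.frame(
  Place = c("A", "B", "C", "D"),
  Val1 = c(1, NA, NA, 4),
  Val2 = c(1, 2, NA, 4),
  stringsAsFactors = FALSE
)
vals <- c("Val1", "Val2")

# Rows with at least one NA among the value columns
any_na <- which(!complete.cases(d[vals]))
# Rows where all value columns are NA
all_na <- which(rowSums(is.na(d[vals])) == length(vals))

any_na
all_na
```

Here `any_na` picks up rows 2 and 3, while `all_na` picks up row 3 only, so the second form is the safer choice when only fully empty rows should be merged away.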

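Answer 2: (score: 0)

Thanks for your help and support. After a lot of trial and error, I got the desired output as shown below. (As suggested by @Robert Krzyzanowski, I renamed the data.frame to Test.)

Here is what I did. If you spot anything odd, please let me know.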

> new_DF <- subset(Test, is.na(Test$Val1))
> new_DF
  Location   Place Val1 Val2 Val3 Val4
3    Loc 1 Airport   NA   NA   NA   NA
6    Loc 2   South   NA   NA   NA   NA
> 
> row.names(new_DF)
[1] "3" "6"
> x.num <- as.numeric(row.names(new_DF))
> 
> Test$Place <- as.character(Test$Place)
> Test$Place[x.num + 1] <- paste(Test$Place[x.num], Test$Place[x.num + 1])
> Test <- Test[-x.num, ]
> Test
  Location                   Place Val1 Val2 Val3 Val4
1    Loc 1 Powder Springs_Original  109  102   99   90
2    Loc 1     Bridge_Other County  123  115  112  103
4    Loc 1         Airport County1  117   45   26   57
5    Loc 2     City 4 - Duplicated  143  135  127  125
7    Loc 2           South County2  151  144  140  135
8    Loc 2            Formal place  142  125  132  201
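
One caveat with taking the indices from `row.names()`: row names are labels, not positions, and the two only coincide while no rows have been removed or reordered. A minimal sketch of where they diverge, using a made-up data frame `Test2` (hypothetical, not the asker's data):

```r
Test2 <- data.frame(
  Place = c("Old", "Bridge", "Airport", "County1"),
  Val1 = c(1, 123, NA, 117),
  stringsAsFactors = FALSE
)
# Dropping the first row leaves row names "2", "3", "4"
Test2 <- Test2[-1, ]

# Label of the NA row, converted to a number
by_name <- as.numeric(row.names(subset(Test2, is.na(Val1))))
# Position of the NA row
by_pos <- which(is.na(Test2$Val1))

Test2$Place[by_name]  # indexes by position using a label: wrong row
Test2$Place[by_pos]   # the NA row itself
```

Using `which(is.na(Test$Val1))` in place of the row-name conversion avoids this mismatch and works the same way on the original data.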

Thanks again, everyone, for your support and time.