How do I combine multiple columns into one column and attach their unique code in R?

Time: 2018-06-22 12:02:57

Tags: r stat

I have a data frame that looks like this:

+---------+------------+-------------+--------+
|   code  |chem_1      | chem_2      | chem_3 |
+---------+------------+-------------+--------+
|    1    |PCB001      |PCB047       |PCB047  |
|    2    |chlorpyrifos|chlorpyriphos|        | 
|    3    |TOC         |             |        |
+---------+------------+-------------+--------+

I want to combine all the chemicals into a single column, keeping the code attached to each one.

+-------------+--------+
| chem        | code   |
+-------------+--------+
|PCB001       | 1      |
|PCB047       | 1      | 
|PCB047       | 1      |
|chlorpyrifos | 2      |
|chlorpyriphos| 2      |
|TOC          | 3      |
+-------------+--------+

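I was wondering whether there is a simple way to do this in a single function call. Thanks a lot!

5 answers:

Answer 0: (score: 0)

There are many ways to do this; here is one using reshape2::melt: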

library(reshape2)
df[df == ""] <- NA  # mark empty cells as missing
melt(df, id.vars = "code", na.rm = TRUE, value.name = "chem")[, -2]  # [, -2] drops the `variable` column
#  code          chem
#1    1        PCB001
#2    2  chlorpyrifos
#3    3           TOC
#4    1        PCB047
#5    2 chlorpyriphos
#7    1        PCB047

We first replace all empty values with NA, then use melt with na.rm = TRUE to reshape from wide to long while dropping the NA entries.


Sample data

df <- read.table(text =
    " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ''
    3    TOC           ''  ''                ", header = TRUE)
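
To make the NA replacement step above concrete, here is a minimal base R sketch. It assumes the blank cells are empty strings in character columns; the toy `df` is built by hand to mirror the question's table rather than parsed from text, and `na_counts` is just an illustrative name.

```r
# Hand-built copy of the question's table (an assumption for illustration)
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# df == "" is a logical matrix with TRUE wherever a cell is an empty string;
# assigning through that matrix overwrites exactly those cells with NA
df[df == ""] <- NA

# Number of missing cells per column after the replacement
na_counts <- colSums(is.na(df))
na_counts
```

Here `chem_2` ends up with one missing cell and `chem_3` with two, which are the entries that `na.rm = TRUE` then drops during the melt.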

Answer 1: (score: 0)

A tidyverse solution.

# Required package
library(tidyverse)

# Dummy data
df <- data.frame(code = 1:5, foo = letters[1:5], bar = LETTERS[6:10])

#    code foo bar
# 1    1   a   F
# 2    2   b   G
# 3    3   c   H
# 4    4   d   I
# 5    5   e   J

# Reformat
df %>% gather(key, chem, -code) %>% select(-key)

#    code  chem
# 1     1     a
# 2     2     b
# 3     3     c
# 4     4     d
# 5     5     e
# 6     1     F
# 7     2     G
# 8     3     H
# 9     4     I
# 10    5     J
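
The dummy data above contains no blanks. As a rough sketch of how the same idea might look on the question's table, the following uses `pivot_longer()`, which tidyr (1.0.0 and later) provides as the successor to `gather()`. The hand-built `df`, the throwaway column name `source`, and the decision to treat empty strings as missing are assumptions made for illustration.

```r
library(tidyr)
library(dplyr)

# Hand-built copy of the question's table (an assumption for illustration)
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing so they can be dropped while reshaping
df[df == ""] <- NA

long <- df %>%
  pivot_longer(cols = -code,            # stack every column except `code`
               names_to = "source",     # hypothetical name for the origin column
               values_to = "chem",
               values_drop_na = TRUE) %>%
  select(chem, code)                    # keep only the two requested columns

long
```

Because `pivot_longer()` walks the input row by row, the result is already grouped by `code`, so no extra sorting step is needed for this layout.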

Answer 2: (score: 0)

Using melt from data.table:

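library(data.table)
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr
# convert to a data.table first so that data.table's own melt method is used
melt(as.data.table(df), id.vars = "code", measure.vars = c("chem_1", "chem_2", "chem_3")) %>%
  arrange(code) %>%
  drop_na() %>%
  select(-variable)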

 # code         value
 #1    1        PCB001
 #2    1        PCB047
 #3    1        PCB047
 #4    2  chlorpyrifos
 #5    2 chlorpyriphos
 #7    3           TOC

Data: the ' ' (single space) entries are replaced with NA via na.strings

df <- read.table(text =
   " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ' '
    3    TOC           ' '  ' '                ", na.strings=" ", header = TRUE)
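
For comparison, the same reshape can probably be done with data.table alone, without the dplyr and tidyr verbs. This is only a sketch: the table `dt` is built by hand with NA already in place of the blanks, and `long` is an illustrative name.

```r
library(data.table)

# Hand-built copy of the question's table, blanks already stored as NA
dt <- data.table(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", NA),
  chem_3 = c("PCB047", NA, NA)
)

# Wide to long; na.rm = TRUE drops the missing entries during the melt
long <- melt(dt, id.vars = "code",
             measure.vars = c("chem_1", "chem_2", "chem_3"),
             value.name = "chem", na.rm = TRUE)

# Sort by code and keep only the two requested columns
long <- long[order(code), .(chem, code)]
long
```

Passing `na.rm = TRUE` to `melt` and sorting inside `[...]` takes the place of the `drop_na()` and `arrange()` steps in the pipeline above.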

Answer 3: (score: 0)

Consider base R's reshape:

data <- data.frame(code = c(1:3),
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               new.row.names = 1:1000, direction = "long")
rdf

#   code   type          chem
# 1    1 chem_1        PCB001
# 2    2 chem_1  chlorpyrifos
# 3    3 chem_1           TOC
# 4    1 chem_2        PCB047
# 5    2 chem_2 chlorpyriphos
# 6    3 chem_2          <NA>
# 7    1 chem_3        PCB047
# 8    2 chem_3          <NA>
# 9    3 chem_3          <NA>
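
The long table above still contains the NA rows and the `type` column. One possible way to get from there to the two-column layout the question asks for is sketched below in base R; the data are rebuilt by hand to mirror the question's table, and `out` is an illustrative name.

```r
# Hand-built copy of the question's table, blanks stored as NA
data <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", NA),
  chem_3 = c("PCB047", NA, NA),
  stringsAsFactors = FALSE
)

# Same wide-to-long reshape as in the answer
rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               direction = "long")

# Drop the missing chemicals and keep only the requested columns
out <- rdf[!is.na(rdf$chem), c("chem", "code")]

# Sort by code (order() keeps ties in their original order) and reset row names
out <- out[order(out$code), ]
rownames(out) <- NULL
out
```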

Answer 4: (score: 0)

You can do this in base R by taking advantage of the recycling that data.frame performs:

df1 <- subset(data.frame(df[1],chem = unlist(df[-1])),chem!="")
df1[order(df1$code),] # if you need it sorted
#         code          chem
# chem_11    1        PCB001
# chem_21    1        PCB047
# chem_31    1        PCB047
# chem_12    2  chlorpyrifos
# chem_22    2 chlorpyriphos
# chem_13    3           TOC
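
To spell out the recycling that this answer relies on, the sketch below separates the two steps. It assumes the blanks are empty strings in character columns; the hand-built `df` and the names `chem`, `stacked` and `out` are illustrative.

```r
# Hand-built copy of the question's table (an assumption for illustration)
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# unlist() stacks the chemical columns one after another: 3 columns x 3 rows = 9 values
chem <- unlist(df[-1], use.names = FALSE)

# `code` has length 3 and `chem` has length 9; since 9 is a whole multiple of 3,
# data.frame() repeats `code` three times so it lines up with the stacked values
stacked <- data.frame(code = df$code, chem = chem, stringsAsFactors = FALSE)

# Remove the blank entries and sort by code
out <- stacked[stacked$chem != "", ]
out <- out[order(out$code), ]
out
```

The recycling only works because every chemical column has the same number of rows as `code`; if the stacked length were not a whole multiple of the length of `code`, `data.frame()` would stop with an error instead.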