I have a data frame that looks like this:
+------+--------------+---------------+--------+
| code | chem_1       | chem_2        | chem_3 |
+------+--------------+---------------+--------+
| 1    | PCB001       | PCB047        | PCB047 |
| 2    | chlorpyrifos | chlorpyriphos |        |
| 3    | TOC          |               |        |
+------+--------------+---------------+--------+
I want to combine all the chemicals into a single column, with the matching code next to each one:
+---------------+------+
| chem          | code |
+---------------+------+
| PCB001        | 1    |
| PCB047        | 1    |
| PCB047        | 1    |
| chlorpyrifos  | 2    |
| chlorpyriphos | 2    |
| TOC           | 3    |
+---------------+------+
Is there a simple way to do this in a single function call? Thanks a lot!
Answer 0 (score: 0)
There are many ways to do this; here is one using reshape2::melt:
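library(reshape2)
df[df == ""] <- NA   # turn empty strings into NA so melt() can drop them
# [, -2] drops the "variable" column that melt() adds
melt(df, id.vars = "code", na.rm = TRUE, value.name = "chem")[, -2]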
# code chem
#1 1 PCB001
#2 2 chlorpyrifos
#3 3 TOC
#4 1 PCB047
#5 2 chlorpyriphos
#7 1 PCB047
We first replace every empty string with NA, then call melt with na.rm = TRUE to reshape from wide to long while dropping the NA entries. Sample data:
df <- read.table(text =
" code chem_1 chem_2 chem_3
1 PCB001 PCB047 PCB047
2 chlorpyrifos chlorpyriphos ''
3 TOC '' '' ", header = TRUE)
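To make the two steps concrete without reshape2, here is a minimal base-R sketch of what `melt(..., na.rm = TRUE)` does on this data. `melt_drop_na()` is a hypothetical helper name, not a reshape2 function, and it assumes the id column is called `code` and the other columns hold character values.

```r
# melt_drop_na() is a hypothetical helper, not part of reshape2; it assumes
# the id column is called "code" and every other column holds character values.
melt_drop_na <- function(df) {
  measure <- setdiff(names(df), "code")
  long <- data.frame(
    # each code is repeated once per measure column, matching the
    # column-by-column order in which unlist() stacks the values
    code = rep(df$code, times = length(measure)),
    chem = unlist(df[measure], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  long[!is.na(long$chem), ]  # drop NA entries, the job na.rm = TRUE does in melt()
}

df <- data.frame(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", ""),
                 chem_3 = c("PCB047", "", ""),
                 stringsAsFactors = FALSE)
df[df == ""] <- NA       # step 1: empty strings become NA
res <- melt_drop_na(df)  # step 2: wide to long, NA rows dropped
print(res)
```

The rows come out in the same column-major order as the melt() output above (row names 1 to 5 and 7), so sort by `code` afterwards if needed.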
Answer 1 (score: 0)
A tidyverse solution:
# Required package
library(tidyverse)
# Dummy data
df <- data.frame(code = 1:5, foo = letters[1:5], bar = LETTERS[6:10])
# code foo bar
# 1 1 a F
# 2 2 b G
# 3 3 c H
# 4 4 d I
# 5 5 e J
# Reformat
df %>% gather(key, chem, -code) %>% select(-key)
# code chem
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e
# 6 1 F
# 7 2 G
# 8 3 H
# 9 4 I
# 10 5 J
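In current tidyr, `gather()` is superseded by `pivot_longer()`. A sketch applying it to the question's own data, assuming tidyr 1.0 or later: `values_drop_na = TRUE` plays the role of `na.rm = TRUE`, and the empty strings are first turned into NA because `pivot_longer()` only drops true NA values.

```r
library(tidyr)

df <- data.frame(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", ""),
                 chem_3 = c("PCB047", "", ""),
                 stringsAsFactors = FALSE)
df[df == ""] <- NA  # blanks must be NA for values_drop_na to remove them

# Every column except code is stacked into "chem"; the old column names go
# into pivot_longer()'s default "name" column, which is then dropped.
res <- pivot_longer(df, cols = -code, values_to = "chem",
                    values_drop_na = TRUE)
res <- res[, c("chem", "code")]
print(res)
```

Because `pivot_longer()` walks the data row by row, the result is already grouped by `code` in the order the question asks for, with no separate sort.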
Answer 2 (score: 0)
Using melt from data.table:
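library(data.table)
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr
# data.table's melt() expects a data.table, so convert df first
melt(as.data.table(df), id.vars = "code", measure.vars = c("chem_1", "chem_2", "chem_3")) %>%
  arrange(code) %>%
  drop_na() %>%
  select(-variable)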
# code value
#1 1 PCB001
#2 1 PCB047
#3 1 PCB047
#4 2 chlorpyrifos
#5 2 chlorpyriphos
#7 3 TOC
Data:
The ' ' blanks are converted to NA with na.strings:
df <- read.table(text =
" code chem_1 chem_2 chem_3
1 PCB001 PCB047 PCB047
2 chlorpyrifos chlorpyriphos ' '
3 TOC ' ' ' ' ", na.strings = " ", header = TRUE)
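The same result can be had in data.table alone, without the dplyr/tidyr steps, since data.table's `melt()` accepts `na.rm` and `value.name` directly. A sketch assuming the blanks are already NA (as `na.strings` arranges above):

```r
library(data.table)

dt <- data.table(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", NA),
                 chem_3 = c("PCB047", NA, NA))

# Reshape wide to long, dropping NA values, then sort by code and keep
# only the two requested columns.
res <- melt(dt, id.vars = "code",
            measure.vars = c("chem_1", "chem_2", "chem_3"),
            value.name = "chem", na.rm = TRUE)[order(code), .(chem, code)]
print(res)
```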
Answer 3 (score: 0)
Consider base R's reshape:
data <- data.frame(code = 1:3,
chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
chem_2 = c("PCB047", "chlorpyriphos", NA),
chem_3 = c("PCB047", NA, NA))
rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
times = names(data)[-1], timevar = "type", idvar = "code",
new.row.names = 1:1000, direction = "long")
rdf
# code type chem
# 1 1 chem_1 PCB001
# 2 2 chem_1 chlorpyrifos
# 3 3 chem_1 TOC
# 4 1 chem_2 PCB047
# 5 2 chem_2 chlorpyriphos
# 6 3 chem_2 <NA>
# 7 1 chem_3 PCB047
# 8 2 chem_3 <NA>
# 9 3 chem_3 <NA>
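The `reshape()` output still contains the NA rows and the `type` column, which the question does not want. A base-R follow-up step, sketched under the assumption that the data is as above; it leaves out `new.row.names` and resets the row names at the end instead:

```r
data <- data.frame(code = 1:3,
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA),
                   stringsAsFactors = FALSE)
chem_cols <- names(data)[-1]
rdf <- reshape(data, varying = chem_cols, v.names = "chem",
               times = chem_cols, timevar = "type", idvar = "code",
               direction = "long")

# Keep only non-missing chemicals and the two requested columns
out <- rdf[!is.na(rdf$chem), c("chem", "code")]
# order() is stable, so ties keep their chem_1, chem_2, chem_3 order
out <- out[order(out$code), ]
rownames(out) <- NULL
print(out)
```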
Answer 4 (score: 0)
You can do this in base R with data.frame, taking advantage of R's recycling rules:
df1 <- subset(data.frame(df[1],chem = unlist(df[-1])),chem!="")
df1[order(df1$code),] # if you need it sorted
# code chem
# chem_11 1 PCB001
# chem_21 1 PCB047
# chem_31 1 PCB047
# chem_12 2 chlorpyrifos
# chem_22 2 chlorpyriphos
# chem_13 3 TOC
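The recycling is what makes this one-liner work: `df[1]` has 3 rows but `unlist(df[-1])` has 9 values, so `data.frame()` repeats the `code` column three times. A small check that this matches an explicit `rep()`; `recycled` and `explicit` are illustrative names, and the data assumes empty strings (not NA) for the blanks.

```r
df <- data.frame(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", ""),
                 chem_3 = c("PCB047", "", ""),
                 stringsAsFactors = FALSE)

chem <- unlist(df[-1])  # 9 values, stacked column by column (chem_1, chem_2, chem_3)
# df[1] has 3 rows; data.frame() recycles it to 9 rows because 9 is a multiple of 3
recycled <- data.frame(df[1], chem = chem)
# The same code column built explicitly
explicit <- data.frame(code = rep(df$code, times = ncol(df) - 1), chem = chem)
print(identical(recycled$code, explicit$code))
```

`data.frame()` only recycles when the longer length is an exact multiple of the shorter one; otherwise it stops with "arguments imply differing number of rows". That always holds here because every chem column has one entry per row.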