Given a data frame in which one column contains ";"-separated values:
df <- data.frame(id = c(1,2,3), stock = c("Google;Yahoo","Microsoft;Google","Yahoo"))
how can I produce the following data frame?
df <- data.frame(id = c(1,2,3), stock_1 = c("Google","Microsoft","Yahoo"), stock_2 = c("Yahoo","Google","NA"))
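Answer 0 (score: 4)
1) separate_rows. Use separate_rows to convert the data to long form, add a name column holding the final column names, and then use spread to convert it back to wide form.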
library(dplyr)
library(tidyr)
df %>%
  separate_rows(stock, sep = ";") %>%   # one row per stock
  group_by(id) %>%
  mutate(name = paste("stock", seq_along(stock), sep = "_")) %>%   # stock_1, stock_2, ...
  ungroup() %>%
  spread(name, stock)                   # back to wide form
giving:
# A tibble: 3 x 3
     id stock_1   stock_2
* <dbl> <chr>     <chr>
1     1 Google    Yahoo
2     2 Microsoft Google
3     3 Yahoo     <NA>
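In current tidyr releases spread is superseded by pivot_wider, so the long-to-wide step can also be written as in the sketch below. This is a variant under the assumption that tidyr 1.0.0 or later is installed, not part of the original answer; the name wide is chosen here only for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2, 3),
                 stock = c("Google;Yahoo", "Microsoft;Google", "Yahoo"))

wide <- df %>%
  separate_rows(stock, sep = ";") %>%                  # long form: one row per stock
  group_by(id) %>%
  mutate(name = paste("stock", row_number(), sep = "_")) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = stock)  # ids with fewer stocks get NA

wide
```

Like spread, pivot_wider fills the missing stock_2 for id 3 with NA.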
2) separate. If we know there are never more than 2 subfields, we can use separate, which gives the same result.
library(dplyr)
library(tidyr)
df %>%
  separate(stock, c("stock_1", "stock_2"), sep = ";", fill = "right")   # pad missing fields with NA on the right
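For readers on a recent tidyr, separate is itself superseded; a roughly equivalent call uses separate_wider_delim. The sketch below assumes tidyr 1.3.0 or later and is offered as an alternative, not as what the answer used; the name res is chosen here only for illustration.

```r
library(tidyr)

df <- data.frame(id = c(1, 2, 3),
                 stock = c("Google;Yahoo", "Microsoft;Google", "Yahoo"))

res <- separate_wider_delim(
  df,
  cols = stock,
  delim = ";",
  names = c("stock_1", "stock_2"),
  too_few = "align_start"   # rows with fewer pieces are padded with NA on the right
)

res
```

As with separate, the number of output columns has to be known in advance.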
3) read.table. This approach does not use any packages.
stocks <- read.table(text = as.character(df$stock), sep = ";", as.is = TRUE, fill = TRUE)
names(stocks) <- paste("stock", seq_along(stocks), sep = "_")
cbind(df[1], stocks)
giving:
  id   stock_1 stock_2
1  1    Google   Yahoo
2  2 Microsoft  Google
3  3     Yahoo
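Note that in the read.table output the missing stock_2 for id 3 is an empty string rather than NA. If a true NA is wanted while still avoiding packages, one possible base R sketch uses strsplit and pads each piece vector to a common length. The names parts, n, stocks and result are introduced here only for illustration.

```r
df <- data.frame(id = c(1, 2, 3),
                 stock = c("Google;Yahoo", "Microsoft;Google", "Yahoo"))

# Split each string on the literal ";"
parts <- strsplit(as.character(df$stock), ";", fixed = TRUE)

# Indexing past the end of a vector yields NA, which pads short rows
n <- max(lengths(parts))
stocks <- as.data.frame(do.call(rbind, lapply(parts, `[`, seq_len(n))),
                        stringsAsFactors = FALSE)
names(stocks) <- paste("stock", seq_len(n), sep = "_")

result <- cbind(df["id"], stocks)
result
```

Unlike the separate approach, this does not require knowing the maximum number of subfields beforehand.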
Answer 1 (score: 2)
Collecting the answers from the comments, plus one more option for completeness:
library(splitstackshape)
cSplit(df, "stock", ";")
#    id   stock_1 stock_2
# 1:  1    Google   Yahoo
# 2:  2 Microsoft  Google
# 3:  3     Yahoo      NA
library(data.table)
setDT(df)[, c("stock_1", "stock_2") := tstrsplit(stock, ";")][, stock := NULL][]
#    id   stock_1 stock_2
# 1:  1    Google   Yahoo
# 2:  2 Microsoft  Google
# 3:  3     Yahoo      NA