I have a data frame containing municipality names and state names. It looks like this:
my.df <- structure(list(Location = c("Abatiá", "Adrianópolis", "Agudos do Sul",
"Almirante Tamandaré", "Altamira do Paraná", "Altônia"), State = c("PR",
"PR", "PR", "PR", "PR", "PR")), .Names = c("Location", "State"
), row.names = 0:5, class = "data.frame")
What I need to do is convert this data frame into an array. The expected output would be:
my.array$PR
Abatiá, PR
Adrianópolis, PR
Agudos do Sul, PR
...
my.array$RS
Vitória das Missões, RS
Westfalia, RS
Xangri-lá, RS
...
and so on.
How can I get there?
My actual data set has about 10k rows, so a fast solution is probably preferable to a clear one. Thanks!
Answer 0: (score: 4)
The following should give you what you need.
df = data.frame("location" = c("a", "b", "c", "d", "e", "f"), "state" = c("pr", "pr", "pr", "rs", "rs", "rs"), stringsAsFactors=FALSE)
my.array = lapply(unique(df$state), function(x) paste(df$location[df$state == x], df$state[df$state == x], sep=", "))
names(my.array) = unique(df$state)
my.array$pr
# [1] "a, pr" "b, pr" "c, pr"
I simplified the values in df, but the idea stays the same.
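Answer 1: (score: 4)
Because you need a list-like result (i.e. one you can index with $), use split on State. It naturally produces a list named by State.
One approach is to split first: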
split_df <- split(my.df, my.df$State)
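# simplify = FALSE keeps the result a named list, even when groups have equal sizes
my.array <- sapply(names(split_df), function(name)
                   paste(split_df[[name]][["Location"]],
                         ", ", name, sep=""),
                   simplify = FALSE, USE.NAMES = TRUE)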
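One thing worth checking with sapply: by default it simplifies results of equal length into a matrix, and a matrix cannot be indexed with $. The sketch below uses a made-up groups list (not the real data) to show the difference between the default and simplify = FALSE.

```r
# Hypothetical groups of equal size, standing in for the split data frame
groups <- list(PR = c("a", "b"), RS = c("c", "d"))

# Default sapply: equal-length results are simplified into a character matrix
as_matrix <- sapply(names(groups), function(s) paste(groups[[s]], s, sep = ", "))

# simplify = FALSE: the result stays a named list, so $ indexing works
as_list <- sapply(names(groups), function(s) paste(groups[[s]], s, sep = ", "),
                  simplify = FALSE)

is.matrix(as_matrix)
is.list(as_list)
as_list$RS
```

With real data the groups rarely all have the same size, but a single-state data frame like the six-row example in the question is enough to trigger the matrix case.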
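A second way to use split (which, after thinking about your problem, seems more elegant) is to build the location, state pairs first and then split them: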
# First, create a new vector (array) of location, state pairs
# use apply(X, 1, FUN) which works row-wise along X
# and for each row, paste it together
location_state <- apply(my.df,
1,
function(r) paste(r["Location"],
r["State"],
sep=', '))
#Second, split that vector, using State
split(location_state, my.df$State)
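Since speed matters here, note that paste is already vectorized, so the row-wise apply step can likely be skipped. A minimal sketch of that variant, using a small stand-in data frame (demo.df is a name introduced only for this illustration):

```r
# Small stand-in for my.df; the names come from the question's examples
demo.df <- data.frame(
  Location = c("Agudos do Sul", "Abatiá", "Westfalia"),
  State    = c("PR", "PR", "RS"),
  stringsAsFactors = FALSE
)

# paste() works on whole columns, building every "Location, State" pair in one call
location_state <- paste(demo.df$Location, demo.df$State, sep = ", ")

# split() groups the pairs into a list named by state
my.array <- split(location_state, demo.df$State)

my.array$RS
```

The result has the same shape as the apply version, except that the elements do not carry row names.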
Sample data
states <- sapply(1:100, function(pass) paste0(sample(LETTERS, 2), collapse=""))
my.df <- data.frame(State=sample(states, 10000, replace=TRUE),
Location=sapply(1:1e4, function(pass) paste0(sample(letters, 5),
collapse="")),
stringsAsFactors=FALSE)
Answer 2: (score: 2)
How about using Reduce?
Reduce(function(...) paste(..., sep=", "), my.df)
Edit: updated the benchmark with @thelatemail's suggestion
#for your benchmarking using 1 million rows
library(rbenchmark)
df <- data.frame(X=rnorm(1e6), Y=rnorm(1e6))
benchmark(M1=Reduce(function(...) paste(..., sep=", "), df),
M2=do.call(paste, c(df, sep=", ")))
##   test replications elapsed relative user.self sys.self user.child sys.child
## 1   M1           10   68.60    1.219     68.55     0.00         NA        NA
## 2   M2           10   56.28    1.000     56.22     0.07         NA        NA
do.call(paste, c(df, sep=", ")) is definitely faster (about 1.2x in this benchmark)!
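Both one-liners return a flat character vector, so getting the per-state list still needs a split step. A minimal sketch of how that might look, assuming a small made-up data frame (demo.df is hypothetical). The columns are selected by name because do.call(paste, ...) pastes them in whatever order they appear in the data frame.

```r
# Hypothetical stand-in with State as the first column, as in the sample data above
demo.df <- data.frame(
  State    = c("PR", "PR", "RS"),
  Location = c("Agudos do Sul", "Abatiá", "Westfalia"),
  stringsAsFactors = FALSE
)

# Pick the columns explicitly so the pairs read "Location, State"
pairs <- do.call(paste, c(demo.df[c("Location", "State")], sep = ", "))

# Group the pasted strings into a list named by state
my.array <- split(pairs, demo.df$State)

my.array$PR
```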