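How do I add variables to a data frame in a for loop?
I want to create a data frame in which each column is the revenue of one region between 2009 and 2011.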
regions = c('A','APAC','CEE','LATAM','ME', 'NA', 'WE')
# Loop through all regions, and add them as a column in my dataframe.
for (region in regions) {
  # create the query string
  query_string = sprintf("SELECT date, revenue
                          FROM country_revenue
                          WHERE region = '%s'
                          AND date>='2009-01-01'
                          AND date<='2011-12-31'
                          ORDER BY date ASC
                          LIMIT 2000", region)
  # Query the database, and assign the result to a variable.
  assign(sprintf('rev.%s',region), mysql_query(query_string))
  # I only want the 2nd column returned from my query above.
  # THIS IS THE PART THAT FAILS. Error in sprintf("rev.%s", region)[, 2] : incorrect number of dimensions
  sprintf('rev.%s',region) = sprintf('rev.%s',region)[,2]
  # Add this variable to my data frame.
  revenue = cbind(revenue, sprintf('rev.%s',region))
}
Answer 0 (score: 6)
That would be very inefficient. Why not return region as part of the SQL call, so that you have something like:
foo <- data.frame(date = rep(Sys.Date() + 0:4, 7),
                  revenue = runif(7*5),
                  region = rep(c('A','APAC','CEE','LATAM','ME', 'NA', 'WE'),
                               each = 5))
> head(foo)
        date   revenue region
1 2012-08-04 0.1170867      A
2 2012-08-05 0.6173779      A
3 2012-08-06 0.9860934      A
4 2012-08-07 0.1344043      A
5 2012-08-08 0.5570391      A
6 2012-08-04 0.5844136   APAC
Then it is a simple dcast() call to reshape the data into the format you want.
> require(reshape2)
> dcast(foo, date ~ region, value.var = "revenue")
        date         A      APAC       CEE      LATAM         ME
1 2012-08-04 0.1170867 0.5844136 0.8011066 0.82864796 0.85856770
2 2012-08-05 0.6173779 0.7893151 0.3991653 0.41268349 0.05925445
3 2012-08-06 0.9860934 0.2812308 0.2272009 0.04599903 0.82367709
4 2012-08-07 0.1344043 0.7513777 0.8022602 0.96933913 0.61501816
5 2012-08-08 0.5570391 0.2915478 0.4601065 0.82996462 0.83779233
         NA         WE
1 0.4833374 0.25713295
2 0.9574843 0.22122544
3 0.5575645 0.03492411
4 0.2962364 0.51973593
5 0.9020639 0.95506837
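
For completeness, here is a rough sketch of how the single-query version might look end to end. It is only an illustration under some assumptions: the table and column names are taken from the question, the `IN (...)` clause and the stand-in data frame `long` are my own additions, and `mysql_query()` is the asker's helper, so it appears only in a comment. The sketch uses base R `reshape()` in place of `dcast()` so that it runs without extra packages. I have also dropped the `LIMIT 2000`, since in a combined query it would cap the rows across all regions rather than per region.

```r
regions <- c("A", "APAC", "CEE", "LATAM", "ME", "NA", "WE")

# Build a quoted, comma-separated list for the SQL IN clause (assumed query shape).
region_list <- paste(sprintf("'%s'", regions), collapse = ", ")

query_string <- sprintf(
  "SELECT date, revenue, region
   FROM country_revenue
   WHERE region IN (%s)
     AND date >= '2009-01-01'
     AND date <= '2011-12-31'
   ORDER BY date ASC",
  region_list
)

# With the asker's helper this would be: long <- mysql_query(query_string)
# Stand-in data with the same three columns, so the sketch runs without a database.
long <- data.frame(
  date    = rep(as.Date("2009-01-01") + 0:4, times = length(regions)),
  revenue = seq_len(5 * length(regions)),
  region  = rep(regions, each = 5),
  stringsAsFactors = FALSE
)

# Long to wide: one row per date, one revenue column per region.
wide <- reshape(long, idvar = "date", timevar = "region", direction = "wide")
```

The resulting columns are named `revenue.A`, `revenue.APAC`, and so on, which is close to the `rev.<region>` naming the question was aiming for. This avoids `assign()` and building variable names from strings altogether; note that `sprintf('rev.%s', region)` only ever returns a character string, which is why indexing it with `[, 2]` fails.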