Data frame arithmetic

Time: 2012-09-10 19:03:47

Tags: r

  

Possible duplicate:
  create new data frame from a function of other data frames

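I got some help with my first question on SOF, but I did not know how to reply to the people who responded. So I am posting again, this time with sample code (which I should have done the first time; I'm learning).

I have two data frames. For the sake of explanation, let's pretend: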

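DF1
Columns represent grain types: corn, oats, wheat, etc.
Rows represent months of the year: Jan, Feb, etc.
Elements are the price per ton of that grain when bought in that month.

DF2
Columns represent countries: Spain, Chile, Mexico, etc.
Rows represent additional costs, for example each country's packing cost, shipping cost, import tax, inspection fees, etc.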

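Now I want to build a third data frame:

DF3
It holds the total cost, for each country, of a blend of grains (e.g. 10% corn, 50% oats, ...) together with the associated shipping, tax, and other costs. Assume there is an equation (using data from df1 and df2) that computes the total monthly cost of a given grain blend for each country, plus that country's additional costs.

In other words, df3 has 12 rows (months) and as many columns as there are countries. Its elements are the total grain cost plus additional costs for each country in each month.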

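This takes two minutes in Excel/Gnumeric and 15 minutes in Fortran or C, but I have spent two days struggling with the R Cookbook and internet searches. And I have no one down the hall to shout to, "Hey, Kevin, how do you do ... in R?"

It is so simple, but as a newbie I am missing some basic point...

Thanks in advance. Here is my pretend code illustrating the problem.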

# build df1 - cost of grains (with goofy data so I can track the arithmetic)
  v1 <- c(1:12)
  v2 <- c(13:24)
  v3 <- c(25:36)
  v4 <- c(37:48)
  grain <- data.frame("wheat"=v1,"oats"=v2,"corn"=v3,"rye"=v4)

  grain


# build df2 - additional costs (again, with goofy data to see what is being used where and when)
  w1 <- c(1.3:4.3)
  w2 <- c(5.3:8.3)
  w3 <- c(9.3:12.3)
  w4 <- c(13.3:16.3)
  cost <- data.frame("Spain"=w1,"Peru"=w2,"Mexico"=w3,"Kenya"=w4)
  row.names(cost) <- c("packing","shipping","tax","inspection")

  cost


# assume 10% wheat, 30% oats and 60% rye with some clown-equation for total cost

# now for my feeble attempt at getting a data frame that has 12 rows (months) and 4 columns (countries)

  total_cost <- data.frame( 0.1*grain[,"wheat"] +
                            0.3*grain[,"oats"] +
                            0.6*grain[,"rye"] +
                            cost["packing","Mexico"] +
                            cost["shipping","Mexico"] +
                            cost["tax","Mexico"]  +
                            cost["inspection","Mexico"] )
  total_cost

# this gives the correct values for the total cost for Mexico, for each month.

# and if I plug in the other countries, I get correct answers for each of them.
# I guess I could run a loop over the countries, but this is R, not Fortran or C.

# btw, my real equation is considerably more complicated, using functions involving
# multiple columns of df1 and df2 data, so there is no "every column of df1 gets
# multiplied by ..." pattern, nor any one-to-one column-row match.
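
One loop-free way to get the 12 x 4 result, offered as a sketch rather than a definitive answer: wrap the per-country formula in a function and `sapply` it over the country names. The helper name `total_for` is hypothetical (it is not in the post), the weights are the 10% / 30% / 60% blend from the example, and the month row names are an assumption that row 1 is January. Because the function body can contain any expression built from `grain` and `cost`, the same pattern should carry over to a more complicated equation.

```r
# Rebuild the two example data frames from the post.
grain <- data.frame(wheat = 1:12, oats = 13:24, corn = 25:36, rye = 37:48)
cost <- data.frame(Spain = 1.3:4.3, Peru = 5.3:8.3,
                   Mexico = 9.3:12.3, Kenya = 13.3:16.3,
                   row.names = c("packing", "shipping", "tax", "inspection"))

# Hypothetical helper: 12 monthly totals for one country, selected by name.
total_for <- function(country) {
  0.1 * grain$wheat + 0.3 * grain$oats + 0.6 * grain$rye +
    cost["packing", country] + cost["shipping", country] +
    cost["tax", country] + cost["inspection", country]
}

# sapply() returns a 12 x 4 matrix whose columns are named after the countries.
total_cost <- as.data.frame(sapply(names(cost), total_for))
rownames(total_cost) <- month.abb  # assumption: row 1 is January
total_cost

# For this purely additive toy equation only, the same numbers come from
# a weighted row sum combined with the column sums of cost via outer().
blend <- drop(as.matrix(grain[, c("wheat", "oats", "rye")]) %*% c(0.1, 0.3, 0.6))
alt <- outer(blend, colSums(cost), "+")
all.equal(as.matrix(total_cost), alt, check.attributes = FALSE)
```

The `sapply` version is the more general of the two, since `outer` only helps when the equation splits into a per-month part and a per-country part.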

0 answers:

No answers