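Analyzing trade data in R: determining market share and trade trends

Time: 2014-12-21 04:50:38

Tags: r statistics aggregate subset

Introduction

I have some basic transaction data in a very large dataset. I want to use R to compare the market shares of the countries that export a single commodity (say, spoons) to one country (say, the United States). A sample dataset, in a data frame called 'trade', looks like this: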

shipment <- c(1, 2, 3, 4, 5, 6, 7, 8) #transaction ID number for a shipment of spoons
date <- as.Date(c("2006-08-06", "2006-07-30", "2006-04-16", "2006-02-05", "2007-01-10", "2007-09-22", "2007-10-15", "2007-03-30")) #date of shipment
value <- as.integer(c(1208, 23820, 402, 89943, 643, 45322, 25435, 1455)) #value of the shipment, in USD
country <- c("France", "Spain", "France", "Belgium", "France", "Belgium", "Spain", "Belgium") #the country where the export originated from
trade <- data.frame(shipment, date, value, country)

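I want to aggregate the transaction-level data and analyze it at the country level to see how the spoon industry has evolved over time, that is, which countries are players in any given year.

This is the code I came up with (with help from Matthew Lundberg), but it seems long and clunky, so I am wondering whether there is a simpler way.

R code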

agyr <- aggregate(value ~ format(date, "%Y") + country, data=trade, FUN=sum) #to get value of exports by country and year
colnames(agyr)[1] <- "year" #rename the first column to 'year'

#reshapes from long to wide
agyrw <- reshape(agyr, 
  timevar = "year",
  idvar = c("country"),
  direction = "wide")

#sums total trade value, by year
sum2006 <- sum(agyrw$value.2006) 
sum2007 <- sum(agyrw$value.2007)

#creates new variables of market share, by year
agyrw$share.2006 <- agyrw$value.2006 / sum2006
agyrw$share.2007 <- agyrw$value.2007 / sum2007

#formats the market share variable to only 4 decimals places
agyrw$share.2006 <- format(round(agyrw$share.2006, 4), nsmall = 4)
agyrw$share.2007 <- format(round(agyrw$share.2007, 4), nsmall = 4)

#reconverts the market share variable back into numeric so that it can be ordered
agyrw$share.2006 <- as.numeric(agyrw$share.2006)
agyrw$share.2007 <- as.numeric(agyrw$share.2007)

# sorts the data frame by 2007 and 2006 market share
agyrw <- agyrw[order(-agyrw$share.2007, agyrw$share.2006), ]

# displays the data frame 
agyrw
  country value.2006 value.2007 share.2006 share.2007
1 Belgium      89943      46777     0.7796     0.6421
5   Spain      23820      25435     0.2065     0.3491
3  France       1610        643     0.0140     0.0088

1 answer:

Answer 0: (score: 1)

To get exports by country and year, aggregate is very handy:

aggregate(value ~ format(date, "%Y") + country, data=trade, FUN=sum)
##   format(date, "%Y") country value
## 1               2013 Belgium 89943
## 2               2006  France  1208
## 3               2009  France   402
## 4               2008   Spain 23820
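
For comparison, here is the same call run on the sample `trade` data frame from the question, rebuilt inline so the snippet is self-contained. This is only a sketch under that assumption; because the sample covers three countries in both 2006 and 2007, it should return six rows, one per country and year.

```r
# Sample data copied from the question
trade <- data.frame(
  shipment = 1:8,
  date = as.Date(c("2006-08-06", "2006-07-30", "2006-04-16", "2006-02-05",
                   "2007-01-10", "2007-09-22", "2007-10-15", "2007-03-30")),
  value = as.integer(c(1208, 23820, 402, 89943, 643, 45322, 25435, 1455)),
  country = c("France", "Spain", "France", "Belgium",
              "France", "Belgium", "Spain", "Belgium")
)

# Sum shipment values within each (year, country) pair
ag <- aggregate(value ~ format(date, "%Y") + country, data = trade, FUN = sum)

# One row per country and year
ag
```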

You can then take the aggregated result and compute each country's share per year. It helps to rename its first column first:

ag <- aggregate(value ~ format(date, "%Y") + country, data=trade, FUN=sum) 
names(ag)[1] <- "year"  # the new name must be a quoted string

# group by year only, so each value is divided by that year's total
ag$share <- ave(ag$value, ag$year, FUN=function(x) x/sum(x))
ag
##   year country value share
## 1 2013 Belgium 89943     1
## 2 2006  France  1208     1
## 3 2009  France   402     1
## 4 2008   Spain 23820     1

Note that each year appears only once in the example output above, so every country gets a 100% share.
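
When several countries export in the same year, the shares become informative. The following is a minimal sketch, assuming the two-year sample `trade` data from the question (rebuilt inline) and grouping by year only, so that the shares within a year sum to 1. The wide table built with `xtabs()` and `prop.table()` is one possible substitute for the `reshape()` step in the question, not necessarily the only or best one.

```r
# Sample data copied from the question
trade <- data.frame(
  shipment = 1:8,
  date = as.Date(c("2006-08-06", "2006-07-30", "2006-04-16", "2006-02-05",
                   "2007-01-10", "2007-09-22", "2007-10-15", "2007-03-30")),
  value = as.integer(c(1208, 23820, 402, 89943, 643, 45322, 25435, 1455)),
  country = c("France", "Spain", "France", "Belgium",
              "France", "Belgium", "Spain", "Belgium")
)

# Total export value per year and country (long format)
ag <- aggregate(value ~ format(date, "%Y") + country, data = trade, FUN = sum)
names(ag)[1] <- "year"

# Share of each country within its year
ag$share <- ave(ag$value, ag$year, FUN = function(x) x / sum(x))

# Wide layout: countries in rows, years in columns
totals <- xtabs(value ~ country + year, data = ag)

# Divide each column (year) by its total and round to 4 decimals
shares <- round(prop.table(totals, margin = 2), 4)

# Sort countries by 2007 share, largest first
shares[order(-shares[, "2007"]), ]
```

On the sample data this should reproduce the share columns of the table in the question (for example 0.7796 and 0.6421 for Belgium), without the per-year `sum2006` / `sum2007` variables.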