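I have some basic transaction data in a very large dataset. I want to use R to compare the market shares of the countries exporting a single commodity (say, 'spoons') to one country (say, the United States). A sample dataset, in a data frame called 'trade', looks like this: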
shipment <- c(1, 2, 3, 4, 5, 6, 7, 8) #transaction ID number for a shipment of spoons
date <- as.Date(c("2006-08-06", "2006-07-30", "2006-04-16", "2006-02-05", "2007-01-10", "2007-09-22", "2007-10-15", "2007-03-30")) #date of shipment
value <- as.integer(c(1208, 23820, 402, 89943, 643, 45322, 25435, 1455)) #value of the shipment, in USD
country <- c("France", "Spain", "France", "Belgium", "France", "Belgium", "Spain", "Belgium") #the country where the export originated from
trade <- data.frame(shipment, date, value, country)
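I want to aggregate the transaction-level data and analyze it at the country level to see how the spoon industry has evolved over time, that is, which countries were players in any given year.
Here is the code I came up with (with help from Matthew Lundberg), but it seems long and clumsy, so I am wondering whether there is a simpler way.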
agyr <- aggregate(value ~ format(date, "%Y") + country, data=trade, FUN=sum) #to get value of exports by country and year
colnames(agyr)[1]="year" #rename the 'year' variable
#reshapes from long to wide
agyrw <- reshape(agyr,
                 timevar = "year",
                 idvar = "country",
                 direction = "wide")
#sums total trade value, by year
sum2006 <- sum(agyrw$value.2006)
sum2007 <- sum(agyrw$value.2007)
#creates new variables of market share, by year
agyrw$share.2006 <- agyrw$value.2006 / sum2006
agyrw$share.2007 <- agyrw$value.2007 / sum2007
#formats the market share variable to only 4 decimals places
agyrw$share.2006 <- format(round(agyrw$share.2006, 4), nsmall = 4)
agyrw$share.2007 <- format(round(agyrw$share.2007, 4), nsmall = 4)
#reconverts the market share variable back into numeric so that it can be ordered
agyrw$share.2006 <- as.numeric(agyrw$share.2006)
agyrw$share.2007 <- as.numeric(agyrw$share.2007)
# sorts the data frame by 2007 and 2006 market share
agyrw <- agyrw[order(-agyrw$share.2007, agyrw$share.2006), ]
# displays the data frame
agyrw
  country value.2006 value.2007 share.2006 share.2007
1 Belgium      89943      46777     0.7796     0.6421
5   Spain      23820      25435     0.2065     0.3491
3  France       1610        643     0.0140     0.0088
Answer 0 (score: 1)
To get exports by country and year, aggregate is very handy:
aggregate(value ~ format(date, "%Y") + country, data=trade, FUN=sum)
## format(date, "%Y") country value
## 1 2013 Belgium 89943
## 2 2006 France 1208
## 3 2009 France 402
## 4 2008 Spain 23820
You can then take this result and compute each country's share per year. It helps to rename the first column above:
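ag <- aggregate(value ~ format(date, "%Y") + country, data=trade, FUN=sum)
names(ag)[1] <- "year" # the new column name must be a quoted string
# group by year only, so each value is divided by that year's total
ag$share <- ave(ag$value, ag$year, FUN=function(x) x/sum(x))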
ag
## year country value share
## 1 2013 Belgium 89943 1
## 2 2006 France 1208 1
## 3 2009 France 402 1
## 4 2008 Spain 23820 1
Note that the years in your example are unique, so each country gets 100%.
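For data with several countries per year, such as the eight-shipment sample in the question, the same idea can be sketched end to end as below. This is only a minimal sketch: adding a `year` column to `trade` before aggregating, the rounding to 4 decimals, and the `shares` table built with `xtabs` are my own choices for illustration, not part of the code above.

```r
# Sample data copied from the question
trade <- data.frame(
  shipment = 1:8,
  date = as.Date(c("2006-08-06", "2006-07-30", "2006-04-16", "2006-02-05",
                   "2007-01-10", "2007-09-22", "2007-10-15", "2007-03-30")),
  value = as.integer(c(1208, 23820, 402, 89943, 643, 45322, 25435, 1455)),
  country = c("France", "Spain", "France", "Belgium",
              "France", "Belgium", "Spain", "Belgium")
)

# Extract the year up front so the aggregated column needs no renaming
trade$year <- format(trade$date, "%Y")

# Total export value per country and year (long format)
ag <- aggregate(value ~ year + country, data = trade, FUN = sum)

# Market share: each value divided by the total for its year
ag$share <- round(ave(ag$value, ag$year, FUN = function(x) x / sum(x)), 4)

# Wide layout (an illustrative choice): one row per country, one column per year
shares <- xtabs(share ~ country + year, data = ag)
shares
```

On this sample data the shares should match the table in the question, for example 0.7796 for Belgium in 2006 and 0.6421 in 2007. Keeping `ag` in long format means no per-year variables such as `sum2006` are needed, so the code does not change when more years are added.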