I'm desperately trying to fill a matrix with values from a data frame. It is trade data, so the data frame looks like this:
country1 country2 value
1 Afghanistan Albania 30
2 Afghanistan Albania 81
3 Afghanistan China 5
4 Albania Germany 6
5 China Germany 8
6 China Turkey 900
7 Germany Turkey 12
8 Germany USA 3
9 Germany Zambia 700
Using the unique and sort commands, I created a list of all countries that appear in df (and converted it to a matrix):
countries_sorted
[1,] "Afghanistan"
[2,] "Albania"
[3,] "China"
[4,] "Germany"
[5,] "Turkey"
[6,] "USA"
[7,] "Zambia"
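For reference, that step could look roughly like the sketch below. It assumes the data frame is called df and recreates it inline so the snippet runs on its own; countries_matrix is just an illustrative name.

```r
# Recreate the example data frame (assumed to be called df)
df <- data.frame(
  country1 = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania",
               "China", "China", "Germany", "Germany", "Germany"),
  country2 = c("Albania", "Albania", "China", "Germany",
               "Germany", "Turkey", "Turkey", "USA", "Zambia"),
  value = c(30, 81, 5, 6, 8, 900, 12, 3, 700),
  stringsAsFactors = FALSE
)

# All countries that appear in either column, de-duplicated and sorted
countries_sorted <- sort(unique(c(df$country1, df$country2)))

# Optional: a one-column matrix, matching the printout above
countries_matrix <- matrix(countries_sorted, ncol = 1)
```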
Using this "list", I created an empty trade matrix (7x7):
Afghanistan Albania China Germany Turkey USA Zambia
Afghanistan NA NA NA NA NA NA NA
Albania NA NA NA NA NA NA NA
China NA NA NA NA NA NA NA
Germany NA NA NA NA NA NA NA
Turkey NA NA NA NA NA NA NA
USA NA NA NA NA NA NA NA
Zambia NA NA NA NA NA NA NA
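An empty matrix like this can be built along the following lines. This is only a sketch: the country names are hard-coded so it is self-contained, and trade is an illustrative name, not one from my actual script.

```r
# Country names as in the list above (hard-coded so the snippet is self-contained)
countries_sorted <- c("Afghanistan", "Albania", "China", "Germany",
                      "Turkey", "USA", "Zambia")

# Square matrix of NAs with the countries as both row and column names
n <- length(countries_sorted)
trade <- matrix(NA_real_, nrow = n, ncol = n,
                dimnames = list(countries_sorted, countries_sorted))
```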
What I can't manage now is filling this matrix with the numbers/sums from the value column of df. I tried something like this:
a<-cast(df, country1~country2 , sum)
This works up to a point, but the result does not keep the original 7x7 format, which is what I need: a square matrix whose diagonal is all 0.
> a
country1 Albania China Germany Turkey USA Zambia
1 Afghanistan 111 5 0 0 0 0
2 Albania 0 0 6 0 0 0
3 China 0 0 8 900 0 0
4 Germany 0 0 0 12 3 700
Does anyone have a solution, please?
Answer 0 (score: 4)
Starting from these two objects:
#your data.frame
df <- read.table(header=T, file='clipboard', stringsAsFactors = F)
#the list of unique countries
countries <- unique(c(df$country1,df$country2))
You can do this:
#create all the country combinations
newdf <- expand.grid(countries, countries)
#change names
colnames(newdf) <- c('country1', 'country2')
#add a value of 0 for the new combinations (won't affect outcome)
newdf$value <- 0
#row bind with original dataset
df2 <- rbind(df, newdf)
#and create the table using xtabs:
#the aggregate function will create the sum of the value for each combination
> xtabs(value ~ country1 + country2, aggregate(value~country1+country2,df2,sum))
country2
country1 Afghanistan Albania China Germany Turkey USA Zambia
Afghanistan 0 111 5 0 0 0 0
Albania 0 0 0 6 0 0 0
China 0 0 0 8 900 0 0
Germany 0 0 0 0 12 3 700
Turkey 0 0 0 0 0 0 0
USA 0 0 0 0 0 0 0
Zambia 0 0 0 0 0 0 0
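A possible variation on the same idea, sketched here under the assumption that base R's xtabs keeps unused factor levels (its drop.unused.levels argument defaults to FALSE): turn both country columns into factors that share the full set of levels, so no zero-valued padding rows are needed. The names tab and m are illustrative.

```r
# Recreate the example data
df <- data.frame(
  country1 = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania",
               "China", "China", "Germany", "Germany", "Germany"),
  country2 = c("Albania", "Albania", "China", "Germany",
               "Germany", "Turkey", "Turkey", "USA", "Zambia"),
  value = c(30, 81, 5, 6, 8, 900, 12, 3, 700),
  stringsAsFactors = FALSE
)

# Full set of countries, used as the levels of both factors
countries <- sort(unique(c(df$country1, df$country2)))
df$country1 <- factor(df$country1, levels = countries)
df$country2 <- factor(df$country2, levels = countries)

# xtabs sums value per cell and keeps the unused levels as zero rows/columns
tab <- xtabs(value ~ country1 + country2, data = df)

# Plain numeric matrix with the same dimnames, if a table object is not wanted
m <- matrix(as.vector(tab), nrow = nrow(tab), dimnames = dimnames(tab))
```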
Answer 1 (score: 2)
An alternative to the nice solution by @LyzandeR, using the dplyr and tidyr packages.
dt = read.table(text=
"country1 country2 value
Afghanistan Albania 30
Afghanistan Albania 81
Afghanistan China 5
Albania Germany 6
China Germany 8
China Turkey 900
Germany Turkey 12
Germany USA 3
Germany Zambia 700", header=T, stringsAsFactors=F)
library(dplyr)
library(tidyr)
dt2 =
dt %>%
group_by(country1,country2) %>% # for every combination of countries
summarise(SumValue = sum(value)) # get the sum of value
# get all possible countries that appear in your dataset
list_countries = union(dt2$country1, dt2$country2)
expand.grid(country1=list_countries, country2=list_countries, stringsAsFactors = F) %>% # create all possible combinations of countries
left_join(dt2, by=c("country1","country2")) %>% # join back info whenever it is found
mutate(SumValue = ifelse(is.na(SumValue),0,SumValue)) %>% # replace NAs with 0s
spread(country2,SumValue) # reshape data
# country1 Afghanistan Albania China Germany Turkey USA Zambia
# 1 Afghanistan 0 111 5 0 0 0 0
# 2 Albania 0 0 0 6 0 0 0
# 3 China 0 0 0 8 900 0 0
# 4 Germany 0 0 0 0 12 3 700
# 5 Turkey 0 0 0 0 0 0 0
# 6 USA 0 0 0 0 0 0 0
# 7 Zambia 0 0 0 0 0 0 0
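The result above is a data frame rather than a matrix. If a true matrix is needed, a conversion along these lines should work. This is a sketch in base R; res is a hypothetical three-country stand-in for the printed result, used only to keep the example short and self-contained.

```r
# Stand-in for the data frame printed above (three countries only)
res <- data.frame(
  country1 = c("Afghanistan", "Albania", "China"),
  Afghanistan = c(0, 0, 0),
  Albania = c(111, 0, 0),
  China = c(5, 0, 0),
  stringsAsFactors = FALSE
)

# Drop the label column and use it as row names instead
m <- as.matrix(res[, -1])
rownames(m) <- res$country1
```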
Answer 2 (score: 0)
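Since this is just an upper triangular matrix with zeros on the diagonal, the result is the same except that the first column has been dropped, because it carries no information (only zeros). You can add it back to the matrix with cbind:
# one column of zeros with as many rows as oldMatrix
Z = matrix(0, nrow = nrow(oldMatrix), ncol = 1)
newMatrix = cbind(Z, oldMatrix)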
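Note that the cast() output in the question is also missing the rows for Turkey, USA and Zambia. One way to pad both rows and columns at once is sketched below: build a full square matrix of zeros and copy the existing values in by name. Here oldMatrix is a hand-built stand-in for the 4 x 6 result, and full is an illustrative name.

```r
countries <- c("Afghanistan", "Albania", "China", "Germany",
               "Turkey", "USA", "Zambia")

# Stand-in for the 4 x 6 result of cast() shown in the question
oldMatrix <- matrix(0, nrow = 4, ncol = 6,
                    dimnames = list(countries[1:4], countries[2:7]))
oldMatrix["Afghanistan", c("Albania", "China")] <- c(111, 5)
oldMatrix["Albania", "Germany"] <- 6
oldMatrix["China", c("Germany", "Turkey")] <- c(8, 900)
oldMatrix["Germany", c("Turkey", "USA", "Zambia")] <- c(12, 3, 700)

# Full 7 x 7 matrix of zeros, then copy the known cells in by name
full <- matrix(0, nrow = length(countries), ncol = length(countries),
               dimnames = list(countries, countries))
full[rownames(oldMatrix), colnames(oldMatrix)] <- oldMatrix
```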
Answer 3 (score: 0)
I know this is late, but the reshape2 package has a dedicated function for this.
Taking your data.frame as an example:
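# acast's first argument is named data; fun.aggregate = sum adds up duplicate country pairs
df_back_to_matrix <- reshape2::acast(data = your_df, formula = country1~country2, fun.aggregate = sum, value.var = "value")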
Note that the order of the variables in the formula matters: reshape2 reads it as row_variable ~ column_variable.