How to fill an empty matrix with values from a data frame

Time: 2015-09-23 09:15:46

Tags: r matrix casting

I am desperately trying to fill a matrix with values from a data frame. It is trade data, so the data frame looks like this:

     country1 country2 value
1 Afghanistan  Albania    30
2 Afghanistan  Albania    81
3 Afghanistan    China     5
4     Albania  Germany     6
5       China  Germany     8
6       China   Turkey   900
7     Germany   Turkey    12
8     Germany      USA     3
9     Germany   Zambia   700

Using the unique and sort commands, I created a list of all countries that occur in df (and converted it to a matrix):

     countries_sorted
[1,] "Afghanistan"   
[2,] "Albania"       
[3,] "China"         
[4,] "Germany"       
[5,] "Turkey"        
[6,] "USA"           
[7,] "Zambia"    
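
For readers following along, the step described above might look like the minimal sketch below. The data frame is re-created inline so the snippet runs on its own, and the exact calls are an assumption, since the question does not show the code that was used.

```r
# Re-create the example data frame from the question
df <- read.table(text =
"country1 country2 value
Afghanistan Albania 30
Afghanistan Albania 81
Afghanistan China 5
Albania Germany 6
China Germany 8
China Turkey 900
Germany Turkey 12
Germany USA 3
Germany Zambia 700", header = TRUE, stringsAsFactors = FALSE)

# Every country that appears in either column, de-duplicated and sorted
countries <- sort(unique(c(df$country1, df$country2)))

# One-column character matrix, matching the printed output in the question
countries_sorted <- matrix(countries, ncol = 1,
                           dimnames = list(NULL, "countries_sorted"))
```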

Using this "list", I created an empty trade matrix (7x7):

             Afghanistan Albania China Germany Turkey USA Zambia
Afghanistan          NA      NA    NA      NA     NA  NA     NA
Albania              NA      NA    NA      NA     NA  NA     NA
China                NA      NA    NA      NA     NA  NA     NA
Germany              NA      NA    NA      NA     NA  NA     NA
Turkey               NA      NA    NA      NA     NA  NA     NA
USA                  NA      NA    NA      NA     NA  NA     NA
Zambia               NA      NA    NA      NA     NA  NA     NA
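
An empty matrix like the one above could be built as in this sketch. The variable names `countries` and `trade` are illustrative and not taken from the question.

```r
# The seven countries from the question, in sorted order
countries <- c("Afghanistan", "Albania", "China", "Germany",
               "Turkey", "USA", "Zambia")

# 7x7 numeric matrix filled with NA; rows and columns are named by country
trade <- matrix(NA_real_,
                nrow = length(countries), ncol = length(countries),
                dimnames = list(countries, countries))
```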

Now I cannot manage to fill this matrix with the numbers/sums from the value column of df. I tried something like this:

a<-cast(df, country1~country2 , sum)

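This works to a certain extent, but the matrix does not keep its original 7x7 format, and that is what I need: a matrix whose diagonal is all 0.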

> a
     country1 Albania China Germany Turkey USA Zambia
1 Afghanistan     111     5       0      0   0      0
2     Albania       0     0       6      0   0      0
3       China       0     0       8    900   0      0
4     Germany       0     0       0     12   3    700

Does anyone have a solution, please?

4 Answers:

Answer 0: (score: 4)

Starting with these two datasets:

#your data.frame (read from the clipboard after copying the table above)
df <- read.table(header=TRUE, file='clipboard', stringsAsFactors = FALSE)
#the list of unique countries
countries <- unique(c(df$country1,df$country2))

You can do this:

#create all the country combinations
newdf <- expand.grid(countries, countries)
#change names
colnames(newdf) <- c('country1', 'country2')
#add a value of 0 for the new combinations (won't affect outcome)
newdf$value <- 0
#row bind with original dataset
df2 <- rbind(df, newdf)


#and create the table using xtabs:
#the aggregate function will create the sum of the value for each combination
> xtabs(value ~ country1 + country2, aggregate(value~country1+country2,df2,sum))
             country2
country1      Afghanistan Albania China Germany Turkey USA Zambia
  Afghanistan           0     111     5       0      0   0      0
  Albania               0       0     0       6      0   0      0
  China                 0       0     0       8    900   0      0
  Germany               0       0     0       0     12   3    700
  Turkey                0       0     0       0      0   0      0
  USA                   0       0     0       0      0   0      0
  Zambia                0       0     0       0      0   0      0
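
As a variation on the approach above, the same 7x7 table can be sketched without padding the data with zero rows, by turning both country columns into factors that share one set of levels. This relies on `xtabs` summing the left-hand side within each cell and keeping unused factor levels by default. The names `countries` and `trade_matrix` are illustrative.

```r
# Re-create the example data frame from the question
df <- read.table(text =
"country1 country2 value
Afghanistan Albania 30
Afghanistan Albania 81
Afghanistan China 5
Albania Germany 6
China Germany 8
China Turkey 900
Germany Turkey 12
Germany USA 3
Germany Zambia 700", header = TRUE, stringsAsFactors = FALSE)

# All countries from both columns, sorted
countries <- sort(unique(c(df$country1, df$country2)))

# Give both columns the same factor levels so every country
# appears as a row and as a column, even with no observations
df$country1 <- factor(df$country1, levels = countries)
df$country2 <- factor(df$country2, levels = countries)

# xtabs sums `value` per (country1, country2) cell; empty cells become 0
trade <- xtabs(value ~ country1 + country2, data = df)

# Convert the xtabs object to a plain numeric matrix with country names
trade_matrix <- matrix(as.vector(trade), nrow = length(countries),
                       dimnames = list(countries, countries))
```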

Answer 1: (score: 2)

An alternative to @LyzandeR's nice solution, using the dplyr and tidyr packages.

dt = read.table(text=
"country1 country2 value
Afghanistan  Albania    30
Afghanistan  Albania    81
Afghanistan    China     5
Albania  Germany     6
China  Germany     8
China   Turkey   900
Germany   Turkey    12
Germany      USA     3
Germany   Zambia   700", header=T, stringsAsFactors=F)

library(dplyr)
library(tidyr)

dt2 = 
    dt %>% 
      group_by(country1,country2) %>%    # for every combination of countries
      summarise(SumValue = sum(value))   # get the sum of value

# get all possible countries that appear in your dataset
list_countries = union(dt2$country1, dt2$country2)

expand.grid(country1=list_countries, country2=list_countries, stringsAsFactors = F) %>%  # create all possible combinations of countries
  left_join(dt2, by=c("country1","country2")) %>%  # join back info whenever it is found
  mutate(SumValue = ifelse(is.na(SumValue),0,SumValue)) %>%  # replace NAs with 0s
  spread(country2,SumValue)  # reshape data

#     country1 Afghanistan Albania China Germany Turkey USA Zambia
# 1 Afghanistan           0     111     5       0      0   0      0
# 2     Albania           0       0     0       6      0   0      0
# 3       China           0       0     0       8    900   0      0
# 4     Germany           0       0     0       0     12   3    700
# 5      Turkey           0       0     0       0      0   0      0
# 6         USA           0       0     0       0      0   0      0
# 7      Zambia           0       0     0       0      0   0      0

Answer 2: (score: 0)

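Since it is just an upper triangular matrix with a diagonal of 0, the result is the same except that the first column has been dropped, because it carries no information (only zeros). You can add it back to the matrix with cbind:
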
Z = matrix(rep(0,7),ncol=1)
newMatrix = cbind(Z,oldMatrix)
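
Note that the cast result shown in the question has only four rows, so rows for Turkey, USA and Zambia are missing as well as the Afghanistan column. One way to pad both at once is to copy the partial result into a full zero matrix by row and column name. The sketch below makes that assumption explicit: `oldMatrix` is rebuilt here with `tapply` as a stand-in for the cast output, and `countries` is an illustrative name.

```r
# Re-create the example data frame from the question
df <- read.table(text =
"country1 country2 value
Afghanistan Albania 30
Afghanistan Albania 81
Afghanistan China 5
Albania Germany 6
China Germany 8
China Turkey 900
Germany Turkey 12
Germany USA 3
Germany Zambia 700", header = TRUE, stringsAsFactors = FALSE)

# Stand-in for the partial (4x6) result: sums per observed country pair
oldMatrix <- tapply(df$value, list(df$country1, df$country2), sum)
oldMatrix[is.na(oldMatrix)] <- 0

# Full 7x7 zero matrix named by every country in the data
countries <- sort(unique(c(df$country1, df$country2)))
newMatrix <- matrix(0, nrow = length(countries), ncol = length(countries),
                    dimnames = list(countries, countries))

# Copy the partial result into the matching rows and columns by name
newMatrix[rownames(oldMatrix), colnames(oldMatrix)] <- oldMatrix
```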

Answer 3: (score: 0)

I know it's late, but the reshape2 package has a dedicated function for this. Taking your data.frame as an example:

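# acast takes the data frame as `data`; fun.aggregate = sum adds up duplicate country pairs
df_back_to_matrix <- reshape2::acast(data = your_df, formula = country1~country2, fun.aggregate = sum, value.var = "value")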

Note that the order of the variables in the formula matters: reshape2 reads it as row_variable ~ column_variable