Question

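I have a three-column data frame that records bilateral trade data between 161 countries. The data are in dyadic format, with 19687 rows and three columns: reporter (rid), partner (pid), and their bilateral trade flow in a given year (TradeValue). rid and pid take values from 1 to 161, and each country is assigned the same rid and pid. For any given pair (rid, pid) with rid ≠ pid, TradeValue(rid, pid) = TradeValue(pid, rid).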
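The symmetry property stated above can be checked directly. The following is a minimal sketch on made-up toy data in the same three-column layout; the names toy and mirrored are hypothetical and the values are for illustration only.

```r
# Toy data (illustrative values only), same layout as the real data frame
toy <- data.frame(
  rid        = c(2, 3, 2, 4),
  pid        = c(3, 2, 4, 2),
  TradeValue = c(223, 223, 13, 13)
)

# Join each (rid, pid) row to its mirrored (pid, rid) row
mirrored <- merge(
  toy, toy,
  by.x = c("rid", "pid"),
  by.y = c("pid", "rid"),
  suffixes = c(".fwd", ".rev")
)

# TRUE when every pair that appears in both directions carries the same value
symmetric <- all(mirrored$TradeValue.fwd == mirrored$TradeValue.rev)
print(symmetric)
```

Pairs that appear in only one direction are dropped by the join, so this only tests pairs reported both ways.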

The data (run in R) look like this:

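# load the data from dropbox folder
library(foreign)
example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")
head(example_data, n = 10)

   rid pid TradeValue
1    2   3        500
2    2   7       2328
3    2   8    2233465
4    2   9      81470
5    2  12     572893
6    2  17     488374
7    2  19    3314932
8    2  23      20323
9    2  25         10
10   2  29    9026220

The data come from the UN Comtrade database. Each rid is paired with multiple pid values to obtain the bilateral trade data, but as can be seen, not every pid has a numeric ID. I only assigned a rid or pid to a country if the relevant economic indicators for that country are available, which is why there are NA values in the pid column even though a TradeValue exists between that country and the reporting country (rid). The same applies on the reporter side: a country that did not report any TradeValue with its partners has no ID number in the rid column. (Hence the rid column begins with 2, because country 1, i.e. Afghanistan, did not report any bilateral trade data with partners.) A quick look at summary statistics helps confirm this: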

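length(unique(example_data$rid))
[1] 139
# only 139 countries reported bilateral trade statistics with partners
length(unique(example_data$pid))
[1] 162
# that extra pid is NA (161 + NA = 162)

Most countries report bilateral trade data with partners, and those that do not tend to be small economies. I therefore want to keep the complete list of 161 countries and convert the example_data data frame into a 161 x 161 adjacency matrix in which

for countries absent from the rid column (e.g. rid == 1), a row is created for each of them and the entire row (in the 161 x 161 matrix) is set to 0;
for countries (pid) that share no TradeValue entry with a particular rid, the corresponding cells are set to 0.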
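The two rules above can be sketched in base R by starting from an all-zero matrix and filling only the observed pairs. This is a minimal sketch, not a tested solution for the full data set: the helper name to_adjacency and the toy data are hypothetical, and it assumes that rows with an NA rid or pid should simply be dropped.

```r
# Hypothetical helper: build an n x n matrix from (rid, pid, TradeValue) rows
to_adjacency <- function(df, n) {
  # Assumption: rows whose reporter or partner has no numeric ID are discarded
  ok <- !is.na(df$rid) & !is.na(df$pid)
  df <- df[ok, ]

  # Start from zeros, so absent reporters and unmatched pairs stay 0
  adj <- matrix(0, nrow = n, ncol = n, dimnames = list(1:n, 1:n))

  # A two-column index matrix addresses one (row, column) cell per observation
  adj[cbind(df$rid, df$pid)] <- df$TradeValue
  adj
}

# Toy input (illustrative values only): country 1 never reports, one pid is NA
toy <- data.frame(
  rid        = c(2, 2, 3),
  pid        = c(3, NA, 2),
  TradeValue = c(223, 50, 223)
)
adj <- to_adjacency(toy, n = 4)
print(adj)
```

On the real data the call would presumably be to_adjacency(example_data, n = 161), with as.data.frame(adj) as a final step if a data.frame is required.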

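For example, suppose that in a 5 x 5 adjacency matrix, country 1 did not report any trade statistics with partners, while the other four countries reported bilateral trade statistics with one another (but not with country 1). The original data frame looks like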

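rid pid TradeValue
2   3   223
2   4   13
2   5   9
3   2   223
3   4   57
3   5   28
4   2   13
4   3   57
4   5   82
5   2   9
5   3   28
5   4   82

which I want to convert into a 5 x 5 adjacency matrix (in data.frame format). The desired output should look like this

  1   2   3   4   5
1 0   0   0   0   0
2 0   0 223  13   9
3 0 223   0  57  28
4 0  13  57   0  82
5 0   9  28  82   0

and the same approach should then be applied to example_data to create the 161 x 161 adjacency matrix. However, after repeated trial and error with several approaches, I still cannot get this conversion to work, not even past the first step.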

I would be grateful if anyone could enlighten me.

Answer 1

I couldn't read the Dropbox file, but I tried working it out on your 5-country example data frame:

# dcast() is assumed here to come from the reshape2 package
library(reshape2)

country_num = 5

# check countries missing in rid and pid
rid_miss = setdiff(1:country_num, example_data$rid)
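# the == 0 comparison must sit outside length(); with no missing pid, fall back to 1
# note: ifelse() on a scalar test keeps only the first missing pid
pid_miss = ifelse(length(setdiff(1:country_num, example_data$pid)) == 0,
                  1, setdiff(1:country_num, example_data$pid))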

# create dummy dataframe with missing rid and pid
add_data = as.data.frame(do.call(cbind, list(rid_miss, pid_miss, NA)))
colnames(add_data) = colnames(example_data)

# add dummy dataframe to original
example_data = rbind(example_data, add_data)

# the dcast now takes missing rid and pid into account
mat = dcast(example_data, rid ~ pid, value.var = "TradeValue")

# can remove first column without setting colnames but this is more failproof
rownames(mat) = mat[, 1]
mat = as.matrix(mat[, -1])

# fill in upper triangular matrix with missing values of lower triangular matrix 
# and vice-versa since TradeValue(rid, pid) = TradeValue(pid, rid)
mat[is.na(mat)] = t(mat)[is.na(mat)]

# change NAs to 0 according to preference - would keep as NA to differentiate 
# from actual zeros
mat[is.na(mat)] = 0
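As an alternative sketch that avoids the dummy rows (this is not the approach used above), rid and pid can be turned into factors with the full set of country IDs as levels, so that xtabs() keeps empty rows and columns. The data below are the 5-country example from the question; the names five, ids, tab and adj_df are hypothetical.

```r
# The 5-country example from the question
five <- data.frame(
  rid        = c(2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  pid        = c(3, 4, 5, 2, 4, 5, 2, 3, 5, 2, 3, 4),
  TradeValue = c(223, 13, 9, 223, 57, 28, 13, 57, 82, 9, 28, 82)
)

# Declare the complete ID list as factor levels so unused IDs are kept
ids <- 1:5
five$rid <- factor(five$rid, levels = ids)
five$pid <- factor(five$pid, levels = ids)

# xtabs() sums TradeValue per (rid, pid) cell; cells with no rows become 0
tab <- xtabs(TradeValue ~ rid + pid, data = five)

# Convert the contingency table to a 5 x 5 data.frame
adj_df <- as.data.frame.matrix(tab)
print(adj_df)
```

Because xtabs() sums within each cell, duplicated (rid, pid) rows would be added together, and rows with an NA rid or pid are left out of the table; whether that is acceptable depends on the data.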

Does that help?

Map a data frame (with NA) to an n x n adjacency matrix (as a data.frame object)
