R: How do I create dummy variables for multiple values?

Time: 2019-05-24 19:27:40

Tags: r

I have a dataset containing multiple countries, and I want to create a dummy variable for each continent.

At the moment my dataset looks like this:

+---------------+-----------+-----+-----+-----+
|    Country    |  Period   |  X  |  Y  |  Z  |
+---------------+-----------+-----+-----+-----+
| Argentina     | 1991-1995 | ... | ... | ... |
| Argentina     | 1996-2000 | ... | ... | ... |
| Bolivia       | 1991-1995 | ... | ... | ... |
| Bolivia       | 1996-2000 | ... | ... | ... |
| Brazil        | 1991-1995 | ... | ... | ... |
| Brazil        | 1996-2000 | ... | ... | ... |
| Canada        | 1991-1995 | ... | ... | ... |
| Canada        | 1996-2000 | ... | ... | ... |
| United States | 1991-1995 | ... | ... | ... |
| United States | 1996-2000 | ... | ... | ... |
+---------------+-----------+-----+-----+-----+

My desired output is as follows:

+---------------+-----------+-----+-----+-----+---------+---------+
|    Country    |  Period   |  X  |  Y  |  Z  | dummySA | dummyNA |
+---------------+-----------+-----+-----+-----+---------+---------+
| Argentina     | 1991-1995 | ... | ... | ... |       1 |       0 |
| Argentina     | 1996-2000 | ... | ... | ... |       1 |       0 |
| Bolivia       | 1991-1995 | ... | ... | ... |       1 |       0 |
| Bolivia       | 1996-2000 | ... | ... | ... |       1 |       0 |
| Brazil        | 1991-1995 | ... | ... | ... |       1 |       0 |
| Brazil        | 1996-2000 | ... | ... | ... |       1 |       0 |
| Canada        | 1991-1995 | ... | ... | ... |       0 |       1 |
| Canada        | 1996-2000 | ... | ... | ... |       0 |       1 |
| United States | 1991-1995 | ... | ... | ... |       0 |       1 |
| United States | 1996-2000 | ... | ... | ... |       0 |       1 |
+---------------+-----------+-----+-----+-----+---------+---------+

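So I want one dummy for all countries in South America and one dummy for all countries in North America. I know how to create a dummy for a single country, but not for a group of several values.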

1 answer:

Answer 0: (score: 2)

If there are only a few countries, use %in% to create the dummy columns:
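library(dplyr)

# dummySA is 1 for the listed South American countries, 0 otherwise
df1 %>%
    mutate(dummySA = as.integer(Country %in%
        c("Argentina", "Bolivia", "Brazil")),
        # every remaining country is treated as North American
        dummyNA = as.integer(!dummySA))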
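
The `!dummySA` shortcut above is only correct while the data contains exactly two continents. As a minimal sketch of a variant that gives each continent its own explicit list, the base R example below builds a small stand-in for `df1` (the sample rows and the vector names `south_america` and `north_america` are assumptions for illustration, not part of the original answer):

```r
# Hypothetical sample data mirroring the question's layout (X, Y, Z omitted)
df1 <- data.frame(
  Country = rep(c("Argentina", "Bolivia", "Brazil", "Canada", "United States"),
                each = 2),
  Period  = rep(c("1991-1995", "1996-2000"), times = 5),
  stringsAsFactors = FALSE
)

# Assumed membership lists; extend these to cover the real data
south_america <- c("Argentina", "Bolivia", "Brazil")
north_america <- c("Canada", "United States")

# %in% returns TRUE/FALSE per row; as.integer() turns that into 1/0
df1$dummySA <- as.integer(df1$Country %in% south_america)
df1$dummyNA <- as.integer(df1$Country %in% north_america)
```

With explicit lists, a country that belongs to neither group gets 0 in both columns instead of being counted as North American by default.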

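Otherwise, create a key/value dataset that maps each country to its geographic region, merge/join it with the data, and create the dummies with spread: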

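library(dplyr)
library(tidyr)

# keyvaldat: lookup table with a Country column and a `value` column (the region label)
df1 %>%
   left_join(keyvaldat) %>%       # attach the region label to every row
   mutate(n = 1) %>%              # indicator to spread into the dummy columns
   spread(value, n, fill = 0)     # one column per region, 0 where it does not apply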
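
A self-contained sketch of the join approach may make the shape of `keyvaldat` clearer. Everything below is illustrative: the sample rows, the contents of `keyvaldat`, and the explicit `by = "Country"` are assumptions, and `pivot_wider()` is swapped in for `spread()` because tidyr now documents `spread()` as superseded. The scalar `values_fill` assumes a reasonably recent tidyr release.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data mirroring the question's layout (X, Y, Z omitted)
df1 <- data.frame(
  Country = rep(c("Argentina", "Bolivia", "Brazil", "Canada", "United States"),
                each = 2),
  Period  = rep(c("1991-1995", "1996-2000"), times = 5),
  stringsAsFactors = FALSE
)

# Assumed lookup table: one row per country; `value` holds the name
# of the dummy column that country should be flagged in
keyvaldat <- data.frame(
  Country = c("Argentina", "Bolivia", "Brazil", "Canada", "United States"),
  value   = c("dummySA", "dummySA", "dummySA", "dummyNA", "dummyNA"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  left_join(keyvaldat, by = "Country") %>%   # add the region label per row
  mutate(n = 1L) %>%                         # indicator value to widen
  pivot_wider(names_from = value,            # one column per region label
              values_from = n,
              values_fill = 0L)              # 0 where the region does not apply
```

Adding a continent then only requires new rows in `keyvaldat`; the pipeline itself stays unchanged.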