Question

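I am trying to use a condition to assign a new variable based on variables from two different data frames. The variable I am trying to match on is Zipcode.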

Data frame A contains 3 variables: Zipcode, Population and State. Data frame B contains all US zip codes and their obesity rates.

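I want data frame A to get a new variable, obesity, filled in by matching on Zipcode, but only for rows where State is Texas. All other states should be assigned 0.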

A

Zipcode | Population | State
33333   | 700        | Texas
11111   | 600        | Oregon
77777   | 500        | Texas
66666   | 100        | Texas

B

Zipcode | obesity
11111   | 1.4
22222   | 2.2
33333   | 1.12
44444   | 3.33
55555   | 1.3
66666   | 2
77777   | 5
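For anyone who wants to try the answers, here is a sketch that rebuilds the two tables above as R data frames. Storing Zipcode as a number is an assumption; character zip codes would work just as well as long as both data frames use the same type.

```r
# Data frame A: zip code, population and state (values from table A above)
A <- data.frame(
  Zipcode    = c(33333, 11111, 77777, 66666),
  Population = c(700, 600, 500, 100),
  State      = c("Texas", "Oregon", "Texas", "Texas"),
  stringsAsFactors = FALSE  # explicit for R versions older than 4.0
)

# Data frame B: zip code and obesity rate (values from table B above)
B <- data.frame(
  Zipcode = c(11111, 22222, 33333, 44444, 55555, 66666, 77777),
  obesity = c(1.4, 2.2, 1.12, 3.33, 1.3, 2, 5)
)

str(A)
str(B)
```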

Here is my code so far:

A$obesity <- ifelse((A$Zipcode == B$Zipcode) & (A$state == "Texas"), B$obesity, 0)

This assigns obesity rates to some of the Texas rows, but not to all of them, and I get this error:

In (A$Zipcode == B$Zipcode) & (A$state == : longer object length is not a multiple of shorter object length
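The message comes from `==` comparing the two Zipcode columns element by element rather than looking each zip code up. A has 4 rows and B has 7, so R recycles the shorter vector and complains because 7 is not a multiple of 4. A minimal base R sketch (the vector names `A_zip` and `B_zip` are just for illustration):

```r
A_zip <- c(33333, 11111, 77777, 66666)                        # the 4 zip codes in A
B_zip <- c(11111, 22222, 33333, 44444, 55555, 66666, 77777)  # the 7 zip codes in B

# `==` pairs values by position and recycles A_zip to length 7,
# so positions 5-7 compare A_zip[1:3] against B_zip[5:7]; R warns about the length mismatch
same_row <- A_zip == B_zip
print(same_row)

# What the question really needs: is each A zip code somewhere in B?
print(A_zip %in% B_zip)

# ...and at which row of B does it appear?
print(match(A_zip, B_zip))
```

The comparison returns 7 values instead of 4, and it pairs zip codes by position rather than by value, which is why only some Texas rows end up with a rate. `match()` performs the lookup that was actually intended.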

Answer 1

It sounds like what you really want to do is a JOIN on the Zipcode variable. You can do this easily in dplyr:

library(dplyr)

A %>%
    left_join(B, by = 'Zipcode')
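If you would rather not load dplyr, base R's `merge()` with `all.x = TRUE` performs the same left join. A sketch, assuming A and B are rebuilt from the tables in the question:

```r
A <- data.frame(
  Zipcode    = c(33333, 11111, 77777, 66666),
  Population = c(700, 600, 500, 100),
  State      = c("Texas", "Oregon", "Texas", "Texas"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Zipcode = c(11111, 22222, 33333, 44444, 55555, 66666, 77777),
  obesity = c(1.4, 2.2, 1.12, 3.33, 1.3, 2, 5)
)

# all.x = TRUE keeps every row of A (a left join);
# obesity would be NA for any zip code in A that has no match in B
joined <- merge(A, B, by = "Zipcode", all.x = TRUE)
print(joined)
```

One difference: `merge()` sorts the result by the key column by default, whereas `left_join()` keeps the row order of A.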

If you are only interested in Texas, just add a filter condition:

A %>%
    left_join(B, by = 'Zipcode') %>%
    filter(State == 'Texas')
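Note that the filter condition needs `==`; a single `=` is read as naming an argument, not as a comparison. The base R equivalent of the join-then-filter step might look like this (again rebuilding A and B from the question; `subset()` is swapped in for `filter()`):

```r
A <- data.frame(
  Zipcode    = c(33333, 11111, 77777, 66666),
  Population = c(700, 600, 500, 100),
  State      = c("Texas", "Oregon", "Texas", "Texas"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Zipcode = c(11111, 22222, 33333, 44444, 55555, 66666, 77777),
  obesity = c(1.4, 2.2, 1.12, 3.33, 1.3, 2, 5)
)

# Left join, then keep only the rows whose State equals "Texas" (note the ==)
texas <- subset(merge(A, B, by = "Zipcode", all.x = TRUE), State == "Texas")
print(texas)
```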

If you want to keep all rows but set the non-Texas rows to 0:

# Join and assign the result back to A so that it gains the 'obesity' column,
#  then insert 0 into non-Texan rows using bracket notation
A <- A %>%
    left_join(B, by = 'Zipcode')

A[A$State != 'Texas', 'obesity'] <- 0
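The original `ifelse()` approach from the question also works once each zip code is looked up with `match()` instead of being compared position by position. A base R sketch (the intermediate name `idx` is illustrative):

```r
A <- data.frame(
  Zipcode    = c(33333, 11111, 77777, 66666),
  Population = c(700, 600, 500, 100),
  State      = c("Texas", "Oregon", "Texas", "Texas"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Zipcode = c(11111, 22222, 33333, 44444, 55555, 66666, 77777),
  obesity = c(1.4, 2.2, 1.12, 3.33, 1.3, 2, 5)
)

# For each zip code in A, the row of B holding that zip code (NA if absent),
# so B$obesity[idx] has exactly one entry per row of A
idx <- match(A$Zipcode, B$Zipcode)

# Texas rows get the looked-up rate; every other state gets 0
A$obesity <- ifelse(A$State == "Texas", B$obesity[idx], 0)
print(A)
```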

Or:

# Use mutate to multiply 'obesity' by a vector of whether State == Texas
#  as.numeric() of a logical vector gives a vector of 0 or 1
#  If State == Texas: 1 * obesity = obesity; otherwise: 0 * obesity = 0
A %>%
    left_join(B, by = 'Zipcode') %>%
    mutate(obesity = obesity * as.numeric(State == 'Texas'))
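One caveat with the multiplication trick: if a non-Texas zip code has no match in B, the join leaves `obesity` as `NA`, and `NA * 0` is still `NA` in R. The sketch below adds a hypothetical zip code 99999 (not part of the question's data) to show the difference; `ifelse()` takes the 0 from its `no` argument for those rows instead.

```r
A <- data.frame(
  Zipcode = c(33333, 11111, 77777, 66666, 99999),  # 99999: hypothetical zip code absent from B
  State   = c("Texas", "Oregon", "Texas", "Texas", "Oregon"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Zipcode = c(11111, 22222, 33333, 44444, 55555, 66666, 77777),
  obesity = c(1.4, 2.2, 1.12, 3.33, 1.3, 2, 5)
)

# Left-join style lookup: obesity is NA where the zip code is not in B
A$obesity <- B$obesity[match(A$Zipcode, B$Zipcode)]

# Multiplying by the 0/1 flag: the unmatched Oregon row stays NA
by_multiply <- A$obesity * as.numeric(A$State == "Texas")

# ifelse(): non-Texas rows take 0 from the `no` argument, so their NA does not propagate
by_ifelse <- ifelse(A$State == "Texas", A$obesity, 0)

print(by_multiply)
print(by_ifelse)
```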

Create a new variable using variables from 2 different data frames
