Assign values to a column based on a condition on another column

Posted: 2014-07-24 23:55:12

Tags: r

Suppose I have a character vector like this:

> desired <- c("10001", "10004")

and a sample data frame like this:

> desired_sample_df <- data.frame(geo = rep("other", 30), zip = c(rep(10001:10010, 2), 10011:10020), cbsa = c(rep("NY", 20), rep("CA", 10)))
> desired_sample_df
     geo   zip cbsa
1  other 10001   NY
2  other 10002   NY
3  other 10003   NY
4  other 10004   NY
5  other 10005   NY
6  other 10006   NY
7  other 10007   NY
8  other 10008   NY
9  other 10009   NY
10 other 10010   NY
11 other 10001   NY
12 other 10002   NY
13 other 10003   NY
14 other 10004   NY
15 other 10005   NY
16 other 10006   NY
17 other 10007   NY
18 other 10008   NY
19 other 10009   NY
20 other 10010   NY
21 other 10011   CA
22 other 10012   CA
23 other 10013   CA
24 other 10014   CA
25 other 10015   CA
26 other 10016   CA
27 other 10017   CA
28 other 10018   CA
29 other 10019   CA
30 other 10020   CA

I want to overwrite the geo column with the value from zip whenever that zip value is in the desired vector defined at the start.


Here is what I tried:

> desired_sample_df$geo[desired_sample_df$zip %in% desired] <- desired_sample_df$zip[which(desired_sample_df$zip %in% desired)]
Warning message:
In `[<-.factor`(`*tmp*`, desired_sample_df$zip %in% desired, value = c(NA,  :
  invalid factor level, NA generated


> desired_sample_df$geo[desired_sample_df$zip %in% desired] <- desired_sample_df$zip
Warning messages:
1: In `[<-.factor`(`*tmp*`, desired_sample_df$zip %in% desired, value = c(NA,  :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, desired_sample_df$zip %in% desired, value = c(NA,  :
  number of items to replace is not a multiple of replacement length
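A minimal sketch that reproduces both problems. It assumes geo is a factor, which is what data.frame() produced for strings by default before R 4.0.0; factor() is called explicitly here so the example behaves the same on newer R versions. The names idx and attempt1 are only illustrative.

```r
desired <- c("10001", "10004")
desired_sample_df <- data.frame(
  geo  = factor(rep("other", 30)),  # explicit factor, like the pre-4.0.0 default
  zip  = c(rep(10001:10010, 2), 10011:10020),
  cbsa = c(rep("NY", 20), rep("CA", 10))
)
idx <- desired_sample_df$zip %in% desired

# Problem 1: "other" is the only level, so 10001 / 10004 are not valid values
levels(desired_sample_df$geo)

# Reproduce the first attempt on a copy: every new value becomes NA
# (R warns "invalid factor level, NA generated")
attempt1 <- desired_sample_df
attempt1$geo[idx] <- desired_sample_df$zip[idx]
attempt1$geo[idx]

# Problem 2 (second attempt): the left side selects 4 elements,
# but the right side supplies all 30 zip values
sum(idx)
length(desired_sample_df$zip)
```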

2 answers:

Answer 0 (score: 2)

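One problem is that data.frame() automatically turns strings into factors (the default before R 4.0.0), so geo only has the level "other" and assigning any other value produces NA. Try this: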

desired <- c("10001", "10004")
df <- data.frame(geo = rep("other", 30), zip = c(rep(10001:10010, 2), 10011:10020), cbsa = c(rep("NY", 20), rep("CA", 10)), stringsAsFactors=FALSE)

idx <- df$zip %in% desired

Now you can change the desired elements with:
df[idx, ]$geo <- df[idx, ]$zip
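If geo has to stay a factor, one alternative (not part of the answer above) is to register the new values as levels before assigning them. A sketch, again creating geo explicitly as a factor to mimic the question's data:

```r
desired <- c("10001", "10004")
df <- data.frame(
  geo  = factor(rep("other", 30)),  # factor, as in the question
  zip  = c(rep(10001:10010, 2), 10011:10020),
  cbsa = c(rep("NY", 20), rep("CA", 10))
)
idx <- df$zip %in% desired

# Add the new values as levels first; assigning a value that is not
# already a level is what triggers "invalid factor level, NA generated"
levels(df$geo) <- c(levels(df$geo), desired)

# The assignment now succeeds and geo remains a factor
df$geo[idx] <- as.character(df$zip[idx])

table(df$geo)
```

This keeps the factor type, at the cost of having to know the new levels in advance; converting to character as in the answer avoids that bookkeeping.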

Answer 1 (score: 1)

Something like this?

df$geo <- ifelse(df$zip %in% desired, df$zip, as.character(df$geo))  # as.character() in case geo is a factor

I'm calling your desired_sample_df just df.
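A caveat with ifelse(): it does not keep factor attributes, so if geo is still a factor (as in the question's data frame on R versions before 4.0.0), the "no" branch contributes the factor's integer codes instead of its labels. A sketch showing the pitfall and the as.character() workaround; in_desired and codes are illustrative names:

```r
desired <- c("10001", "10004")
df <- data.frame(
  geo  = factor(rep("other", 30)),  # factor, like the pre-4.0.0 default
  zip  = c(rep(10001:10010, 2), 10011:10020),
  cbsa = c(rep("NY", 20), rep("CA", 10))
)
in_desired <- df$zip %in% desired

# Pitfall: with a factor in the "no" branch, ifelse() copies the
# underlying integer codes, so "other" turns into 1
codes <- ifelse(in_desired, df$zip, df$geo)
head(codes, 4)

# Converting both branches to character keeps the labels
df$geo <- ifelse(in_desired, as.character(df$zip), as.character(df$geo))
head(df$geo, 4)
```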