Suppose I have a list like this:
> desired <- c("10001", "10004")
and a sample data frame like this:
> desired_sample_df <- data.frame(geo = rep("other", 30), zip = c(rep(10001:10010, 2), 10011:10020), cbsa = c(rep("NY", 20), rep("CA", 10)))
> desired_sample_df
geo zip cbsa
1 other 10001 NY
2 other 10002 NY
3 other 10003 NY
4 other 10004 NY
5 other 10005 NY
6 other 10006 NY
7 other 10007 NY
8 other 10008 NY
9 other 10009 NY
10 other 10010 NY
11 other 10001 NY
12 other 10002 NY
13 other 10003 NY
14 other 10004 NY
15 other 10005 NY
16 other 10006 NY
17 other 10007 NY
18 other 10008 NY
19 other 10009 NY
20 other 10010 NY
21 other 10011 CA
22 other 10012 CA
23 other 10013 CA
24 other 10014 CA
25 other 10015 CA
26 other 10016 CA
27 other 10017 CA
28 other 10018 CA
29 other 10019 CA
30 other 10020 CA
If the value in zip appears in the desired list saved at the start, I want to overwrite the geo column with the value from zip.
Here is what I tried:
> desired_sample_df$geo[desired_sample_df$zip %in% desired] <- desired_sample_df$zip[which(desired_sample_df$zip %in% desired)]
Warning message:
In `[<-.factor`(`*tmp*`, desired_sample_df$zip %in% desired, value = c(NA, :
invalid factor level, NA generated
> desired_sample_df$geo[desired_sample_df$zip %in% desired] <- desired_sample_df$zip
Warning messages:
1: In `[<-.factor`(`*tmp*`, desired_sample_df$zip %in% desired, value = c(NA, :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, desired_sample_df$zip %in% desired, value = c(NA, :
number of items to replace is not a multiple of replacement length
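Answer 0 (score: 2)
One of the problems is that strings in a data frame are automatically converted to factors (this was the default before R 4.0.0). Try this: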
desired <- c("10001", "10004")
df <- data.frame(geo = rep("other", 30), zip = c(rep(10001:10010, 2), 10011:10020), cbsa = c(rep("NY", 20), rep("CA", 10)), stringsAsFactors=FALSE)
idx <- df$zip %in% desired
Now you can change the desired elements with
df[idx, ]$geo <- df[idx, ]$zip
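Putting the steps above together, here is a minimal sketch that can be pasted into a fresh session. It uses the same data as the question; the direct column indexing `df$geo[idx] <- df$zip[idx]` is my own variation on the assignment above, not something from the original answer, and it relies on R coercing the integer zip values to character on assignment.

```r
desired <- c("10001", "10004")

# Build the sample data with geo kept as character, not factor
df <- data.frame(
  geo  = rep("other", 30),
  zip  = c(rep(10001:10010, 2), 10011:10020),
  cbsa = c(rep("NY", 20), rep("CA", 10)),
  stringsAsFactors = FALSE
)

# Logical index of rows whose zip is in the desired list
# (%in% compares the integer zips against the character values)
idx <- df$zip %in% desired

# Overwrite geo only in the matching rows; both sides have the same length
df$geo[idx] <- df$zip[idx]

# Show the rows that were changed
print(df[idx, ])
```

Indexing both sides with `idx` also avoids the "number of items to replace is not a multiple of replacement length" warning from the second attempt in the question, where the right-hand side was the whole zip column.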
Answer 1 (score: 1)
Like this?
df$geo <- ifelse(df$zip %in% desired,df$zip,df$geo)
I'm calling your desired_sample_df just df.
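One caveat, as a hedged sketch: if geo is still a factor (as in the question's original data frame), it is probably safer to convert both branches to character before calling `ifelse`, since `ifelse` drops factor attributes. The small data frame `df_f` below is a hypothetical cut-down example, not the question's data.

```r
desired <- c("10001", "10004")

# Hypothetical small data frame where geo is explicitly a factor
df_f <- data.frame(
  geo = factor(rep("other", 4)),
  zip = c(10001L, 10002L, 10004L, 10005L)
)

# Convert both branches to character so the result is a plain character vector
df_f$geo <- ifelse(df_f$zip %in% desired,
                   as.character(df_f$zip),
                   as.character(df_f$geo))

print(df_f)
```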