I'm trying to create a broad industry category from the detailed categories in my data. Where am I going wrong in creating it with grepl in R?
My example data is as follows:
df <- data.frame(county = c(01001, 01002, 02003, 04004, 08005, 01002, 02003, 04004),
ind = c("0700","0701","0780","0980","1000","1429","0840","1500"))
I'm trying to create an industry variable with two levels (e.g., agriculture and manufacturing) using grepl or str_replace in R.
I have tried this:
newdf$industry <- ""
newdf[df$ind %>% grepl(c("^07|^08|^09", levels(df$ind), value = TRUE)), "industry"] <- "Agri"
But this gives me the following error:
argument 'pattern' has length > 1 and only the first element will be used
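Why this message appears: with the magrittr pipe, df$ind %>% grepl(...) expands to grepl(df$ind, ...), so the eight-element df$ind is passed as the pattern argument, and R reports that only its first element is used. On top of that, the c(...) passed as the second argument is a vector rather than a single regex, value = TRUE ends up inside c() instead of being an argument (and grepl() has no value argument anyway; it belongs to grep()), and newdf is never created. A minimal corrected sketch, assuming newdf is meant to start as a copy of df:

```r
# Recreate the example data from the question
df <- data.frame(county = c(01001, 01002, 02003, 04004, 08005, 01002, 02003, 04004),
                 ind = c("0700","0701","0780","0980","1000","1429","0840","1500"))

# newdf must exist before columns can be assigned into it
newdf <- df
newdf$industry <- ""

# grepl(pattern, x): a single regex first, then the codes to search.
# as.character() covers the case where ind is a factor (the default in R < 4.0).
codes <- as.character(newdf$ind)
newdf$industry[grepl("^07|^08|^09", codes)] <- "Agri"
newdf$industry[grepl("^10|^14|^15", codes)] <- "Manufacturing"

print(newdf)
```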
I'd like to get the following data frame as the result:
newdf <- data.frame(county = c(01001, 01002, 02003, 04004, 08005, 01002, 02003, 04004),
ind = c("0700","0701","0780","0980","1000","1429","0840","1500"),
industry = c("Agri", "Agri", "Agri", "Agri", "Manufacturing", "Manufacturing", "Agri", "Manufacturing"))
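So my question is: how do I specify that if 'ind' starts with 07, 08 or 09, my industry variable takes the value 'Agri', and if 'ind' starts with 10, 14 or 15, it takes the value 'Manufacturing'? In practice I am dealing with a large number of industry codes spread over 10 categories, so I am looking for a solution that does this by pattern matching.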
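With many codes and around 10 categories, one option (a swapped-in technique, not taken from the answers below) is to skip regular expressions entirely: take the two-digit prefix with substr() and look it up in a named vector. The mapping industry_map is a hypothetical illustration built only from the codes in the question; a real mapping would list every prefix.

```r
df <- data.frame(county = c(01001, 01002, 02003, 04004, 08005, 01002, 02003, 04004),
                 ind = c("0700","0701","0780","0980","1000","1429","0840","1500"))

# Hypothetical mapping from 2-digit code prefix to broad category;
# extend with one entry per prefix for the full set of categories.
industry_map <- c("07" = "Agri", "08" = "Agri", "09" = "Agri",
                  "10" = "Manufacturing", "14" = "Manufacturing", "15" = "Manufacturing")

# First two characters of each code (as.character() in case ind is a factor)
prefix <- substr(as.character(df$ind), 1, 2)

# Index the named vector by name; prefixes missing from the map give NA
df$industry <- unname(industry_map[prefix])

print(df)
```

Since the whole mapping lives in one vector, adding a category is a data change rather than a new regex.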
Thanks for your help!
Answer 0 (score: 1)
Try this:
library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_detect()

newdf <- df %>%
  mutate(industry = ifelse(str_detect(string = ind,
                                      pattern = '^07|^08|^09'),
                           'Agri',
                           'Manufacturing'))
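Note that ifelse() has only two outcomes here, so every code that does not start with 07, 08 or 09 is labelled 'Manufacturing', including codes from any other category. Below is a base-R sketch that tests each category explicitly and returns NA for unmatched codes. classify_industry() is a hypothetical helper, and the character classes ^0[789] and ^1[045] are equivalent to ^07|^08|^09 and ^10|^14|^15.

```r
df <- data.frame(county = c(01001, 01002, 02003, 04004, 08005, 01002, 02003, 04004),
                 ind = c("0700","0701","0780","0980","1000","1429","0840","1500"))

# Hypothetical helper: label codes by prefix, NA when no category matches
classify_industry <- function(codes) {
  codes <- as.character(codes)
  ifelse(grepl("^0[789]", codes), "Agri",
         ifelse(grepl("^1[045]", codes), "Manufacturing", NA_character_))
}

df$industry <- classify_industry(df$ind)
print(df)

classify_industry("2000")  # returns NA: prefix 20 is in neither category
```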
Answer 1 (score: 1)
Using ifelse() to add the desired column to the df data.frame works:
df$industry <- ifelse(grepl(paste0("^", c('07','08','09'), collapse = "|"), df$ind), "Agri", "Manufacturing")
> df
county ind industry
1 1001 0700 Agri
2 1002 0701 Agri
3 2003 0780 Agri
4 4004 0980 Agri
5 8005 1000 Manufacturing
6 1002 1429 Manufacturing
7 2003 0840 Agri
8 4004 1500 Manufacturing
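The paste0(..., collapse = "|") trick in this answer extends to more than two categories: keep the prefixes for each category in a named list, build one anchored pattern per category, and assign the label wherever it matches. category_prefixes is a hypothetical list; codes that match no category stay NA, and if prefixes overlap, later categories overwrite earlier ones.

```r
df <- data.frame(county = c(01001, 01002, 02003, 04004, 08005, 01002, 02003, 04004),
                 ind = c("0700","0701","0780","0980","1000","1429","0840","1500"))

# Hypothetical list: category name -> vector of 2-digit prefixes
category_prefixes <- list(
  Agri          = c("07", "08", "09"),
  Manufacturing = c("10", "14", "15")
)

codes <- as.character(df$ind)
df$industry <- NA_character_  # unmatched codes remain NA

for (category in names(category_prefixes)) {
  # Builds e.g. "^07|^08|^09" for Agri
  pattern <- paste0("^", category_prefixes[[category]], collapse = "|")
  df$industry[grepl(pattern, codes)] <- category
}

print(df)
```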