I have a data frame named "final_proj_data" with the following structure:
ID County Population Year
<dbl> <chr> <dbl> <dbl>
1003 Baldwin County, Alabama 169162 2006
1015 Calhoun County, Alabama 112903 2006
1043 Cullman County, Alabama 80187 2006
1049 DeKalb County, Alabama 68014 2006
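I'm trying to split the "County" column into two separate columns, "County" and "State", and drop the comma.
I've tried many variations of the separate() function, but I keep getting this error: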
Error: `var` must evaluate to a single number or a column name, not a character vector
I've tried:
final_proj_data %>%
separate(final_proj_data$County, c("State", "County"), sep = ",", remove = TRUE)
final_proj_data %>%
separate(data = final_proj_data, col = County,
into = c("State", "County"), sep = ",")
I'm not sure what I'm doing wrong, or why "col =" keeps throwing this error. Any help would be greatly appreciated!
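A minimal sketch of what goes wrong in the two attempts, run on a two-row copy of the data (it assumes tidyr is installed; tidyr re-exports the %>% pipe). In the first attempt, final_proj_data$County hands separate() the column's values, a character vector, where it expects a bare column name, which is what the error message complains about. In the second, the pipe already supplies the data frame, so passing data = final_proj_data as well gives it the data twice.

```r
library(tidyr)  # provides separate() and re-exports %>%

# Two-row copy of the question's data
final_proj_data <- data.frame(
  ID = c(1003, 1015),
  County = c("Baldwin County, Alabama", "Calhoun County, Alabama"),
  Population = c(169162, 112903),
  Year = c(2006, 2006),
  stringsAsFactors = FALSE
)

# Wrong: final_proj_data$County is a character vector of values,
# but `col` expects a bare column name such as County.
# final_proj_data %>% separate(final_proj_data$County, ...)

# Wrong: the pipe already supplies `data`, so data = final_proj_data
# passes the data frame a second time.
# final_proj_data %>% separate(data = final_proj_data, col = County, ...)

# Right: pipe the data in once and name the column with no quotes or `$`.
# Splitting on ", " keeps a leading space out of the State values.
result <- final_proj_data %>%
  separate(County, into = c("County", "State"), sep = ", ")
print(result)
```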
Answer 0 (score: 3)
Using dplyr and base R:
library(dplyr)
final_proj_data %>%
mutate(State=unlist(lapply(strsplit(County,", "),function(x) x[2])),
County=gsub(",.*","",County))
ID County Population Year State
1 1003 Baldwin County 169162 2006 Alabama
2 1015 Calhoun County 112903 2006 Alabama
3 1043 Cullman County 80187 2006 Alabama
4 1049 DeKalb County 68014 2006 Alabama
Original:
Using dplyr and tidyr (I just saw that @Ronak Shah posted the same suggestion in a comment above):
library(dplyr)
library(tidyr)
final_proj_data %>%
separate(County,c("County","State"),sep=", ")
ID County State Population Year
1 1003 Baldwin County Alabama 169162 2006
2 1015 Calhoun County Alabama 112903 2006
3 1043 Cullman County Alabama 80187 2006
4 1049 DeKalb County Alabama 68014 2006
Answer 1 (score: 2)
As a base R option, we can try sub here:
County <- sub(",.*$", "", final_proj_data$County)
State <- sub("^.*,\\s*", "", final_proj_data$County)
final_proj_data$County <- County
final_proj_data$State <- State
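The two sub() patterns above can also be written as a single regular expression with two capture groups, picking the first or second group with a backreference. This is a variant sketched here, not the answer's own code; [^,]* stops at the first comma, matching the behaviour of ",.*$".

```r
counties <- c("Baldwin County, Alabama", "DeKalb County, Alabama")

# Group 1: everything before the first comma.
# Group 2: everything after the comma and any following whitespace.
pattern <- "^([^,]*),\\s*(.*)$"
county <- sub(pattern, "\\1", counties)
state  <- sub(pattern, "\\2", counties)
print(county)
print(state)
```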
Answer 2 (score: 2)
We can use read.csv in base R:
final_proj_data[c("County", "State")] <- read.csv(text = final_proj_data$County,
header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
final_proj_data
# ID County Population Year State
#1 1003 Baldwin County 169162 2006 Alabama
#2 1015 Calhoun County 112903 2006 Alabama
#3 1043 Cullman County 80187 2006 Alabama
#4 1049 DeKalb County 68014 2006 Alabama
final_proj_data <- structure(list(ID = c(1003L, 1015L, 1043L, 1049L),
County = c("Baldwin County, Alabama",
"Calhoun County, Alabama", "Cullman County, Alabama", "DeKalb County, Alabama"
), Population = c(169162L, 112903L, 80187L, 68014L), Year = c(2006L,
2006L, 2006L, 2006L)), class = "data.frame", row.names = c(NA,
-4L))
Answer 3 (score: 1)
We can use strsplit in base R:
cbind(d, `colnames<-`(do.call(rbind, strsplit(d$County, ", ")), c("County", "State")))[-2]
# ID Population Year County State
# 1 1003 169162 2006 Baldwin County Alabama
# 2 1015 112903 2006 Calhoun County Alabama
# 3 1043 80187 2006 Cullman County Alabama
# 4 1049 68014 2006 DeKalb County Alabama
Note: if County is a factor column, use strsplit(as.character(d$County), ", ").
Data
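A short illustration of the factor note, using a hypothetical two-row data frame d_factor whose County column is a factor: strsplit() rejects non-character input, so the factor is converted with as.character() before splitting.

```r
# Hypothetical data frame with County stored as a factor
d_factor <- data.frame(
  County = c("Baldwin County, Alabama", "DeKalb County, Alabama"),
  stringsAsFactors = TRUE
)

# strsplit(d_factor$County, ", ") stops with "non-character argument",
# so convert to character first, then bind the pieces into a matrix.
parts <- do.call(rbind, strsplit(as.character(d_factor$County), ", "))
colnames(parts) <- c("County", "State")
print(parts)
```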
d <- structure(list(ID = c("1003", "1015", "1043", "1049"), County = c("Baldwin County, Alabama",
"Calhoun County, Alabama", "Cullman County, Alabama", "DeKalb County, Alabama"
), Population = c("169162", "112903", "80187", "68014"), Year = c("2006",
"2006", "2006", "2006")), row.names = c(NA, -4L), class = "data.frame")