I have a column in my database that is organized like this.
Example:
Location
A_1
A_1
A_2
A_3
A_3
B_1
B_2
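I want to group the rows by the first part of the value (e.g. "A") using R. That is, I want to create a new column based on the letter, so the database would look like this: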
Location Location_1
A_1 A
A_1 A
A_2 A
A_3 A
A_3 A
B_1 B
B_2 B
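Following another post here (Create column with grouped values based on another column), I tried the mutate() and ifelse() functions, but I get this error:
Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "character"
Does anyone know how to fix this, or is there another way to do it?
Here is part of the .csv file I am working with: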
Location Species Time
A_1 FC 0.52
A_1 JC 0.64
A_2 JC 0.31
A_2 FC 0.02
A_2 FC 0.01
A_3 FC 0.13
A_3 JC 0.97
A_3 OT 0.86
A_3 JC 0.55
B_1 JC 0.32
B_1 OT 0.04
B_1 OT 0.06
B_2 OT 0.12
B_2 JC 0.13
B_2 JC 0.14
B_2 OT 0.56
C_1 OT 0.57
C_1 OT 0.86
C_1 FC 0.58
C_1 FC 0.76
... ... ...
Answer 0 (score: 1)
You can use strsplit to split the first column on "_" and keep the first piece of each result. This should do what you want:
dat <- data.frame(Location=c("A_1","A_1","A_2","A_3","A_3","B_1","B_2"),
stringsAsFactors = FALSE)
dat$Location1 <- sapply(strsplit(dat$Location, "_"), "[[", 1)
dat
> dat
Location Location1
1 A_1 A
2 A_1 A
3 A_2 A
4 A_3 A
5 A_3 A
6 B_1 B
7 B_2 B
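One caveat with this approach: strsplit() only accepts character vectors, so it stops with a "non-character argument" error if the column was read in as a factor (the default for data frames in R versions before 4.0). A minimal sketch of a factor-safe variant, where fac_dat is a hypothetical name used only for illustration:

```r
# Hypothetical data frame whose column is a factor rather than character
fac_dat <- data.frame(
  Location = factor(c("A_1", "A_1", "A_2", "A_3", "A_3", "B_1", "B_2"))
)

# Convert to character before splitting; fixed = TRUE treats "_" literally
parts <- strsplit(as.character(fac_dat$Location), "_", fixed = TRUE)

# vapply() checks that every extracted prefix is a single string
fac_dat$Location1 <- vapply(parts, function(p) p[1], character(1))
fac_dat
```

vapply() is used here instead of sapply() so that an unexpected result shape raises an error rather than silently returning a list.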
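Answer 1 (score: 1)
A simple way to get the text before the "_" is to use gsub or sub to delete everything from the underscore onward. It can be implemented as: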
#data
df <- data.frame(Location=c("A_1","A_1","A_2","A_3","A_3","B_1","B_2"),
State=c("S_1","S_1","S_2","T_3","T_3","T_1","T_2"),
City=c("X_1","X_1","X_2","X_3","X_3","Y_1","Y_2"),
stringsAsFactors = FALSE)
# single column
df$Location_1 <- gsub("_.*", "", df$Location, perl = TRUE)
# print only the original column and the derived one
df[, c("Location", "Location_1")]
# Location Location_1
#1 A_1 A
#2 A_1 A
#3 A_2 A
#4 A_3 A
#5 A_3 A
#6 B_1 B
#7 B_2 B
# multiple columns: across() (dplyr >= 1.0.0) applies the substitution to every selected column
library(dplyr)
df %>%
  select(Location, State, City) %>%
  mutate(across(everything(), ~ sub("_.*", "", .x), .names = "{.col}_new"))
#Result
#Location State City Location_new State_new City_new
#1 A_1 S_1 X_1 A S X
#2 A_1 S_1 X_1 A S X
#3 A_2 S_2 X_2 A S X
#4 A_3 T_3 X_3 A T X
#5 A_3 T_3 X_3 A T X
#6 B_1 T_1 Y_1 B T Y
#7 B_2 T_2 Y_2 B T Y
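For the single-column case asked about in the question, the same substitution fits inside mutate(). The call that produced the error was not shown, so this is only a guess at the cause: that message usually appears when mutate() receives a character vector (for example a single column) instead of a data frame as its first argument. A minimal sketch, assuming dplyr is installed; locs is a hypothetical name:

```r
library(dplyr)

# Hypothetical data frame holding only the Location column from the question
locs <- data.frame(
  Location = c("A_1", "A_1", "A_2", "A_3", "A_3", "B_1", "B_2"),
  stringsAsFactors = FALSE
)

# The data frame goes in first; the column is referenced by its bare name
locs <- locs %>%
  mutate(Location_1 = sub("_.*", "", Location))
locs
```

sub() is enough here because the pattern "_.*" consumes everything from the first underscore to the end of the string in a single match.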
Option 3
Reading from the csv file:
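# read.table() splits on whitespace by default, which matches the sample in the question;
# for a comma-separated file, read.csv() is the usual choice
df <- read.table("d:/Files/data.csv", header = TRUE, stringsAsFactors = FALSE)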
df$Location_1 <- gsub("_.*", "", df$Location, perl = TRUE)
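Once the letter column exists, it can be used as a grouping key, which was the stated goal of the question. A rough sketch in base R, using a few rows copied from the question's sample as inline text so that it runs without the file; csv_text, sample_df and by_site are hypothetical names, and taking the mean of Time is only an assumed example of a per-group summary:

```r
# A few rows copied from the sample in the question, supplied as inline text
csv_text <- "Location Species Time
A_1 FC 0.52
A_1 JC 0.64
B_1 JC 0.32
B_1 OT 0.04
C_1 OT 0.57"

sample_df <- read.table(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Derive the letter prefix exactly as in the answer above
sample_df$Location_1 <- sub("_.*", "", sample_df$Location)

# One row per letter group, with the mean of Time in that group
by_site <- aggregate(Time ~ Location_1, data = sample_df, FUN = mean)
by_site
```

Any other summary function (sum, length, median) can be passed as FUN in the same way.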