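I have two data frames, each with a column named 'Title' that contains strings. I need to strip certain characters from these strings so that I can merge on them. I would like to do this as cleanly as possible in a loop, so that I only have to write the gsub call once.
Say I have: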
df_1 <-read.table(text="
id Title
1 some_average_title
2 another:_one
3 the_third!
4 and_'the'_last
",header=TRUE,sep="")
and
df_2 <-read.table(text="
id Title
1 some_average.title
2 another:one
3 the_third
4 and_the_last
",header=TRUE,sep="")
What I currently run is:
df_1$Title <- gsub(" |\\.|'|:|!|\\'|_", "", df_1$Title )
df_2$Title <- gsub(" |\\.|'|:|!|\\'|_", "", df_2$Title )
I tried the following loop:
for (dtfrm in c("dt_1", "df_2")) {
assign(paste0(dtfrm, "$Title"),
gsub(" |\\.|'|:|!|\\'|", "", get(paste0(dtfrm, "$Title")))
)
}
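But it does not work, even though no error message is shown.
I also considered lapply(list(dt_1, dt_2), function(w){ w$Title <- XXX }), but I don't know what to supply for XXX, because gsub() needs the vector of strings as its third argument.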
Answer 0 (score: 1)
This works:
for(df in c("df_1", "df_2")){
assign(df, transform(get(df), Title = gsub(" |\\.|'|:|!|\\'|_", "", Title)))
}
Check:
df_1
id Title
1 1 someaveragetitle
2 2 anotherone
3 3 thethird
4 4 andthelast
and
df_2
id Title
1 1 someaveragetitle
2 2 anotherone
3 3 thethird
4 4 andthelast
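A likely reason the loop in the question changes nothing: assign() treats its first argument as a literal object name, so a name such as "df_1$Title" does not address a column of df_1. The sketch below illustrates this on a small stand-in data frame (df_demo and the shortened pattern are assumptions made for illustration, not part of the question).

```r
# Stand-in data frame, used only for this illustration.
df_demo <- data.frame(id = 1:2, Title = c("a_b", "c:d"),
                      stringsAsFactors = FALSE)

# assign() takes the name literally: this creates a NEW object called
# `df_demo$Title` and leaves the column inside df_demo untouched.
assign("df_demo$Title", gsub("[_:]", "", df_demo$Title))

exists("df_demo$Title")   # TRUE: a separate object with an odd name
df_demo$Title             # still "a_b" "c:d"

# Assigning the whole, transformed data frame under its real name works.
assign("df_demo", transform(df_demo, Title = gsub("[_:]", "", Title)))
df_demo$Title             # "ab" "cd"
```

This is why the loop above passes the bare name to assign() and get(), and lets transform() handle the column.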
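Answer 1 (score: 1)
Somewhere between @David's comment and @Carlos's answer, with a little extra:
Use mget to grab the data.frames and, if needed, list2env to write the results back over the original data.frames.
mget + lapply will do the transformation...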
lapply(mget(ls(pattern = "df_\\d")), function(w)
transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title)))
# $df_1
# id Title
# 1 1 someaveragetitle
# 2 2 anotherone
# 3 3 thethird
# 4 4 andthelast
#
# $df_2
# id Title
# 1 1 someaveragetitle
# 2 2 anotherone
# 3 3 thethird
# 4 4 andthelast
...but the result is still a list, and the original data.frames are not affected:
# df_1
# id Title
# 1 1 some_average_title
# 2 2 another:_one
# 3 3 the_third!
# 4 4 and_'the'_last
If you really want to overwrite the data.frames, try:
list2env(
lapply(mget(ls(pattern = "df_\\d")), function(w)
transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title))),
envir = .GlobalEnv)
df_1
# id Title
# 1 1 someaveragetitle
# 2 2 anotherone
# 3 3 thethird
# 4 4 andthelast
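As for what to put in place of XXX in the lapply attempt from the question, one option is to keep the data frames in a named list and have the anonymous function return the modified data frame. This is a minimal sketch: clean_title is a hypothetical helper name, the data is a shortened stand-in, and the character class "[ .':!_]" is assumed to be equivalent to the alternation used above.

```r
# Hypothetical helper so the pattern is written exactly once.
clean_title <- function(x) gsub("[ .':!_]", "", x)

# Shortened stand-in data, kept in a named list instead of loose objects.
frames <- list(
  df_1 = data.frame(id = 1:2,
                    Title = c("some_average_title", "the_third!"),
                    stringsAsFactors = FALSE),
  df_2 = data.frame(id = 1:2,
                    Title = c("some_average.title", "the_third"),
                    stringsAsFactors = FALSE)
)

cleaned <- lapply(frames, function(w) {
  w$Title <- clean_title(w$Title)  # this is the "XXX" from the question
  w                                # return the whole data frame, not the column
})

cleaned$df_1$Title   # "someaveragetitle" "thethird"
identical(cleaned$df_1$Title, cleaned$df_2$Title)   # TRUE
```

Without the final w, the function would return only the cleaned vector, because the value of an assignment is the assigned value.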
Answer 2 (score: 0)
get() lets you fetch multiple data sets programmatically.
data.table makes it easy to modify a column in each of them.
## CREATING A FEW MORE DATA SETS
df_3 <- df_2
df_4 <- df_1
set.seed(1)
df_3$id <- sample(20, 4)
df_4$id <- sample(20, 4)
library(data.table)
dt_1 <- as.data.table(df_1)
dt_2 <- as.data.table(df_2)
dt_3 <- as.data.table(df_3)
dt_4 <- as.data.table(df_4)
## OR programmatically:
Numb_of_DTs <- 4
names_of_dt_objects <- paste("dt", 1:Numb_of_DTs, sep="_") # dt_1, dt_2, etc
names_of_df_objects <- paste("df", 1:Numb_of_DTs, sep="_") # df_1, df_2, etc
for (i in 1:Numb_of_DTs)
assign(names_of_dt_objects[[i]], as.data.table(get(names_of_df_objects[[i]])))
for (dt.nm in names_of_dt_objects) {
get(dt.nm)[, Title := gsub("[ .':!_]", "", Title)]
## set the key for merging in the next step
setkey(get(dt.nm), Title)
## You might want to insert a line to clean up the column names, using
## setnames(get(dt.nm), OLD_NAMES, NEW_NAMES)
}
Reduce(merge, lapply(names_of_dt_objects, function(x) get(x)))
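For just two data frames, the same clean-then-merge step can also be sketched in base R without data.table. The names df_a, df_b, clean and the suffixes are assumptions chosen for illustration; the data is a shortened stand-in for the frames in the question.

```r
# Stand-in data; ids differ so the merged columns are easy to tell apart.
df_a <- data.frame(id = 1:3,
                   Title = c("some_title", "another:_one", "the_third!"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(id = 4:6,
                   Title = c("some.title", "another:one", "the_third"),
                   stringsAsFactors = FALSE)

# Same character class as in the loop above.
clean <- function(x) gsub("[ .':!_]", "", x)

df_a$Title <- clean(df_a$Title)
df_b$Title <- clean(df_b$Title)

# Merge on the cleaned key; suffixes keep the two id columns distinct.
merged <- merge(df_a, df_b, by = "Title", suffixes = c("_a", "_b"))
merged
```

The result has one row per matching title and the columns Title, id_a and id_b.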