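I am working with a list of data frames. In each data frame, I want to pad a single ID variable with leading zeros. The ID variable is a character vector and is always the first variable in the data frame. However, the length of the ID values differs from one data frame to the next. For example:
df1_id ranges over 1:20, so I need to pad with at most one zero; df2_id ranges over 1:100, so I need to pad with at most two zeros; and so on.
My question is: how can I pad every data frame without writing a line of code for each data frame in the list?
As mentioned above, I can solve this by calling str_pad on each data frame separately. For example, see the code below: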
#Load stringr package
library(stringr)
#Create sample data frames
df1 <- data.frame("x" = as.character(1:20), "y" = rnorm(20, 10, 1),
stringsAsFactors = FALSE)
df2 <- data.frame("v" = as.character(1:100), "y" = rnorm(100, 10, 1),
stringsAsFactors = FALSE)
df3 <- data.frame("z" = as.character(1:1000), "y" = rnorm(1000, 10, 1),
stringsAsFactors = FALSE)
#Combine data frames into list
dfl <- list(df1, df2, df3)
#Pad ID variables with leading zeros
dfl[[1]]$x <- str_pad(dfl[[1]]$x, width = 2, pad = "0")
dfl[[2]]$v <- str_pad(dfl[[2]]$v, width = 3, pad = "0")
dfl[[3]]$z <- str_pad(dfl[[3]]$z, width = 4, pad = "0")
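While this solution works well enough for a short list, it becomes unwieldy as the number of data frames grows.
I would love a way to embed some kind of "sequence" vector in the width argument of str_pad. Something like this: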
dfl <- lapply(dfl, function(x) {x[,1] <- str_pad(x[,1], width = SEQ, pad =
"0")})
where SEQ is a vector holding a different width for each data frame. Using the example above, it would look like:
seq <- c(2,3,4)
Thanks in advance, and let me know if you have any questions.
~kj
Answer 0 (score: 0)
You could use Map here, which is designed to apply a function "to the first elements of each ... argument, the second elements, the third elements"; see ?mapply for details.
library(stringr)
vec <- c(2,3,4) # this is the vector of 'widths', don't name it seq
Map(function(i, y) {
dfl[[i]][, 1] <- str_pad(dfl[[i]][, 1], width = y, pad = "0")
dfl[[i]] # this gets returned
},
# you iterate over these two vectors in parallel
i = 1:length(dfl),
y = vec)
Output
#[[1]]
# x y
#1 01 9.373546
#2 02 10.183643
#3 03 9.164371
#
#[[2]]
# v y
#1 001 11.595281
#2 002 10.329508
#3 003 9.179532
#4 004 10.487429
#
#[[3]]
# z y
#1 0001 10.738325
#2 0002 10.575781
#3 0003 9.694612
#4 0004 11.511781
#5 0005 10.389843
explanation
The function that we pass to Map is an anonymous function, more or less the one you provided in your question:
function(i, y) {
dfl[[i]][, 1] <- str_pad(dfl[[i]][, 1], width = y, pad = "0")
dfl[[i]] # this gets returned
}
The function takes two arguments, i and y (choose other names if you like, such as df and width), and for each data frame in your list it modifies the first column: dfl[[i]][, 1] <- ... . The anonymous function applies str_pad to the first column of each data frame
... <- str_pad(dfl[[i]][, 1], width = y, pad = "0")
but notice that we don't pass a fixed value to the width argument; we pass y instead.
Coming back to Map: it now applies str_pad to the first data frame with width = 2, to the second data frame with width = 3, and (you probably guessed it) to the third data frame in your list with width = 4.
The arguments are specified in the last two lines of the code as
i = 1:length(dfl),
y = vec)
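As a variation on the same idea, the data frames themselves can be handed to Map instead of their indices, which avoids reaching back into dfl from inside the function. This is only a minimal sketch: the small data frames and the names padded and widths are made up for illustration and are not part of the original question.

```r
library(stringr)

# Hypothetical minimal data, mirroring the structure in the question
dfl <- list(
  data.frame(x = as.character(1:3), y = 1:3, stringsAsFactors = FALSE),
  data.frame(v = as.character(1:4), y = 1:4, stringsAsFactors = FALSE)
)
widths <- c(2, 3)  # one target width per data frame (assumed values)

# Map pairs dfl[[1]] with widths[1], dfl[[2]] with widths[2], and so on
padded <- Map(function(df, width) {
  df[[1]] <- str_pad(df[[1]], width = width, pad = "0")  # pad the first column
  df                                                     # return the modified copy
}, dfl, widths)

padded[[1]]$x  # "01" "02" "03"
padded[[2]]$v  # "001" "002" "003" "004"
```

Because the function works on its own copy of each data frame, the original dfl is left untouched; assign the result back (dfl <- padded) if you want to overwrite it.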
I hope this helps.
data
(Consider creating a minimal example next time, as the number of rows in the data frames is not relevant to the problem.)
set.seed(1)
df1 <- data.frame("x" = as.character(1:3), "y" = rnorm(3, 10, 1),
stringsAsFactors = FALSE)
df2 <- data.frame("v" = as.character(1:4), "y" = rnorm(4, 10, 1),
stringsAsFactors = FALSE)
df3 <- data.frame("z" = as.character(1:5), "y" = rnorm(5, 10, 1),
stringsAsFactors = FALSE)
#Combine data frames into list
dfl <- list(df1, df2, df3)
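If the target width is always just the length of the longest ID in each data frame (as in the question, where 1:20 needs width 2 and 1:100 needs width 3), the hand-written width vector might not be needed at all. The sketch below rests on that assumption; pad_first_column and dfl_example are hypothetical names, and the example uses its own data because the minimal data above only has single-digit IDs.

```r
library(stringr)

# Hypothetical helper: pad the first column to the width of its longest value.
# Assumes the first column is a non-empty character vector.
pad_first_column <- function(df) {
  ids <- df[[1]]
  df[[1]] <- str_pad(ids, width = max(nchar(ids)), pad = "0")
  df
}

# Illustrative data with ID ranges like those in the question
dfl_example <- list(
  data.frame(x = as.character(1:20), y = rnorm(20, 10, 1),
             stringsAsFactors = FALSE),
  data.frame(v = as.character(1:100), y = rnorm(100, 10, 1),
             stringsAsFactors = FALSE)
)

padded <- lapply(dfl_example, pad_first_column)

head(padded[[1]]$x, 3)  # "01" "02" "03"
tail(padded[[2]]$v, 1)  # "100"
```

If some data frames need a width larger than their longest ID, this shortcut does not apply and an explicit width vector with Map remains the safer choice.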