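Below is a function called change_names that works, but only for one specific data frame name. In short, I am having trouble understanding how to use assign inside the function so that it can handle data frames with different names.

The function renames the columns of each file as I read the files in a for loop. For example, one file may have a column named 'A' that should be 'X', while another file may have a column named 'D' that should also be 'X'.

I have tried several ways of changing the original data frame, 'tempPullList', but I need to be able to use the function on other data frames.

Again, right now this only works when the data frame is called "tempPullList", and I need it to work on any other data frame.

I am new to writing functions, especially to assigning variables inside functions, and I would like this function to be as flexible as possible.

#====example different files====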
file1 <- data.frame(A = rep(1:10), Y = rep(c("Yellow","Red","Purple","Green","Blue"), 2),
                    Z = rep(c("Drink", "Food"), 5))
file2 <- data.frame(D = rep(1:10), B = rep(c("Brown","Pink","Purple","Green","Blue"), 2),
                    Z = rep(c("Drink", "Food"), 5))
file3 <- data.frame(X = rep(1:10), B = rep(c("Brown","Pink","Purple","Green","Blue"), 2),
                    C = rep(c("Drink", "Food"), 5))
file_list <- list(file1, file2, file3)
#====Package Bank====
library(data.table)
library(dplyr)
#====Function====
change_names <- function(x){
  #a list of columns to be renamed
  #throughout the files
  chgCols <- c("A",
               "B",
               "C",
               "D")
  #the names the columns will be changed to
  namekey <- c(A = "X",
               B = "Y",
               C = "Z",
               D = "X")
  chgCols <- match(chgCols, colnames(x))            #indexes of the columns to rename (NA if absent)
  chgCols <- colnames(x)[chgCols[!is.na(chgCols)]]  #index the name vector so a single match still yields a name
  x <- x %>%                                        #rename associated columns
    plyr::rename(namekey[chgCols])                  #from 'namekey' in dataframe
  assign('tempPullList', x, envir = .GlobalEnv)     #result is written to this hard-coded global name
}
#====Read in Files====
PullList <- data.frame()
for(file in seq_along(file_list)){
  tempPullList <- data.frame(file_list[file])
  print(file)
  change_names(x = tempPullList)
  PullList <- rbindlist(list(PullList, tempPullList),
                        fill = TRUE)
}
I am also currently struggling to make namekey and chgCols inputs to the function, so any advice on that would help as well.
Answer 0 (score: 0)
Example data:
column_name_lookup <- data.frame(orig = c("a","b","c","d"),
                                 new = c("X","Y","z","X"),
                                 stringsAsFactors = FALSE)
test_df <- data.frame(a = 1:5,
                      c = 2:6,
                      b = 3:7,
                      e = 4:8,
                      d = 5:9)
  a c b e d
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 4 5 6 7 8
5 5 6 7 8 9
Code to change the names:
new_names <- column_name_lookup$new[match(names(test_df),column_name_lookup$orig)]
names(test_df) <- ifelse(is.na(new_names),names(test_df),new_names)
  X z Y e X
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 4 5 6 7 8
5 5 6 7 8 9
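To connect this back to the question, the same lookup idea can be wrapped in a function that takes the lookup table as an argument and returns the renamed data frame, so nothing depends on a particular variable name and no assign is needed. The following is only a sketch: rename_columns, name_lookup, renamed and pull_list are hypothetical names chosen for illustration, and base rbind is used on the assumption that every file ends up with the same columns after renaming (with differing columns, rbindlist(..., fill = TRUE) from the question would be the alternative).

```r
# Hypothetical helper (not from the original post): rename the columns of `df`
# using a lookup table with columns `orig` (current name) and `new` (target name).
rename_columns <- function(df, lookup) {
  new_names <- lookup$new[match(names(df), lookup$orig)]
  # Keep the existing name wherever the lookup has no entry.
  names(df) <- ifelse(is.na(new_names), names(df), new_names)
  df  # return the modified copy; the caller decides where to store it
}

# Lookup mirroring the question's namekey (A -> X, B -> Y, C -> Z, D -> X).
name_lookup <- data.frame(orig = c("A", "B", "C", "D"),
                          new  = c("X", "Y", "Z", "X"),
                          stringsAsFactors = FALSE)

# Example files with the same column layout as in the question.
colours1 <- c("Yellow", "Red", "Purple", "Green", "Blue")
colours2 <- c("Brown", "Pink", "Purple", "Green", "Blue")
file1 <- data.frame(A = 1:10, Y = rep(colours1, 2), Z = rep(c("Drink", "Food"), 5))
file2 <- data.frame(D = 1:10, B = rep(colours2, 2), Z = rep(c("Drink", "Food"), 5))
file3 <- data.frame(X = 1:10, B = rep(colours2, 2), C = rep(c("Drink", "Food"), 5))
file_list <- list(file1, file2, file3)

# Rename every file, then stack them. Assumes all files share the same
# columns once renamed, which holds for these three examples.
renamed   <- lapply(file_list, rename_columns, lookup = name_lookup)
pull_list <- do.call(rbind, renamed)
```

Because the function returns its result, it can be called on a data frame of any name, for example some_df <- rename_columns(some_df, name_lookup). One caveat: if a single file contained both 'A' and 'D', both would become 'X' and the result would have duplicate column names, as in the test_df output above.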