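I need help removing the last few characters of a column name when they match a certain condition, or alternatively reworking my current code from the start.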
I'm working with student test data from Common Core assessments, and the column names don't follow a consistent format. This is the result I want:
> names(df)
[1] Student.ID
[2] State.ID
[3] A1_2.MD.A.1
[4] A1_2.MD.A.3
[5] A1_2.MD.A.4
[6] A1_2.MD.B.5
[7] A1_2.NBT.A.1
[8] A1_2.NBT.A.1.a
[9] A1_2.NBT.A.1.b
[10] A1_2.NBT.A.3
This is the code I have so far, but it only gets me part of the way:
library(reshape2)
library(reshape)
library(stringr)
library(dplyr)
library(qdap)  # provides beg2char()
## drop the first three characters of each score column name
for (column in 3:ncol(df)) {
  colnames(df)[column] <- substr(colnames(df)[column], 4, nchar(colnames(df)[column]))
}
## reduce column names to only the letter and number (strip the description)
for (column in 3:ncol(df)) {
  if (nchar(beg2char(colnames(df)[column], ".")) < 3) {
    colnames(df)[column] <- substr(colnames(df)[column], 1, 8)
  } else if (nchar(beg2char(colnames(df)[column], ".")) > 2) {
    colnames(df)[column] <- substr(colnames(df)[column], 1, 9)
  }
}
## add screening number indicator to start of percent scores
for (column in 3:ncol(df)) {
  colnames(df)[column] <- paste("A1_2", colnames(df)[column], sep = ".")
}
Now I get:
> names(df)
[1] Student.ID
[2] State.ID
[3] A1_2.MD.A.1.S
[4] A1_2.MD.A.3.E
[5] A1_2.MD.A.4.M
[6] A1_2.MD.B.5.A
[7] A1_2.NBT.A.1.U
[8] A1_2.NBT.A.1.a
[9] A1_2.NBT.A.1.b
[10] A1_2.NBT.A.3.R
Thanks in advance for your help!
Answer 0 (score: 3)
You can use
names <- c(your_col_names_here)
names <- gsub("^X2\\.((?:[^.]+\\.){2}[^.]+(?:\\.[a-z])?).*",
"A1_2.\\1", names)
names(df) <- names
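It may help to read the pattern piece by piece. The sketch below is only an illustration: it rebuilds the same regular expression from commented fragments and applies it to three sample names. The variable names `pattern`, `raw`, and `renamed` are made up for this example, and `perl = TRUE` is an assumption added here so that the `(?: ... )` groups and the `[a-z]` range are interpreted by PCRE regardless of locale.

```r
# The answer's pattern, split into commented pieces.
pattern <- paste0(
  "^X2\\.",           # literal prefix "X2." at the start of the name
  "(",                # start of capture group 1 (the part to keep)
  "(?:[^.]+\\.){2}",  # two dot-terminated fields, e.g. "MD." and "A."
  "[^.]+",            # third field, e.g. "1"
  "(?:\\.[a-z])?",    # optional single lowercase sub-item, e.g. ".a"
  ")",                # end of capture group 1
  ".*"                # the remaining description, which is discarded
)

# Sample names (shortened descriptions, for illustration only).
raw <- c("X2.MD.A.1.Select.and.Use.Percent.Correct",
         "X2.NBT.A.1.a.Understand.Place.Value.Percent.Correct",
         "Student.ID")

# "\\1" re-inserts the captured standard code after the new prefix.
renamed <- gsub(pattern, "A1_2.\\1", raw, perl = TRUE)
print(renamed)
# expected values: "A1_2.MD.A.1", "A1_2.NBT.A.1.a", "Student.ID"
```

Names that do not start with `X2.` (such as `Student.ID`) do not match the pattern, so they pass through unchanged.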
The complete R snippet:
# create a dummy df to test with
df <- as.data.frame(matrix(0, ncol = 10, nrow = 1))
names <- c("Student.ID", "State.ID",
"X2.MD.A.1.Select.and.Use.Appropriate.Tools.to.Measure.Length.Percent.Correct",
"X2.MD.A.3.Estimate.Length.Percent.Correct",
"X2.MD.A.4.Measurement.Difference.Percent.Correct",
"X2.MD.B.5.Addition.and.Subtraction.Word.Problems..Lengths.Percent.Correct",
"X2.NBT.A.1.Understand.Place.Value.Percent.Correct",
"X2.NBT.A.1.a.Understand.Place.Value..Bundles.of.Tens.Percent.Correct",
"X2.NBT.A.1.b.Understand.Place.Value..Bundles.of.Hundreds.Percent.Correct",
"X2.NBT.A.3.Read.and.Write.Numbers.to.1.000.Percent.Correct")
names(df) <- gsub(pattern = "^X2\\.((?:[^.]+\\.){2}[^.]+(?:\\.[a-z])?).*", "A1_2.\\1", names)
df
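One caveat worth noting: the optional `(?:\\.[a-z])?` piece only checks for a dot followed by one lowercase letter, so a description that happened to begin with a lowercase word would have its first letter kept as if it were a sub-item. A possible variation, sketched here under the assumption that a sub-item is always a single lowercase letter forming a whole field, adds a lookahead and wraps the rename in a function. The name `shorten_standard_names` and its parameters `old_prefix` and `new_prefix` are hypothetical, and the third sample name is invented purely to show the edge case.

```r
# Hypothetical helper: shorten names of the form
#   <old_prefix>.<domain>.<cluster>.<number>[.<letter>].<description...>
# old_prefix is pasted into the regex as-is, so it is assumed to contain
# no regex metacharacters.
shorten_standard_names <- function(x, old_prefix = "X2", new_prefix = "A1_2") {
  pattern <- paste0(
    "^", old_prefix, "\\.",
    "((?:[^.]+\\.){2}[^.]+",    # three fields, e.g. "NBT.A.1"
    "(?:\\.[a-z](?=\\.|$))?)",  # optional sub-item: one lowercase letter that
                                # is a whole field (followed by "." or the end)
    ".*"                        # the description, which is discarded
  )
  # The lookahead (?= ... ) requires PCRE, hence perl = TRUE.
  sub(pattern, paste0(new_prefix, ".\\1"), x, perl = TRUE)
}

sample_names <- c(
  "Student.ID",
  "X2.NBT.A.1.b.Understand.Place.Value..Bundles.of.Hundreds.Percent.Correct",
  "X2.MD.A.1.add.lengths.Percent.Correct"  # invented lowercase description
)
shortened <- shorten_standard_names(sample_names)
print(shortened)
```

To apply it to a data frame, `names(df) <- shorten_standard_names(names(df))` should be enough, since non-matching names such as `Student.ID` are returned unchanged.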