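I have a dataset with 2200 rows. I need to delete a large number of columns at once (for example, about 400). This happens very often, and the set of columns to delete changes every time. The names of the columns to delete are stored in a text file.
This is how I approached the problem: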
#Reading data
myData = read.csv("myDataFile.csv")
#Getting the column names which should be deleted
colToDelete = read.table("columnsToBeRemoved.txt")
#processing the names list
tempList = as.character(unlist(colToDelete))
cat(paste(shQuote(tempList, type="cmd"), collapse=","))
newDataSet = subset(myData, select = - ??)
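The reading step itself can hide a mismatch. Below is a minimal sketch that assumes the text file holds one column name per line. The file contents and the temporary files are made up so the example runs on its own, and `readLines()` is swapped in for `read.table()`. Two details matter here. First, `read.csv()` passes the header through `make.names()` by default, so a column such as `04_ic_1306` becomes `X04_ic_1306` unless `check.names = FALSE` is set. Second, stray whitespace around the names in the text file also prevents a match, so `trimws()` is applied.

```r
# Made-up input files in a temporary directory, only so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id,04_ic_1306,06_iEC042_1314,39_iJO1366",
             "1,0.1,0.2,0.3",
             "2,0.4,0.5,0.6"), csv_path)
writeLines(c(" 04_ic_1306", "39_iJO1366 "), txt_path)  # note the stray spaces

# check.names = FALSE keeps "04_ic_1306" instead of renaming it to "X04_ic_1306"
myData <- read.csv(csv_path, check.names = FALSE)

# Assumed layout: one name per line; trim whitespace and drop empty lines
tempList <- trimws(readLines(txt_path))
tempList <- tempList[nzchar(tempList)]

print(names(myData))
print(tempList)
```

With the default `check.names = TRUE`, none of the names from the text file would match the data frame's columns, and any `%in%`-based selection would silently keep every column.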
I use cat(paste(shQuote(tempList, type="cmd"), collapse=","))
to get the list of names as a comma-separated string. The output is:
"04_ic_1306","06_iEC042_1314","13_iEcDH1_1363","18_iEcHS_1320","26_iEcolC_1368","31_iEcSMS35_1347","33_iECs_1301","34_iECUMN_1333","36_iEKO11_1354","39_iJO1366","47_iZ_1308","54_iSFxv_1172"
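I have tried both the subset and the data.table approaches, but had no luck with either. I get the following error, and I have not managed to pass this string to the select argument.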
Error in -a : invalid argument to unary operator
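The error comes from the unary minus. `subset(..., select = -c(col1, col2))` works with unquoted names because `subset()` evaluates `select` in an environment where each column name stands for its column position. A character vector, or a single quoted, comma-separated string, cannot be negated. The sketch below shows two alternatives, using a small made-up data frame in place of `myData`. One passes a logical vector to `select`. The other converts the names to positions with `match()` before negating.

```r
# Made-up stand-ins for myData and the names read from the text file
myData <- data.frame(id = 1:3, a = 4:6, b = 7:9, c = 10:12)
tempList <- c("a", "c")

# Negating a character vector reproduces the error
res <- try(-tempList, silent = TRUE)
cat(conditionMessage(attr(res, "condition")), "\n")

# Option 1: logical vector, TRUE for the columns to keep
kept1 <- subset(myData, select = !(names(myData) %in% tempList))

# Option 2: names -> column positions, then negate the positions
# (assumes every name exists; match() returns NA for unknown names)
kept2 <- subset(myData, select = -match(tempList, names(myData)))

print(names(kept1))
print(names(kept2))
```

Option 1 silently ignores names that are not present, while Option 2 fails on them (an `NA` position), so the choice depends on whether a typo in the text file should be an error.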
Answer 0 (score: 1)
b <- "04_ic_1306"
a[, b] <- NULL  # assigning NULL removes the column named in b from data frame a
To do this for many columns, you would probably have to write a loop and store the column names in a vector, for example:
[1] "04_ic_1306" "06_iEC042_1314" "13_iEcDH1_1363" "18_iEcHS_1320"
[5] "26_iEcolC_1368" "31_iEcSMS35_1347" "33_iECs_1301" "34_iECUMN_1333"
[9] "36_iEKO11_1354" "39_iJO1366" "47_iZ_1308" "54_iSFxv_1172"
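A loop over the stored names works, but it is not strictly needed, because the whole vector of names can be used in a single subsetting call. The sketch below puts both side by side. The data frame `a` and the name vector `b` are made-up stand-ins for the real data.

```r
# Made-up stand-ins: data frame `a` and the vector `b` of column names to remove
a <- data.frame(x = 1:2, y = 3:4, z = 5:6, w = 7:8)
b <- c("y", "w")

# Loop version: delete one column per iteration by assigning NULL
a_loop <- a
for (col in b) {
  a_loop[[col]] <- NULL
}

# Vectorised version: keep every column whose name is not in b
a_vec <- a[setdiff(names(a), b)]

print(names(a_loop))
print(names(a_vec))
```

With about 400 names the vectorised form builds the result in one step, whereas the loop copies the data frame on each iteration.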
Answer 1 (score: 1)
This might be a solution for you:
# Create data frame with 5 columns
df <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10), d=rnorm(10), e=rnorm(10))
# Select two columns to be removed
remove_col <- c("b", "d")
# Identify them in the column names
remove_col <- names(df) %in% remove_col
# Remove them using an inverse (the !) logical vector
df[,!remove_col]
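The logical-vector approach can be wrapped for repeated use. The sketch below uses `drop_cols()`, a hypothetical helper that is not part of base R. It adds two safeguards. It warns about names that are not in the data frame, since `%in%` ignores them silently. It also passes `drop = FALSE`, because `df[, keep]` collapses to a plain vector when exactly one column is left.

```r
# Hypothetical helper: remove the named columns from a data frame
drop_cols <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    warning("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  keep <- !(names(df) %in% cols)
  # drop = FALSE keeps a data frame even if only one column is left
  df[, keep, drop = FALSE]
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Without drop = FALSE, the single remaining column becomes a vector
print(class(df[, !(names(df) %in% c("a", "b"))]))  # prints "integer"

res <- drop_cols(df, c("a", "b", "x"))  # warns that "x" was not found
print(names(res))
```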