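Is there a way to identify invalid column names in R? Perhaps with a regular expression or some other technique.
I am generating a DocumentTermMatrix (DTM) from a text column and then converting this DTM to a data frame. I end up with columns whose names are invalid, e.g.: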
“node”“CLASS”“️️️️”“️️️”“de”“des”“je devais”“夜”“eyeshereyes”“cpas chaud”“郁郁葱葱的cosmétiques”“ 我看到了“
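When I pass this dataset to mlr::makeClassifTask, I get the following error message:
Error in makeClassifTask(data = dat, target = "CLASS") : Assertion on 'data' failed: Columns must be named according to R's variable naming rules.
I would therefore like to identify and remove all columns that have invalid names.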
Something like this (pseudocode):
invalidColumnNames <- # identify indexes of columns with invalid names
dat <- dat[, -invalidColumnNames]
Data for a REPRODUCIBLE example:
cols <- c("node", "CLASS", "️️️️", "️️️", " de", " des",
" kmh", " points", " zéro", "\u2615️\u2615️", "\u2615️",
"\u2693️\u2693️", "\u26f5️\u2693️", "\u2728\u2728\u2728\u2728\u2728",
"aaliassime", "aaron", "abaixoassinado", "abandono", "abat",
"abattu", "abiertamente", "abierto", "abit", "able", "abomination",
"abonnements", "abonnés", "abonnez", "abraham", "absolutely",
"abstract", "abused", "acaba", "acabar", "acabo", "acadiebathurst",
"acaï", "acc", "accept", "accèsloisirs", "access", "accessible",
"accessories", "accident", "accidentally", "acción", "acciones",
"accommodationsreligious", "accompli", "accomplie", "accomplir",
"accorde", "accordent", "account", "accounts", "accro", "accueil",
"accueille", "accueillir", "accurate", "accusé", "accusent",
"acérées", "acériculteur", "acha", "achat", "achei", "acheté",
"acheter", "acho", "acidités", "acknowledge", "acontecem", "acordei",
"acquis", "across", "action", "activité", "activités", "actresses",
"actualité", "actuel", "adam", "adaptation", "adapter", "added",
"addicive", "addicted", "addition", "additives", "addressed",
"adds", "adeus", "adjoint", "adjointeadministrative", "adjust",
"administratives", "adopción", "adopté", "adorable")
Desired result:
"node", "CLASS", " de", " des",
" kmh", " points", " zéro", "aaliassime", "aaron",
"abaixoassinado", "abandono", "abat",
"abattu", "abiertamente", "abierto", "abit", "able", "abomination",
"abonnements", "abonnés", "abonnez", "abraham", "absolutely",
"abstract", "abused", "acaba", "acabar", "acabo", "acadiebathurst",
"acaï", "acc", "accept", "accèsloisirs", "access", "accessible",
"accessories", "accident", "accidentally", "acción", "acciones",
"accommodationsreligious", "accompli", "accomplie", "accomplir",
"accorde", "accordent", "account", "accounts", "accro", "accueil",
"accueille", "accueillir", "accurate", "accusé", "accusent",
"acérées", "acériculteur", "acha", "achat", "achei", "acheté",
"acheter", "acho", "acidités", "acknowledge", "acontecem", "acordei",
"acquis", "across", "action", "activité", "activités", "actresses",
"actualité", "actuel", "adam", "adaptation", "adapter", "added",
"addicive", "addicted", "addition", "additives", "addressed",
"adds", "adeus", "adjoint", "adjointeadministrative", "adjust",
"administratives", "adopción", "adopté", "adorable"
Any help is much appreciated.
Answer 0: (score: 1)
See ?make.names for this kind of thing. I also strip the whitespace at the start and end of each name, so:
cols <- trimws(cols)
cols[make.names(cols)==cols]
# [1] "node" "CLASS" "de" "des"
# [5] "kmh" "points" "zéro" "aaliassime" ...
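The same comparison can be applied directly to a data frame. The following is a minimal sketch, not the answerer's code: `is_valid_name` is a hypothetical helper name, and the small `dat` below is a made-up stand-in for the data frame from the question. It uses a logical index instead of the negative index from the question, because `dat[, -integer(0)]` would drop every column when no name is invalid.

```r
# Hypothetical helper: TRUE where a name is already a syntactically valid R name,
# i.e. make.names() would leave it unchanged.
is_valid_name <- function(x) {
  !is.na(x) & make.names(x) == x
}

# Made-up stand-in for the data frame from the question.
dat <- data.frame(node = 1:2, CLASS = c("a", "b"), check.names = FALSE)
dat[[" de"]] <- c(0, 1)            # leading space, valid once trimmed
dat[["\u2615\u2615"]] <- c(1, 0)   # emoji only, never a valid name
dat[["z\u00e9ro"]] <- c(0, 0)      # accented letter, validity depends on the locale

# Trim first, as in the answer, then keep only the columns whose names are valid.
names(dat) <- trimws(names(dat))
keep <- is_valid_name(names(dat))
dat_clean <- dat[, keep, drop = FALSE]
names(dat_clean)
```

Two caveats apply. Whether accented names such as "zéro" count as valid depends on the current locale, since make.names decides what is a letter from the locale. Trimming can also produce duplicate names (for example " de" and "de"), which may need separate handling before the data is passed on.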
Answer 1: (score: 0)
Maybe you could try this package:
library(janitor)
# clean_names() rewrites the column names into syntactically valid ones
newdataobject <- read.csv("yourcsvfilewithpath.csv", header = TRUE) %>% clean_names()
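Note that clean_names() renames columns rather than removing them, so the emoji-only columns would survive under new names, which differs from the desired result in the question. The difference between renaming and dropping can be sketched in base R, without assuming janitor is installed. The short `cols` vector below is an illustrative subset, and make.names(..., unique = TRUE) stands in for the renaming step; it is not what janitor does internally.

```r
# Illustrative subset of the column names from the question.
cols <- c("node", "CLASS", " de", "\u2615\u2615", "\u2728")
trimmed <- trimws(cols)

# Option A: rename. Every column survives; invalid names are repaired
# and made unique.
renamed <- make.names(trimmed, unique = TRUE)

# Option B: drop. Only names that were already valid after trimming survive.
kept <- trimmed[make.names(trimmed) == trimmed]

length(renamed) == length(cols)  # renaming keeps the column count
kept                             # dropping keeps only the valid names
```

Renaming suits cases where the column contents still matter. Dropping matches the question, where the emoji-only terms are treated as noise.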