Question

I have been working with datasets from the UCI Machine Learning Repository. Some datasets (such as this one) include a file with the extension .c45-names that appears to be machine-readable.

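Is there a way to use this file to name the columns of a data frame automatically or, even better, to make use of other metadata such as the data types or the possible values of discrete variables?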

At the moment I copy and paste the column names into a line of code like this:

names(cars) = c('buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'rating')

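Something more automated would be nice. Googling has not turned up much so far, because there is a similarly named classification algorithm that has been implemented in R.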

Answer 1

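# Read the names file as a character vector, one element per line
car.c45_names <- readLines("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.c45-names")
# Keep only the lines containing ":" (the attribute definitions)
tmp <- car.c45_names[grep(":", car.c45_names)]
# Drop everything from ":" onwards; thanks to alistaire for pointing this out
colname_car.c45 <- sub(':.*', '', tmp)
# Alternative: colname_car.c45 <- sapply(tmp, function(x) substring(x, 1, gregexpr(":", x)[[1]] - 1))
# Assumption: the names file lists only the attributes, not the final class
# column, so a name for it ('rating', as in the question) is appended here
cars <- setNames(cars, c(colname_car.c45, 'rating')) # same as names(cars) <- ...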
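
The question also asks about the possible values of discrete variables. A minimal sketch of how those could be pulled from the same file follows. It assumes every attribute line has the form "name: v1, v2, v3." and that "|" starts a comment. The helper parse_c45_names, the sample lines and the toy data frame cars_demo are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: parse attribute lines of a C4.5 .names file into a
# named list, one character vector of allowed values per attribute.
# Assumes lines look like "name: v1, v2, v3." and "|" starts a comment.
parse_c45_names <- function(lines) {
  lines <- trimws(sub("\\|.*$", "", lines))         # drop "|" comments
  lines <- lines[grepl(":", lines, fixed = TRUE)]   # keep "name: values" lines
  attr_names <- trimws(sub(":.*$", "", lines))      # text before the colon
  values <- trimws(sub("^[^:]*:", "", lines))       # text after the colon
  values <- sub("\\.$", "", values)                 # strip the trailing period
  lvls <- lapply(strsplit(values, ","), trimws)     # split into single values
  setNames(lvls, attr_names)
}

# Sample input modelled on the car dataset's names file (an assumption);
# in practice pass the result of readLines() on the real file.
sample_lines <- c(
  "| names file (C4.5 format)",
  "unacc, acc, good, vgood",
  "",
  "buying:   vhigh, high, med, low.",
  "doors:    2, 3, 4, 5more.",
  "safety:   low, med, high."
)

meta <- parse_c45_names(sample_lines)

# Toy data frame standing in for the real data (made-up rows).
cars_demo <- data.frame(
  V1 = c("vhigh", "low"),
  V2 = c("2", "5more"),
  V3 = c("low", "high"),
  stringsAsFactors = FALSE
)
names(cars_demo) <- names(meta)

# Turn each column into a factor using the levels from the names file.
cars_demo[] <- Map(function(col, lv) factor(col, levels = lv), cars_demo, meta)
str(cars_demo)
```

Note that an attribute declared as "continuous" in a names file would come out as a single level called "continuous" under this sketch, so such columns would need separate handling (for example as.numeric) before the factor conversion.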

Automatically importing column names into R from a names file
