I can't seem to find what I need in other posts. Most of them change every column of one class to another class:
Change the class of many columns in a data frame
Convert column classes in data.table
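I believe my question is different because there is no blanket "change all factors to character" rule here: each column has a specific class that I have to set ahead of time.
I have my column names in a vector called selectColumns, which I pass to fread.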
selectColumns <- c(giantListofColumnsGoesHere)
DT <- fread("DT.csv", select=selectColumns, na.strings=NAsList)
setcolorder(DT, selectColumns)
colClasses <- list('character','character','character','factor','numeric','character','numeric','integer','integer','integer','integer','numeric','numeric','factor','factor','factor','logical','integer','numeric','factor','integer','integer','integer','factor','factor','factor','factor','factor','integer','integer','factor','integer','factor','factor','integer','factor','numeric','factor','numeric','character','factor','factor','factor','factor','factor','factor','factor','factor','factor','factor','integer','factor','numeric','factor','factor','character','factor','factor','factor','integer','numeric','integer','integer','integer','integer','integer','factor','character','factor','factor','factor','factor','integer','factor','factor','character','integer','integer','integer','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical')
#Now the part I can't figure out, I've tried:
lapply(DT, class) <- colClasses
#OR
attr(DT, class) <- colClasses
#Obviously attr(DT, class) just gives "data.table" "data.frame"
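But I need to subset the column attributes of DT somehow to get at the lower-level list, and I'm not very good with lists, so I can't work it out. I apologize if this question is too simple and has already been answered, but I'm lost, and there usually seems to be a simple way to do this kind of thing.
Sorry, I can't provide the data because it contains private information.
Thanks for your help.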
Answer 0 (score: 3)
Assuming the OP forgot to use colClasses within fread, or ran into technical problems with that option, and wants to change the class of the columns in the data.table, using set would be an option:
for(j in seq_along(selectColumns)){
  # colClasses[j] must be the name of a conversion function, e.g. "as.numeric"
  set(DT, i= NULL, j=selectColumns[j], value = get(colClasses[j])(DT[[selectColumns[j]]]))
}
str(DT)
#Classes ‘data.table’ and 'data.frame': 5 obs. of 6 variables:
#$ V1: num 1 2 3 4 5
#$ V2: chr "A" "B" "C" "D" ...
#$ V3: int 1 2 3 4 5
#$ V4: chr "F" "G" "H" "I" ...
#$ V5: chr "G" "H" "I" "J" ...
#$ V6: Factor w/ 5 levels "6","7","8","9",..: 1 2 3 4 5
Note that the initial class of the "selectColumns" columns was:
str(DT)
#Classes ‘data.table’ and 'data.frame': 5 obs. of 6 variables:
#$ V1: int 1 2 3 4 5
#$ V2: chr "A" "B" "C" "D" ...
#$ V3: num 1 2 3 4 5
#$ V4: chr "F" "G" "H" "I" ...
#$ V5: chr "G" "H" "I" "J" ...
#$ V6: int 6 7 8 9 10
NOTE: Added as. to the "colClasses" vector to do the conversion, so each entry names a conversion function. If we are converting 'factor' to 'numeric', we have to do it in two steps, i.e. first convert to 'character' and then to 'numeric' (based on @Frank's suggestion in the comments).
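As a rough illustration of the set loop and the two-step factor conversion, here is a minimal sketch on toy data. The table contents and the four-column selection are hypothetical stand-ins for the private file. The class names are plain ('numeric', 'factor', ...) as in the question, so the sketch builds the as.* converter with paste0 and match.fun instead of storing the "as." prefix in the vector. That is a small variation on the answer's get() call, not the answer's exact code.

```r
library(data.table)

# Toy data standing in for the private file (hypothetical values).
DT <- data.table(
  V1 = 1:5,
  V2 = LETTERS[1:5],
  V3 = c(1, 2, 3, 4, 5),
  V6 = factor(c("6", "7", "8", "9", "10"))
)

selectColumns <- c("V1", "V2", "V3", "V6")
# Plain class names, one per column; unlist() turns the question's list
# into a character vector so that colClasses[j] is a single string.
colClasses <- unlist(list("numeric", "factor", "integer", "numeric"))

for (j in seq_along(selectColumns)) {
  col <- selectColumns[j]
  # Look up the converter by name, e.g. "numeric" -> as.numeric
  convert <- match.fun(paste0("as.", colClasses[j]))
  old <- DT[[col]]
  # Go through character first so a factor's labels, not its
  # internal level codes, are what gets converted.
  if (is.factor(old) && colClasses[j] != "factor") old <- as.character(old)
  # i = NULL replaces the whole column by reference, so its type may change
  set(DT, i = NULL, j = col, value = convert(old))
}

classes <- vapply(DT, function(x) class(x)[1], character(1))
print(classes)
```

Without the as.character step, as.numeric on the factor V6 would return the level codes rather than the values 6 to 10, which is the pitfall the two-step note above refers to.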