I have a small problem with a dataset I am working with. Suppose I define a dataset called mergedData2, a subset of mergedData, with the following commands:
mergedData=rbind(test_set,training_set)
lookformean<-grep("mean()",names(mergedData),fixed=TRUE)
lookforstd<-grep("std()",names(mergedData),fixed=TRUE)
varsofinterests<-sort(c(lookformean,lookforstd))
mergedData2<-mergedData[,c(1:2,varsofinterests)]
If I run names(mergedData2), I get:
[1] "volunteer_identifier" "type_of_experiment"
[3] "body_acceleration_mean()-X" "body_acceleration_mean()-Y"
[5] "body_acceleration_mean()-Z" "body_acceleration_std()-X"
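(I show these 6 names as an MWE, but the full vector has 68 names.)
Now, suppose I want to compute the mean of every measurement for each combination of volunteer_identifier and type_of_experiment. For this, I used a combination of split and lapply: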
mylist<-split(mergedData2,list(mergedData2$volunteer_identifier,mergedData2$type_of_experiment))
average_activities<-lapply(mylist,function(x) colMeans(x))
average_dataset<-t(as.data.frame(average_activities))
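Since average_activities is a list, I convert it to a data frame and transpose that data frame to keep the same layout as mergedData and mergedData2. The problem is the following: when I call names(average_dataset), it returns NULL! Stranger still, when I run head(average_dataset), it returns: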
volunteer_identifier type_of_experiment body_acceleration_mean()-X body_acceleration_mean()-Y
1 1 0.2773308 -0.01738382
2 1 0.2764266 -0.01859492
3 1 0.2755675 -0.01717678
4 1 0.2785820 -0.01483995
5 1 0.2778423 -0.01728503
6 1 0.2836589 -0.01689542
This is only a small part of the output, but it shows that the variable names are there. So why does names(average_dataset) return NULL?
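A minimal sketch of what may be going on, assuming the cause is the t() call: transposing a data frame yields a matrix, and a matrix stores its column names in dimnames rather than in a names attribute. The toy data frame below is hypothetical and only stands in for mergedData2; the values are made up.

```r
# Toy stand-in for mergedData2 (made-up values, illustration only)
toy <- data.frame(
  volunteer_identifier = c(1, 1, 2, 2),
  type_of_experiment   = c(1, 1, 1, 1),
  "body_acceleration_mean()-X" = c(0.27, 0.28, 0.25, 0.29),
  check.names = FALSE
)

# Same steps as in the question
groups   <- split(toy, list(toy$volunteer_identifier, toy$type_of_experiment))
averages <- lapply(groups, colMeans)
result   <- t(as.data.frame(averages))

is.matrix(result)  # t() on a data frame returns a matrix
names(result)      # NULL: a matrix has no "names" attribute
colnames(result)   # the column names are stored in dimnames

# Converting back to a data frame makes names() usable again
result_df <- as.data.frame(result)
names(result_df)
```

Under that assumption, head() still prints the headers because it reads them from dimnames, while names() does not.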
Thanks in advance for your replies. Best,
EDIT: here is an MWE of mergedData2:
volunteer_identifier type_of_experiment body_acceleration_mean()-X body_acceleration_mean()-Y
1 2 5 0.2571778 -0.02328523
2 2 5 0.2860267 -0.01316336
3 2 5 0.2754848 -0.02605042
4 2 5 0.2702982 -0.03261387
5 2 5 0.2748330 -0.02784779
6 2 5 0.2792199 -0.01862040
body_acceleration_mean()-Z body_acceleration_std()-X body_acceleration_std()-Y body_acceleration_std()-Z
1 -0.01465376 -0.9384040 -0.9200908 -0.6676833
2 -0.11908252 -0.9754147 -0.9674579 -0.9449582
3 -0.11815167 -0.9938190 -0.9699255 -0.9627480
4 -0.11752018 -0.9947428 -0.9732676 -0.9670907
5 -0.12952716 -0.9938525 -0.9674455 -0.9782950
6 -0.11390197 -0.9944552 -0.9704169 -0.9653163
gravity_acceleration_mean()-X gravity_acceleration_mean()-Y gravity_acceleration_mean()-Z
1 0.9364893 -0.2827192 0.1152882
2 0.9274036 -0.2892151 0.1525683
3 0.9299150 -0.2875128 0.1460856
4 0.9288814 -0.2933958 0.1429259
5 0.9265997 -0.3029609 0.1383067
6 0.9256632 -0.3089397 0.1305608
gravity_acceleration_std()-X gravity_acceleration_std()-Y gravity_acceleration_std()-Z
1 -0.9254273 -0.9370141 -0.5642884
2 -0.9890571 -0.9838872 -0.9647811
3 -0.9959365 -0.9882505 -0.9815796
4 -0.9931392 -0.9704192 -0.9915917
5 -0.9955746 -0.9709604 -0.9680853
6 -0.9988423 -0.9907387 -0.9712319
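My goal is to obtain this average_dataset: a dataset containing, for each volunteer and each type of experiment, the mean of every physical quantity (column 3 onward), e.g. 1 1 mean1 mean2 mean3 ... mean68, 2 1 mean1 mean2 mean3 ... mean68, and so on.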
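For reference, one possible way to get a result of that shape directly as a data frame is base R's aggregate(). This is only a sketch on a hypothetical toy data frame with made-up values, not the approach used above; it assumes the first two columns are the grouping variables.

```r
# Toy stand-in for mergedData2 (made-up values, illustration only)
toy <- data.frame(
  volunteer_identifier = c(1, 1, 2, 2),
  type_of_experiment   = c(1, 1, 1, 1),
  "body_acceleration_mean()-X" = c(0.26, 0.28, 0.25, 0.29),
  check.names = FALSE
)

# Columns 1:2 define the groups; every remaining column is averaged per group
average_df <- aggregate(toy[-(1:2)], by = toy[1:2], FUN = mean)

# The result is a data frame, so names() returns its column names
names(average_df)
```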
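After that, I will export it as a txt file (I am thinking of write.table with row.names = F, col.names = T). Note that, at the moment, if I do this and then import the resulting file with read.table, I do not get the column names of the dataset back, even when specifying col.names = T.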
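A small round-trip sketch, assuming the names are lost on the read.table side: in read.table, header is the logical switch that marks the first line as column names, whereas col.names expects a character vector of names. The toy data frame and the temporary file are hypothetical.

```r
# Toy stand-in for the averaged dataset (made-up values, illustration only)
toy <- data.frame(
  volunteer_identifier = c(1, 2),
  type_of_experiment   = c(1, 1),
  "body_acceleration_mean()-X" = c(0.275, 0.270),
  check.names = FALSE
)

# Write to a temporary file with a header line and no row names
out_file <- tempfile(fileext = ".txt")
write.table(toy, out_file, row.names = FALSE, col.names = TRUE)

# header = TRUE tells read.table that the first line holds the column names;
# check.names = FALSE keeps names such as "body_acceleration_mean()-X" unchanged
restored <- read.table(out_file, header = TRUE, check.names = FALSE)
names(restored)
```

Without check.names = FALSE, read.table rewrites names containing parentheses or hyphens into syntactically valid ones, so the restored names would differ from the originals.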