Setting row names in nested tibbles and running kmeans clustering

Time: 2019-04-25 20:46:51

Tags: r

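I am trying to set a column of each nested tibble as its row names and then fit a kmeans model, but I am not sure how to proceed with this part.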

The data looks like this:

    value_1    year_1   value_2    year_2            id_key
1 0.8572629 2006_2007 0.8352446 2006_2007   2006_2007_21267
2 0.9955628 2017_2018 0.9851993 2017_2018 2017_2018_1111711
3 0.9336878 2012_2013 0.9865080 2012_2013 2012_2013_1140536
4 0.8965862 2017_2018 0.9877127 2017_2018  2017_2018_832988
5 0.9659160 2012_2013 0.9855530 2012_2013  2012_2013_715096
6 0.9788319 2012_2013 0.5560681 2012_2013  2012_2013_875045

I apply the following code (where x is the data given at the end of this post):

library(dplyr)
library(tidyr)

kclust <- x %>%
  as_tibble() %>% 
  group_by(year_1, year_2) %>%
  nest(.key = "value")
kclust

Which gives me this output:

# A tibble: 13 x 3
   year_1    year_2    value            
   <chr>     <chr>     <list>           
 1 2006_2007 2006_2007 <tibble [3 x 3]> 
 2 2017_2018 2017_2018 <tibble [11 x 3]>
 3 2012_2013 2012_2013 <tibble [11 x 3]>
 4 2010_2011 2010_2011 <tibble [11 x 3]>
 5 2014_2015 2014_2015 <tibble [12 x 3]>
 6 2011_2012 2011_2012 <tibble [9 x 3]> 
 7 2013_2014 2013_2014 <tibble [11 x 3]>
 8 2016_2017 2016_2017 <tibble [7 x 3]> 
 9 2009_2010 2009_2010 <tibble [6 x 3]> 
10 2008_2009 2008_2009 <tibble [3 x 3]> 
11 2007_2008 2007_2008 <tibble [7 x 3]> 
12 2015_2016 2015_2016 <tibble [5 x 3]> 
13 2018_2019 2018_2019 <tibble [4 x 3]>

Inspecting kclust$value gives me:

[[12]]
# A tibble: 5 x 3
  value_1 value_2 id_key           
    <dbl>   <dbl> <chr>            
1   0.943   0.887 2015_2016_1024478
2   0.861   0.571 2015_2016_816284 
3   0.759   0.959 2015_2016_1260221
4   0.756   0.921 2015_2016_101829 
5   0.981   0.936 2015_2016_709519 

[[13]]
# A tibble: 4 x 3
  value_1 value_2 id_key           
    <dbl>   <dbl> <chr>            
1   0.927   0.959 2018_2019_6201   
2   0.888   0.950 2018_2019_1274494
3   0.962   0.995 2018_2019_1011657
4   0.982   0.921 2018_2019_78814

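In each of these I want to set id_key as the row names. So every tibble would have id_key as its row names, and the kmeans model would be fit on the columns value_1 and value_2.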
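For a single group, one way to get there might look like the base R sketch below. It is only an illustration: `grp`, `mat` and `fit` are names made up here, and the four rows are copied from the `[[13]]` tibble printed above. Tibbles do not store row names, so the sketch builds a plain numeric matrix and attaches `id_key` to it.

```r
# Rows copied from the [[13]] tibble shown above (hypothetical object name: grp)
grp <- data.frame(
  value_1 = c(0.927, 0.888, 0.962, 0.982),
  value_2 = c(0.959, 0.950, 0.995, 0.921),
  id_key  = c("2018_2019_6201", "2018_2019_1274494",
              "2018_2019_1011657", "2018_2019_78814"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns and label the rows with id_key
mat <- as.matrix(grp[, c("value_1", "value_2")])
rownames(mat) <- grp$id_key

set.seed(1)                      # kmeans picks random starting centers
fit <- kmeans(mat, centers = 2)

# Cluster assignments, named by id_key
fit$cluster
```

`kmeans()` copies the row names of its input onto the `cluster` vector, so each assignment stays labelled with its `id_key`.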

The code I currently have is:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)

k_means_centers = 2

kclust <- x %>%
  as_tibble() %>% 
  group_by(year_1, year_2) %>%
  nest(.key = "value") %>%
  filter(map_int(value, nrow) > 4) %>%
  mutate(kmeans = map(value, ~kmeans(.x[[1]], 
                                     centers = k_means_centers, iter.max = 10, nstart = 1)),
         tidied = map(kmeans, tidy),
         glanced = map(kmeans, glance), 
         augmented = map2(kmeans, value, augment))

But this is wrong, because the id_key column should be set as the row names.
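A possible direction, sketched under stated assumptions rather than offered as a confirmed fix: convert each nested tibble to a plain data frame, move `id_key` into the row names with `tibble::column_to_rownames()`, and pass the result to `kmeans()`. Note that `.x[[1]]` in the code above selects only the first column (`value_1`), so `value_2` never reaches the model. In the sketch, `x_small` is a hypothetical stand-in for `x`, built from the nine rows of groups `[[12]]` and `[[13]]` shown earlier, and `mat` is a helper column introduced here. It also calls `nest()` without `.key`, which names the list-column `data`, because recent tidyr versions deprecate `.key`.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(broom)

# Hypothetical stand-in for x: rows of groups [[12]] and [[13]] printed above
yr <- rep(c("2015_2016", "2018_2019"), times = c(5, 4))
x_small <- tibble(
  value_1 = c(0.943, 0.861, 0.759, 0.756, 0.981, 0.927, 0.888, 0.962, 0.982),
  year_1  = yr,
  value_2 = c(0.887, 0.571, 0.959, 0.921, 0.936, 0.959, 0.950, 0.995, 0.921),
  year_2  = yr,
  id_key  = c("2015_2016_1024478", "2015_2016_816284", "2015_2016_1260221",
              "2015_2016_101829", "2015_2016_709519", "2018_2019_6201",
              "2018_2019_1274494", "2018_2019_1011657", "2018_2019_78814")
)

k_means_centers <- 2
set.seed(42)  # kmeans picks random starting centers

kclust <- x_small %>%
  group_by(year_1, year_2) %>%
  nest() %>%                           # list-column is called "data"
  ungroup() %>%
  filter(map_int(data, nrow) > 4) %>%  # drops the 4-row 2018_2019 group
  mutate(
    # plain data frame with id_key as row names; value_1 and value_2 remain
    mat       = map(data, ~ column_to_rownames(as.data.frame(.x), var = "id_key")),
    kmeans    = map(mat, ~ kmeans(.x, centers = k_means_centers,
                                  iter.max = 10, nstart = 1)),
    tidied    = map(kmeans, tidy),
    glanced   = map(kmeans, glance),
    augmented = map2(kmeans, mat, augment)
  )

# Cluster assignments for the remaining group, named by id_key
kclust$kmeans[[1]]$cluster
```

Keeping `mat` as its own column means `augment()` receives the same two-column data that the model was fit on.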

Any ideas on how to move forward would be great!

Data:

x <- structure(list(value_1 = c(0.857262918412708, 0.995562776151855, 
0.93368775296229, 0.896586197519892, 0.965915992432594, 0.978831872921186, 
0.931391986938977, 0.92860612171699, 0.942462944742556, 0.762633664061804, 
0.929314203239609, 0.857555211754759, 0.942672735583934, 0.975237093000455, 
0.472863198177383, 0.83842400391849, 0.526669740477171, 0.952190151229782, 
0.519623395661802, 0.981457763792911, 0.91428464980769, 0.954400181141033, 
0.840051106034647, 0.867699181854421, 0.89115631348807, 0.729514655086613, 
0.659568217442908, 0.955200325383147, 0.88820423579156, 0.777402590109491, 
0.943172514612716, 0.944933061146504, 0.476284928268558, 0.946901135343463, 
0.780230224813699, 0.909629399821505, 0.865760792222491, 0.773621382484436, 
0.836554542942252, 0.850980529158788, 0.527814114655505, 0.90791799831592, 
0.882265024462087, 0.952685269154299, 0.891211744870977, 0.976456274127145, 
0.90126924436977, 0.969111672067112, 1, 1, 0.968113370208161, 
0.916126244980983, 0.933883953373501, 0.980900126347656, 0.924480004726964, 
0.967149874304775, 1, 0.933612247290514, 0.982568394222027, 0.987764537202365, 
0.898088994752593, 0.943973029406048, 0.926659428845797, 0.982663249602368, 
0.0116889359524374, 0.985030805938289, 0.888240767289578, 0.779528122930639, 
0.99485244406698, 0.82816655776856, 0.861023758791347, 1, 0.664694109407606, 
0.960818825051411, 0.75945031696856, 0.763886276158968, 0.835629553075541, 
0.846110310875411, 0.755847711756697, 0.67196568780797, 0.961888544525641, 
0.969861418360086, 1, 0.974427258663929, 0.831055915618247, 0.93600722049079, 
0.966106024456998, 0.901338903666627, 0.683877040965164, 0.979457749060513, 
0.943143340661096, 1, 0.98087163513475, 0.769732004988478, 0.968733750777874, 
0.937393276158036, 0.982135855939805, 1, 0.00987183002576516, 
0.764641730481701), year_1 = c("2006_2007", "2017_2018", "2012_2013", 
"2017_2018", "2012_2013", "2012_2013", "2012_2013", "2010_2011", 
"2014_2015", "2011_2012", "2013_2014", "2016_2017", "2013_2014", 
"2011_2012", "2011_2012", "2013_2014", "2009_2010", "2009_2010", 
"2013_2014", "2010_2011", "2016_2017", "2016_2017", "2017_2018", 
"2013_2014", "2014_2015", "2014_2015", "2017_2018", "2008_2009", 
"2007_2008", "2012_2013", "2015_2016", "2017_2018", "2007_2008", 
"2012_2013", "2014_2015", "2009_2010", "2006_2007", "2010_2011", 
"2010_2011", "2014_2015", "2012_2013", "2017_2018", "2006_2007", 
"2013_2014", "2009_2010", "2007_2008", "2012_2013", "2010_2011", 
"2014_2015", "2017_2018", "2017_2018", "2011_2012", "2013_2014", 
"2016_2017", "2013_2014", "2014_2015", "2011_2012", "2013_2014", 
"2010_2011", "2012_2013", "2007_2008", "2017_2018", "2018_2019", 
"2013_2014", "2012_2013", "2014_2015", "2018_2019", "2009_2010", 
"2008_2009", "2010_2011", "2015_2016", "2010_2011", "2011_2012", 
"2007_2008", "2015_2016", "2008_2009", "2011_2012", "2013_2014", 
"2015_2016", "2010_2011", "2018_2019", "2016_2017", "2016_2017", 
"2011_2012", "2016_2017", "2010_2011", "2017_2018", "2014_2015", 
"2007_2008", "2014_2015", "2009_2010", "2017_2018", "2015_2016", 
"2010_2011", "2014_2015", "2012_2013", "2018_2019", "2007_2008", 
"2011_2012", "2014_2015"), value_2 = c(0.83524458245376, 0.985199346676161, 
0.98650800423171, 0.987712680219121, 0.985552973109259, 0.55606807703455, 
0.993081550565629, 0.942324451054759, 0.874951001978959, 0.972242235849801, 
0.960561835073607, 0.745948805820105, 0.797055662541724, 0.977508894088148, 
0.712233681864871, 0.285060053385682, 0.905730331400375, 0.93571084346821, 
0.790305033705714, 0.958722926473936, 0.962776635511766, 0.992608325470545, 
0.474283965476535, 0.806366773701265, 0.904730345643149, 0.862254279087857, 
0.984488707157245, 0.892241046229236, 0.714442964628943, 0.807622124741829, 
0.887170731681905, 0.954684589806249, 0.9211778417945, 0.948974567771373, 
0.965125469708914, 0.886108424878785, 0.942065878654209, 0.66663307765255, 
0.90331177434957, 0.976829922293502, 0.95848533971269, 0.956127315051688, 
0.650750852737616, 0.9999724828739, 0.826005013210071, 0.959980346940766, 
0.978304048122191, 0.975422514331076, 0.792199553496305, 0.461104040127036, 
0.997962170857627, 0.968897881428091, 0.820571356084491, 0.99183854174536, 
0.937073215517585, 0.993271661681666, 0.862602069969553, 0.941823773454386, 
0.984268864412331, 0.983876968226894, 0.760177556170661, 0.926285514876429, 
0.959350334184441, 0.996409752091077, 0.0403289662596409, 0.994749999506845, 
0.950154051313514, 0.916520797550305, 0.728271849187279, 0.89835975825379, 
0.571018894293857, 0.971731331958454, 0.810499095029711, 0.887497351434693, 
0.958925181726699, 0.893189038016587, 0.875143741543741, 0.833284214217249, 
0.921240338805686, 0.926586130283117, 0.994798572238072, 0.980971763292719, 
0.964016005572769, 0.989376580856801, 0.935519257737914, 0.922845605574439, 
0.996381524259124, 0.0351359902186695, 0.953643584869029, 0.937802352885434, 
0.902249244386311, 0.719887783612443, 0.936028294902931, 0.809292272844584, 
0.974049350800454, 0.781649033147858, 0.920733566350649, 0.998781417653825, 
0.0617975853732401, 0.883026179946989), year_2 = c("2006_2007", 
"2017_2018", "2012_2013", "2017_2018", "2012_2013", "2012_2013", 
"2012_2013", "2010_2011", "2014_2015", "2011_2012", "2013_2014", 
"2016_2017", "2013_2014", "2011_2012", "2011_2012", "2013_2014", 
"2009_2010", "2009_2010", "2013_2014", "2010_2011", "2016_2017", 
"2016_2017", "2017_2018", "2013_2014", "2014_2015", "2014_2015", 
"2017_2018", "2008_2009", "2007_2008", "2012_2013", "2015_2016", 
"2017_2018", "2007_2008", "2012_2013", "2014_2015", "2009_2010", 
"2006_2007", "2010_2011", "2010_2011", "2014_2015", "2012_2013", 
"2017_2018", "2006_2007", "2013_2014", "2009_2010", "2007_2008", 
"2012_2013", "2010_2011", "2014_2015", "2017_2018", "2017_2018", 
"2011_2012", "2013_2014", "2016_2017", "2013_2014", "2014_2015", 
"2011_2012", "2013_2014", "2010_2011", "2012_2013", "2007_2008", 
"2017_2018", "2018_2019", "2013_2014", "2012_2013", "2014_2015", 
"2018_2019", "2009_2010", "2008_2009", "2010_2011", "2015_2016", 
"2010_2011", "2011_2012", "2007_2008", "2015_2016", "2008_2009", 
"2011_2012", "2013_2014", "2015_2016", "2010_2011", "2018_2019", 
"2016_2017", "2016_2017", "2011_2012", "2016_2017", "2010_2011", 
"2017_2018", "2014_2015", "2007_2008", "2014_2015", "2009_2010", 
"2017_2018", "2015_2016", "2010_2011", "2014_2015", "2012_2013", 
"2018_2019", "2007_2008", "2011_2012", "2014_2015"), id_key = c("2006_2007_21267", 
"2017_2018_1111711", "2012_2013_1140536", "2017_2018_832988", 
"2012_2013_715096", "2012_2013_875045", "2012_2013_891024", "2010_2011_815556", 
"2014_2015_39911", "2011_2012_1123360", "2013_2014_916365", "2016_2017_26172", 
"2013_2014_732485", "2011_2012_1551152", "2011_2012_1095073", 
"2013_2014_709804", "2009_2010_65100", "2009_2010_1018963", "2013_2014_20388", 
"2010_2011_1115222", "2016_2017_1646383", "2016_2017_1567892", 
"2017_2018_1368007", "2013_2014_205520", "2014_2015_851968", 
"2014_2015_46989", "2017_2018_1116521", "2008_2009_23217", "2007_2008_79879", 
"2012_2013_709804", "2015_2016_1024478", "2017_2018_1062379", 
"2007_2008_877890", "2012_2013_51396", "2014_2015_1064728", "2009_2010_1026214", 
"2006_2007_58492", "2010_2011_820027", "2010_2011_2488", "2014_2015_1458891", 
"2012_2013_40545", "2017_2018_1326801", "2006_2007_36270", "2013_2014_1140536", 
"2009_2010_1396009", "2007_2008_42582", "2012_2013_1637459", 
"2010_2011_794323", "2014_2015_4904", "2017_2018_701221", "2017_2018_934612", 
"2011_2012_47111", "2013_2014_352947", "2016_2017_1613103", "2013_2014_34408", 
"2014_2015_890801", "2011_2012_875570", "2013_2014_812074", "2010_2011_1466258", 
"2012_2013_723612", "2007_2008_5513", "2017_2018_30625", "2018_2019_6201", 
"2013_2014_1024305", "2012_2013_1466258", "2014_2015_814453", 
"2018_2019_1274494", "2009_2010_2488", "2008_2009_106640", "2010_2011_33213", 
"2015_2016_816284", "2010_2011_1267238", "2011_2012_1451505", 
"2007_2008_38777", "2015_2016_1260221", "2008_2009_1001082", 
"2011_2012_817473", "2013_2014_1274057", "2015_2016_101829", 
"2010_2011_1020569", "2018_2019_1011657", "2016_2017_789388", 
"2016_2017_1004434", "2011_2012_1156039", "2016_2017_1350031", 
"2010_2011_205520", "2017_2018_42582", "2014_2015_812074", "2007_2008_1039684", 
"2014_2015_751652", "2009_2010_1047699", "2017_2018_101829", 
"2015_2016_709519", "2010_2011_861878", "2014_2015_832428", "2012_2013_74208", 
"2018_2019_78814", "2007_2008_922224", "2011_2012_314808", "2014_2015_3673"
)), row.names = c(NA, -100L), class = "data.frame")

Edit:

I tried rownames_to_column() without any luck.
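One likely reason, shown as a small sketch: `rownames_to_column()` goes in the opposite direction, moving existing row names into a column, while `column_to_rownames()` turns a column into row names. The object names below (`df`, `with_names`, `back`) are made up for illustration, and the two rows are copied from the `[[13]]` tibble above.

```r
library(tibble)

df <- data.frame(
  value_1 = c(0.927, 0.888),
  id_key  = c("2018_2019_6201", "2018_2019_1274494"),
  stringsAsFactors = FALSE
)

# Column -> row names: the direction the question asks for
with_names <- column_to_rownames(df, var = "id_key")
rownames(with_names)

# Row names -> column: what rownames_to_column() does, undoing the step above
back <- rownames_to_column(with_names, var = "id_key")

# Converting back to a tibble drops the row names again
has_rownames(as_tibble(with_names))
```

Because tibbles discard row names, the result has to stay a plain data frame (or a matrix) for the row names to survive until `kmeans()` is called.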

0 answers:

No answers yet