split() by 2 variables instead of 1 in R?

Asked: 2017-08-03 21:47:52

Tags: r dataframe split data-manipulation bigdata

I would like to know whether the split function can group by 2 variables instead of just 1.

Here is the current code.

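# Split the data into one data frame per val_lvl2 value
holders <- split(z_combined_cost_dtrmnt, z_combined_cost_dtrmnt$val_lvl2)
# Keep rows whose episode_count is above 3 or missing
holders <- lapply(holders, function(x) x[!(x$episode_count <= 3) | is.na(x$episode_count), ])
# Apply remove_outliers() (defined elsewhere) to prd_num_of_days_num within each group
holders <- lapply(holders, function(x) {
  x$prd_num_of_days_num <- remove_outliers(x$prd_num_of_days_num)
  x
})

# Recombine the groups and drop rows where prd_num_of_days_num is NA
z_combined_cost_dtrmnt <- do.call(rbind, holders)
z_combined_cost_dtrmnt <- subset(z_combined_cost_dtrmnt, !is.na(z_combined_cost_dtrmnt$prd_num_of_days_num))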

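This runs fine, but I have just learned that I actually need to split by both val_lvl2 and val_lvl3 to get the unique values of my data before I can manipulate it further. So what I am trying to do is basically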

holders <- split(z_combined_cost_dtrmnt, z_combined_cost_dtrmnt$val_lvl2 & z_combined_cost_dtrmnt$val_lvl3 )

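My R session is not running at the moment, so I cannot test this, but I am wondering whether this is possible, perhaps in some other way?

Current output: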

 Upper GI Endoscopy with Biopsy                                            :'data.frame':     292 obs. of  22 variables:
  ..$ mcp_cat_name                 : chr [1:292] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ pln_name                     : chr [1:292] "AR" "AR" "AR" "AR" ...
  ..$ hosp_refl_rgn_name           : chr [1:292] "Fort Smith, AR" "Fort Smith, AR" "Jonesboro, AR" "Jonesboro, AR" ...
  ..$ val_lvl1                     : chr [1:292] "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" ...
  ..$ val_lvl2                     : chr [1:292] "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" ...
  ..$ val_lvl3                     : chr [1:292] "Outpatient Hospital" "Surgical Center" "Outpatient Hospital" "Surgical Center" ...

Expected output:

 Upper GI Endoscopy with Biopsy                                            :'data.frame':     146 obs. of  22 variables:
  ..$ mcp_cat_name                 : chr [1:146] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ pln_name                     : chr [1:146] "AR" "AR" "AR" "AR" ...
  ..$ hosp_refl_rgn_name           : chr [1:146] "Fort Smith, AR" "Fort Smith, AR" "Jonesboro, AR" "Jonesboro, AR" ...
  ..$ val_lvl1                     : chr [1:146] "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" ...
  ..$ val_lvl2                     : chr [1:146] "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" ...
  ..$ val_lvl3                     : chr [1:146] "Outpatient Hospital" "Outpatient Hospital" "Outpatient Hospital" ...


Upper GI Endoscopy with Biopsy                                            :'data.frame':     146 obs. of  22 variables:
  ..$ mcp_cat_name                 : chr [1:146] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ pln_name                     : chr [1:146] "AR" "AR" "AR" "AR" ...
  ..$ hosp_refl_rgn_name           : chr [1:146] "Fort Smith, AR" "Fort Smith, AR" "Jonesboro, AR" "Jonesboro, AR" ...
  ..$ val_lvl1                     : chr [1:146] "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" ...
  ..$ val_lvl2                     : chr [1:146] "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" ...
  ..$ val_lvl3                     : chr [1:146] "Surgical Center" "Surgical Center" "Surgical Center" "Surgical Center" ...

Sample data, created with the following call:

dput(head (z_combined_cost_dtrmnt, 50))
structure(list(mcp_cat_name = c("Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions"
), pln_name = c("AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR",
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR",
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR",
"CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",
"CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA"), hosp_refl_rgn_name = c("Fort Smith, AR",
"Fort Smith, AR", "Fort Smith, AR", "Fort Smith, AR", "Fort Smith, AR",
"Fort Smith, AR", "Jonesboro, AR", "Jonesboro, AR", "Jonesboro, AR",
"Jonesboro, AR", "Jonesboro, AR", "Jonesboro, AR", "Little Rock, AR",
"Little Rock, AR", "Little Rock, AR", "Little Rock, AR", "Little Rock, AR",
"Little Rock, AR", "Springdale, AR", "Springdale, AR", "Springdale, AR",
"Springdale, AR", "Springdale, AR", "Springdale, AR", "Texarkana, AR",
"Texarkana, AR", "Texarkana, AR", "Texarkana, AR", "Texarkana, AR",
"Texarkana, AR", "Alameda County, CA", "Alameda County, CA",
"Alameda County, CA", "Alameda County, CA", "Bakersfield, CA",
"Bakersfield, CA", "Bakersfield, CA", "Bakersfield, CA", "Chico, CA",
"Chico, CA", "Chico, CA", "Contra Costa County, CA", "Contra Costa County, CA",
"Contra Costa County, CA", "Contra Costa County, CA", "Fresno, CA",
"Fresno, CA", "Fresno, CA", "Fresno, CA", "Los Angeles, CA"),
    val_lvl1 = c("Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Lumbar (Low Back) Pain",
    "Lumbar (Low Back) Pain", "Neuritis", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Neuritis",
    "Cervical (Neck) Pain", "Lumbar (Low Back) Pain", "Neuritis",
    "Cervical (Neck) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Lumbar (Low Back) Pain",
    "Lumbar (Low Back) Pain", "Neuritis", "Cervical (Neck) Pain"
    ), val_lvl2 = c("Cervical Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Lumbar Laminectomy", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Cervical Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Lumbar Laminectomy", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Cervical Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Lumbar Laminectomy", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Cervical Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Lumbar Laminectomy", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Cervical Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Lumbar Laminectomy", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment"), val_lvl3 = c("Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain"
    ), val_lvl4 = c("", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", ""), ntwk_avg_low_range_billed_amt = c(80359,
    156, 107300, 51324, 156, 156, 80273, 139, 107333, 51287,
    139, 139, 80351, 151, 107334, 51343, 151, 151, 80270, 148,
    107192, 51146, 148, 148, 80388, 165, 107375, 51381, 165,
    165, 215, 140194, 215, 215, 171, 140051, 171, 171, 158, 158,
    158, 205, 140267, 205, 205, 171, 140318, 171, 171, 205),
    ntwk_avg_low_range_alwd_amt = c(36707, 116, 53412, 19115,
    116, 116, 36700, 126, 53476, 19120, 126, 126, 36681, 121,
    53412, 19060, 121, 121, 36677, 125, 53375, 19018, 125, 125,
    36741, 135, 53475, 19143, 135, 135, 164, 58285, 164, 164,
    111, 58046, 111, 111, 111, 111, 111, 147, 58277, 147, 147,
    117, 58131, 117, 117, 130), ntwk_avg_avg_billed_amt = c(99032,
    554, 139522, 51324, 554, 554, 98926, 495, 139566, 51287,
    495, 495, 99021, 538, 139568, 51343, 538, 538, 98922, 526,
    139383, 51146, 526, 526, 99067, 585, 139621, 51381, 585,
    585, 693, 140194, 693, 693, 551, 140051, 551, 551, 512, 512,
    512, 662, 140267, 662, 662, 553, 140318, 553, 553, 661),
    ntwk_avg_avg_alwd_amt = c(41040, 313, 57902, 19115, 313,
    313, 41033, 340, 57972, 19120, 340, 340, 41011, 326, 57902,
    19060, 326, 326, 41007, 338, 57862, 19018, 338, 338, 41079,
    365, 57970, 19143, 365, 365, 451, 58285, 451, 451, 306, 58046,
    306, 306, 305, 305, 305, 403, 58277, 403, 403, 320, 58131,
    320, 320, 356), ntwk_avg_hi_range_billed_amt = c(104618,
    559, 171745, 51324, 559, 559, 104506, 500, 171800, 51287,
    500, 500, 104607, 543, 171801, 51343, 543, 543, 104502, 532,
    171574, 51146, 532, 532, 104655, 591, 171867, 51381, 591,
    591, 799, 140194, 799, 799, 635, 140051, 635, 635, 590, 590,
    590, 764, 140267, 764, 764, 638, 140318, 638, 638, 762),
    ntwk_avg_hi_range_alwd_amt = c(46388, 318, 62393, 19115,
    318, 318, 46380, 345, 62467, 19120, 345, 345, 46355, 331,
    62393, 19060, 331, 331, 46351, 343, 62349, 19018, 343, 343,
    46432, 371, 62466, 19143, 371, 371, 537, 58285, 537, 537,
    365, 58046, 365, 365, 364, 364, 364, 481, 58277, 481, 481,
    382, 58131, 382, 382, 424), episode_count = c(5L, 284L, 2L,
    1L, 284L, 284L, 5L, 284L, 2L, 1L, 284L, 284L, 5L, 284L, 2L,
    1L, 284L, 284L, 5L, 284L, 2L, 1L, 284L, 284L, 5L, 284L, 2L,
    1L, 284L, 284L, 148L, 1L, 148L, 148L, 148L, 1L, 148L, 148L,
    148L, 148L, 148L, 148L, 1L, 148L, 148L, 148L, 1L, 148L, 148L,
    148L), sample_size = c(12.7788970978329, 326.969758402962,
    3.25471779465034, NA, 326.969758402962, 326.969758402962,
    12.7788970978329, 326.969758402962, 3.25471779465034, NA,
    326.969758402962, 326.969758402962, 12.7788970978329, 326.969758402962,
    3.25471779465034, NA, 326.969758402962, 326.969758402962,
    12.7788970978329, 326.969758402962, 3.25471779465034, NA,
    326.969758402962, 326.969758402962, 12.7788970978329, 326.969758402962,
    3.25471779465034, NA, 326.969758402962, 326.969758402962,
    282.202307833077, NA, 282.202307833077, 282.202307833077,
    282.202307833077, NA, 282.202307833077, 282.202307833077,
    282.202307833077, 282.202307833077, 282.202307833077, 282.202307833077,
    NA, 282.202307833077, 282.202307833077, 282.202307833077,
    NA, 282.202307833077, 282.202307833077, 282.202307833077),
    in_map = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA), in_map.x = c(NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA), in_trmnt = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), in_map.y = c(NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA), in_complete = c(NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
    in_miss = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA), prd_num_of_days_num = c(167,
    46, 117, 209, 46, 46, 167, 46, 117, 209, 46, 46, 167, 46,
    117, 209, 46, 46, 167, 46, 117, 209, 46, 46, 167, 46, 117,
    209, 46, 46, 38, 339, 38, 38, 38, 339, 38, 38, 38, 38, 38,
    38, 339, 38, 38, 38, 339, 38, 38, 38)), .Names = c("mcp_cat_name",
"pln_name", "hosp_refl_rgn_name", "val_lvl1", "val_lvl2", "val_lvl3",
"val_lvl4", "ntwk_avg_low_range_billed_amt", "ntwk_avg_low_range_alwd_amt",
"ntwk_avg_avg_billed_amt", "ntwk_avg_avg_alwd_amt", "ntwk_avg_hi_range_billed_amt",
"ntwk_avg_hi_range_alwd_amt", "episode_count", "sample_size",
"in_map", "in_map.x", "in_trmnt", "in_map.y", "in_complete",
"in_miss", "prd_num_of_days_num"), row.names = c(NA, 50L), class = "data.frame")

1 answer:

Answer 0 (score: 2)

It is hard to answer without sample data, but you could try

split(z_combined_cost_dtrmnt, 
  interaction(
    z_combined_cost_dtrmnt$val_lvl2, 
    z_combined_cost_dtrmnt$val_lvl3
  )
)

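interaction creates a new factor that is the combination of the val_lvl2 and val_lvl3 factors, so it should split the data by each unique combination of the two. I would expect this to be equivalent to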

split(z_combined_cost_dtrmnt, 
  f = list(
    z_combined_cost_dtrmnt$val_lvl2, 
    z_combined_cost_dtrmnt$val_lvl3
  )
)
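
To see how the two forms behave, here is a minimal sketch on made-up toy data. The data frame df, its cost column and all of its values are assumptions for illustration, not the asker's data. By default split() also returns empty data frames for combinations of the two variables that never occur together; passing drop = TRUE discards them.

```r
# Toy data standing in for z_combined_cost_dtrmnt (all values are made up).
df <- data.frame(
  val_lvl2 = c("A", "A", "A", "B", "B"),
  val_lvl3 = c("Outpatient", "Surgical", "Outpatient", "Inpatient", "Inpatient"),
  cost     = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Split on every (val_lvl2, val_lvl3) pair. drop = TRUE discards
# combinations that never occur, such as "B.Outpatient".
by_list <- split(df, list(df$val_lvl2, df$val_lvl3), drop = TRUE)

# The same grouping via an explicit interaction factor.
by_inter <- split(df, interaction(df$val_lvl2, df$val_lvl3, drop = TRUE))

# Group names are the two values joined with "." (the default separator).
print(names(by_list))
print(sapply(by_list, nrow))

# After any per-group work, the pieces can be recombined as in the question.
# unname() stops rbind from prefixing row names with the group names.
recombined <- do.call(rbind, unname(by_list))
```

Each element of by_list is an ordinary data frame, so the lapply steps from the question should work on it unchanged.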