Question

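I have a long set of pathology results. Each patient has a unique identifier ("row_id" in this example). Each patient had samples taken on particular dates ("sample_date"). The range of tests they had varies widely, and the output is heterogeneous (some results are strings and some are numbers). Also, not every patient had every test on every sample_date, so there should be plenty of NAs.

The name of the test performed is in the "test_name" column and the result is in the "result" column. I want to put this into a wide dataset, spreading the "result" column with "test_name" as the column headers, while keeping "row_id" and "sample_date" as identifiers.

The new pivot_wider function in tidyr seems ideal for what I need. When I run it, it gives me the kind of data frame I want (i.e. rows are still identified by row_id and sample_date, but there is now a column for each test_name holding its result).

Here is a small sample of my dataset: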

structure(list(row_id = 1:81, sample_date = structure(c(16444, 
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 
16444, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16448, 
16448, 16448, 16448, 16448, 16448, 16442, 16442, 16442, 16442, 
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442, 
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442), class = "Date"), 
    test_name = c("Epidemic Typhus Group IgG Abs", "Epidemic Typhus Group IgM Abs", 
    "Spotted Fever Group IgG Abs", "Spotted Fever Group IgM Abs", 
    "Albumin", "Alkaline phosphatase", "Alanine transaminase", 
    "Basophils", "Bilirubin (total)", "Creatinine", "C-reactive protein", 
    "Eosinophils", "Estimated GFR", "Haemoglobin (g/L)", "HCT", 
    "Potassium", "Lymphocytes", "MCHC (g/L)", "MCH", "MCV", "Monocytes", 
    "MPV", "Sodium", "Neutrophils", "Platelet count", "Red cell count", 
    "RDW", "Urea", "White cell count", "Albumin", "Alkaline phosphatase", 
    "Alanine transaminase", "Basophils", "Bilirubin (total)", 
    "Creatinine", "C-reactive protein", "Eosinophils", "Estimated GFR", 
    "Haemoglobin (g/L)", "HCT", "Potassium", "Lymphocytes", "MCHC (g/L)", 
    "MCH", "MCV", "Monocytes", "MPV", "Sodium", "Neutrophils", 
    "Platelet count", "Red cell count", "RDW", "Urea", "White cell count", 
    "Creatinine", "C-reactive protein", "Estimated GFR", "Potassium", 
    "Sodium", "Urea", "Albumin", "Alkaline phosphatase", "Alanine transaminase", 
    "APTT Ratio", "APTT", "Basophils", "Bilirubin (total)", "Creatinine", 
    "C-reactive protein", "Eosinophils", "Fibrinogen", "Estimated GFR", 
    "Haemoglobin (g/L)", "HCT", "INR", "Potassium", "Lymphocytes", 
    "MCHC (g/L)", "MCH", "MCV", "Monocytes"), result = c("Not detected", 
    "Not detected", "Not detected", "Not detected", "47", "84", 
    "29", "0.3%  0.03", "12", "98", "3.3", "1.7%  0.15", "77\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "156", "0.435", "3.8", "25.7%  2.31", "359", "30.4", "84.6", 
    "7.1%  0.64", "10.1", "140", "65.2%  5.86", "240", "5.14", 
    "12.4", "3.9", "8.99", "45", "53", "41", "0.3%  0.03", "10", 
    "59", "2.0", "2.8%  0.32", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "126", "0.398", "4.5", "28.7%  3.30", "317", "25.7", "81.2", 
    "5.7%  0.65", "10.8", "143", "62.5%  7.18", "411", "4.90", 
    "14.7", "3.5", "11.49", "59", "76.2", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "4.2", "139", "3.4", "46", "47", "40", "1.3", "39", "0.4%  0.01", 
    "8", "74", "7.0", "0.4%  0.01", "2.50", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "146", "0.441", "0.96", "4.3", "43.2%  1.14", "331", "29.1", 
    "87.8", "6.8%  0.18")), class = "data.frame", row.names = c(NA, 
-81L))

Here is the pivot_wider code I used (the dataset above is called "path_results"):

library(dplyr)
library(tidyr)

path_results_wide <- path_results %>%
  select(row_id, sample_date, test_name, result) %>%
  pivot_wider(
    id_cols = c(row_id, sample_date),
    names_from = test_name,
    values_from = result
  )

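Some columns should be numeric and some should be strings, but pivot_wider has returned all of them as list-columns of character vectors, and when I try to change one to numeric I get the following error: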

path_results_wide$Albumin <- as.numeric(path_results_wide$Albumin)

Error: Can't convert <list_of<character>> to <character>

Any suggestions on how I can fix this would be welcome. Thank you.
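For reference, list-columns from pivot_wider usually mean that some combination of id columns and test_name occurs more than once, so several values compete for one cell. The sketch below reproduces that on hypothetical toy data (the `patient_id` column and all values are made up for illustration; the 81-row sample above has a unique row_id per row, so whether the full dataset contains such duplicates is an assumption).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: patient 1 has two "Albumin" results on the same date.
toy <- data.frame(
  patient_id  = c(1, 1, 1, 2),
  sample_date = as.Date(c("2015-01-09", "2015-01-09", "2015-01-09", "2015-01-12")),
  test_name   = c("Albumin", "Albumin", "Sodium", "Albumin"),
  result      = c("47", "45", "140", "46"),
  stringsAsFactors = FALSE
)

# Count rows per id/test combination; any n > 1 cannot fit into a single cell.
dups <- toy %>%
  count(patient_id, sample_date, test_name) %>%
  filter(n > 1)
print(dups)

# With duplicates present, pivot_wider() warns and returns list-columns.
toy_wide <- suppressWarnings(
  pivot_wider(
    toy,
    id_cols = c(patient_id, sample_date),
    names_from = test_name,
    values_from = result
  )
)
is.list(toy_wide$Albumin)
```

If `dups` comes back empty on the real data, the list-columns have some other cause and this check does not apply.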

Answer 1

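Old answer:

I'm not sure whether this is possible with pivot_wider, but if I have understood correctly, I think reshape2 may do what you want. Since each row is a patient and a date, there will be several NA values wherever a particular test was not run on that date.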

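library(reshape2)
# One row per row_id + sample_date, one column per test_name;
# value.var names the column whose values fill the cells.
res <- dcast(path_results, row_id + sample_date ~ test_name, value.var = "result")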
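A minimal sketch of the same call on hypothetical toy data (`patient_id` and the values are made up for illustration) shows the expected shape. One caveat, as far as I know: if an id/test combination occurs more than once and no `fun.aggregate` is supplied, dcast falls back to counting rows instead of returning the values, so it is worth checking for duplicates first.

```r
library(reshape2)

# Hypothetical toy data with no duplicated patient/date/test combination.
toy <- data.frame(
  patient_id  = c(1, 1, 2),
  sample_date = as.Date(c("2015-01-09", "2015-01-09", "2015-01-12")),
  test_name   = c("Albumin", "Sodium", "Albumin"),
  result      = c("47", "140", "46"),
  stringsAsFactors = FALSE
)

# Wide layout: one row per patient and date, one column per test.
res <- dcast(toy, patient_id + sample_date ~ test_name, value.var = "result")
print(res)

# Tests that were not run for a given patient and date come back as NA.
sum(is.na(res$Sodium))
```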

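New answer:

While reading about the dcast function in another issue, I realised that we need to add another id column that uniquely identifies each individual row. I then came across the following while reading about the spread function in tidyr: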

library(tibble)  # rowid_to_column()
library(tidyr)   # spread() and the %>% pipe

path_results_wide <- path_results %>%
    rowid_to_column() %>%
    spread(test_name, result)
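The same idea can probably be carried over to pivot_wider. The sketch below is one possible variant, not necessarily what the asker needs: instead of a global row id it adds an `occurrence` counter within each patient/date/test group, so repeated tests land on separate rows and every cell holds a single value. The toy data, `patient_id` and `occurrence` are hypothetical names introduced for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: patient 1 has two "Albumin" results on the same date.
toy <- data.frame(
  patient_id  = c(1, 1, 1, 2),
  sample_date = as.Date(c("2015-01-09", "2015-01-09", "2015-01-09", "2015-01-12")),
  test_name   = c("Albumin", "Albumin", "Sodium", "Albumin"),
  result      = c("47", "45", "140", "46"),
  stringsAsFactors = FALSE
)

toy_wide <- toy %>%
  group_by(patient_id, sample_date, test_name) %>%
  mutate(occurrence = row_number()) %>%  # 1, 2, ... within each duplicate group
  ungroup() %>%
  pivot_wider(
    id_cols = c(patient_id, sample_date, occurrence),
    names_from = test_name,
    values_from = result
  )

# Every cell now holds one value, so the columns are plain character vectors
# and a purely numeric test can be converted directly.
toy_wide$Albumin <- as.numeric(toy_wide$Albumin)
print(toy_wide)
```

Columns whose results mix numbers with text, such as "0.3%  0.03" or the eGFR comments in the sample data, would turn into NA under as.numeric, so those are better left as character or cleaned beforehand.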
