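I have a very long pathology results dataset. Each patient has a unique identifier (in this example, "row_id"). Each patient had samples taken on specific dates ("sample_date"). The range of tests performed varies widely, and the results are heterogeneous (some are strings, some are numbers). Also, not every patient had every test on every sample_date, so there should be plenty of NAs.
The name of the test performed is in the "test_name" column and the result is in the "result" column. I would like to reshape this into a wide dataset, spreading the "result" column with "test_name" as the column headers, while keeping "row_id" and "sample_date" as identifiers.
The new pivot_wider function in tidyr seems ideal for this, and when I run it, it gives me the kind of data frame I want (i.e., rows are still identified by row_id and sample_date, but there is now a column for each test_name holding its result).
Here is a small sample of my dataset: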
structure(list(row_id = 1:81, sample_date = structure(c(16444,
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444,
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444,
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444,
16444, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447,
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447,
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16448,
16448, 16448, 16448, 16448, 16448, 16442, 16442, 16442, 16442,
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442,
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442), class = "Date"),
test_name = c("Epidemic Typhus Group IgG Abs", "Epidemic Typhus Group IgM Abs",
"Spotted Fever Group IgG Abs", "Spotted Fever Group IgM Abs",
"Albumin", "Alkaline phosphatase", "Alanine transaminase",
"Basophils", "Bilirubin (total)", "Creatinine", "C-reactive protein",
"Eosinophils", "Estimated GFR", "Haemoglobin (g/L)", "HCT",
"Potassium", "Lymphocytes", "MCHC (g/L)", "MCH", "MCV", "Monocytes",
"MPV", "Sodium", "Neutrophils", "Platelet count", "Red cell count",
"RDW", "Urea", "White cell count", "Albumin", "Alkaline phosphatase",
"Alanine transaminase", "Basophils", "Bilirubin (total)",
"Creatinine", "C-reactive protein", "Eosinophils", "Estimated GFR",
"Haemoglobin (g/L)", "HCT", "Potassium", "Lymphocytes", "MCHC (g/L)",
"MCH", "MCV", "Monocytes", "MPV", "Sodium", "Neutrophils",
"Platelet count", "Red cell count", "RDW", "Urea", "White cell count",
"Creatinine", "C-reactive protein", "Estimated GFR", "Potassium",
"Sodium", "Urea", "Albumin", "Alkaline phosphatase", "Alanine transaminase",
"APTT Ratio", "APTT", "Basophils", "Bilirubin (total)", "Creatinine",
"C-reactive protein", "Eosinophils", "Fibrinogen", "Estimated GFR",
"Haemoglobin (g/L)", "HCT", "INR", "Potassium", "Lymphocytes",
"MCHC (g/L)", "MCH", "MCV", "Monocytes"), result = c("Not detected",
"Not detected", "Not detected", "Not detected", "47", "84",
"29", "0.3% 0.03", "12", "98", "3.3", "1.7% 0.15", "77\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"156", "0.435", "3.8", "25.7% 2.31", "359", "30.4", "84.6",
"7.1% 0.64", "10.1", "140", "65.2% 5.86", "240", "5.14",
"12.4", "3.9", "8.99", "45", "53", "41", "0.3% 0.03", "10",
"59", "2.0", "2.8% 0.32", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"126", "0.398", "4.5", "28.7% 3.30", "317", "25.7", "81.2",
"5.7% 0.65", "10.8", "143", "62.5% 7.18", "411", "4.90",
"14.7", "3.5", "11.49", "59", "76.2", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"4.2", "139", "3.4", "46", "47", "40", "1.3", "39", "0.4% 0.01",
"8", "74", "7.0", "0.4% 0.01", "2.50", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"146", "0.441", "0.96", "4.3", "43.2% 1.14", "331", "29.1",
"87.8", "6.8% 0.18")), class = "data.frame", row.names = c(NA,
-81L))
Here is the pivot_wider code I have used (the dataset above is called "path_results"):
library(dplyr)  # select(), %>%
library(tidyr)  # pivot_wider()

path_results_wide <- path_results %>%
  select(row_id, sample_date, test_name, result) %>%
  pivot_wider(
    id_cols = c(row_id, sample_date),
    names_from = test_name,
    values_from = result
  )
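Some of the columns should be numeric and some should be strings, but pivot_wider has parsed them all as list-columns of character values, and when I try to convert one to numeric I get the following error: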
path_results_wide$Albumin <- as.numeric(path_results_wide$Albumin)
Error: cannot
Any suggestions on how I can solve this would be welcome. Thank you.
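Answer 0 (score: 0)
Old answer:
I'm not sure whether this can be done with pivot_wider, but if I have understood you correctly, I think reshape2 may do what you want. Since each row is a patient and a date, there will be many NA values wherever a particular test was not performed on that date.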
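library(reshape2)
# Cast to wide format: one row per row_id + sample_date, one column per test_name
res <- dcast(path_results, row_id + sample_date ~ test_name, value.var = "result")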
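To illustrate how the NA filling works, here is a minimal sketch on a made-up data frame (toy and res_toy are hypothetical names, and the values are invented for illustration, not taken from your data). It assumes every row_id + sample_date + test_name combination occurs only once; if a combination is repeated, dcast falls back to an aggregation function instead of keeping the raw results.

```r
library(reshape2)

# Hypothetical toy data: two patient-date pairs, the second has no Sodium test
toy <- data.frame(
  row_id      = c(1L, 1L, 2L),
  sample_date = as.Date(c("2015-01-09", "2015-01-09", "2015-01-12")),
  test_name   = c("Albumin", "Sodium", "Albumin"),
  result      = c("47", "140", "46"),
  stringsAsFactors = FALSE
)

# One row per row_id + sample_date, one column per test_name;
# combinations with no result are filled with NA
res_toy <- dcast(toy, row_id + sample_date ~ test_name, value.var = "result")
print(res_toy)
```

The result columns stay character here because result is a character column, so any column that should be numeric still needs an explicit as.numeric() afterwards.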
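New answer:
While reading about the dcast function in another issue, I realized that we need to add another id column that uniquely identifies each individual row. Then, while reading about the spread function (which lives in tidyr rather than dplyr), I came across the following: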
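library(tibble)  # rowid_to_column()
library(tidyr)   # spread(), %>%
path_results_wide <- path_results %>%
  rowid_to_column() %>%  # add a unique id so every row is identified
  spread(test_name, result)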
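The same idea should also carry over to pivot_wider, since the point is only that each row needs a unique key. Below is a rough sketch on a made-up data frame (toy and toy_wide are hypothetical names with invented values), assuming the list-columns came from repeated row_id + sample_date + test_name combinations. With the extra rowid column, each original row stays on its own line, so the result is wide but sparse, and the test columns come out as plain character vectors that as.numeric() can convert.

```r
library(tibble)  # rowid_to_column()
library(tidyr)   # pivot_wider(), %>%

# Hypothetical toy data: the first patient-date pair has Albumin measured twice
toy <- data.frame(
  row_id      = c(1L, 1L, 1L, 2L),
  sample_date = as.Date(c("2015-01-09", "2015-01-09", "2015-01-09", "2015-01-12")),
  test_name   = c("Albumin", "Albumin", "Sodium", "Albumin"),
  result      = c("47", "45", "140", "46"),
  stringsAsFactors = FALSE
)

toy_wide <- toy %>%
  rowid_to_column() %>%  # adds a unique `rowid` column
  pivot_wider(names_from = test_name, values_from = result)

# The test columns are ordinary character vectors, so conversion works
toy_wide$Albumin <- as.numeric(toy_wide$Albumin)
print(toy_wide)
```

Columns that hold free text (such as the "Not detected" results or the long eGFR comments in your sample) would turn into NA under as.numeric(), so it is probably safest to convert only the columns you know are purely numeric.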