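I found the following script for the provider taxonomy on asdfree (asdfree original script). It merges all specialties into a single column, so it ignores the hierarchical structure of the specialties.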
The following code shows that there are actually several levels:
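library(downloader)
# Download the indented taxonomy table from the asdfree repository
tf <- tempfile()
download("https://raw.githubusercontent.com/ajdamico/asdfree/master/National%20Plan%20and%20Provider%20Enumeration%20System/taxonomy%20id%20table.txt", tf)
z <- readLines(tf)
# Count the tab characters on each line; the count reflects the nesting level
hmt <- gregexpr("\t", z)
l <- unlist(lapply(hmt, function(x) length(x[x > 0])))
# Split the lines read above by tab count
specialty_groups <- z[l == 1]
specialty_individual <- z[l == 2]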
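Here is a minimal sketch of how a tab count can map to a level, using three hypothetical lines. It assumes, without checking the actual file, that each line has one leading tab per level and that a tab separates the name from the taxonomy code. Under that assumption, counting every tab (as the `gregexpr()` call above does) also counts the separator. Anchoring the regex at `^` counts only the indentation.

```r
# Hypothetical indented lines (assumed format: leading tabs = depth,
# then the name, then a tab before the taxonomy code)
x <- c("Allopathic & Osteopathic Physicians",
       "\tAllergy & Immunology\t207K00000X",
       "\t\tAllergy\t207KA0200X")

# Total tabs per line, same idea as the gregexpr() call above
total_tabs <- vapply(gregexpr("\t", x), function(m) sum(m > 0), integer(1))

# Leading tabs only: capture the run of tabs at the start of the line
leading_tabs <- nchar(sub("^(\t*).*$", "\\1", x))

print(data.frame(total_tabs, leading_tabs))
```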
The problem is that Allergy & Immunology (in the first row below) is in the wrong place; it really belongs in the last column.
6 2 Allergy & Immunology 207K00000X Allopathic & Osteopathic Physicians <NA>
7 3 Allergy 207KA0200X Allopathic & Osteopathic Physicians Allergy & Immunology
8 3 Clinical & Laboratory Immunology 207KI0005X Allopathic & Osteopathic Physicians Allergy & Immunology
9 2 Anesthesiology 207L00000X Allopathic & Osteopathic Physicians <NA>
In other words, the data should look like this:
LEVEL_1                              LEVEL_2               LEVEL_3                           TAXONOMY
Allopathic & Osteopathic Physicians  Allergy & Immunology                                    207K00000X
Allopathic & Osteopathic Physicians  Allergy & Immunology  Allergy                           207KA0200X
Allopathic & Osteopathic Physicians  Allergy & Immunology  Clinical & Laboratory Immunology  207KI0005X
How can I achieve this in R using regular expressions?
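One possible approach uses a regex to read each line's depth and then carries the most recently seen name at each level forward. This is a minimal sketch that assumes one leading tab per level, a tab before the taxonomy code, and top-level provider types without a code. The real file's layout should be checked before relying on it. The function `build_hierarchy()` and the sample lines are hypothetical.

```r
# Hypothetical sample in the assumed layout (leading tabs = depth)
sample_lines <- c(
  "Allopathic & Osteopathic Physicians",
  "\tAllergy & Immunology\t207K00000X",
  "\t\tAllergy\t207KA0200X",
  "\t\tClinical & Laboratory Immunology\t207KI0005X",
  "\tAnesthesiology\t207L00000X"
)

build_hierarchy <- function(x, max_depth = 3) {
  # Depth = number of leading tabs, captured with an anchored regex
  depth <- nchar(sub("^(\t*).*$", "\\1", x))
  stopifnot(all(depth < max_depth))
  # Strip the indentation, then split the name from the (optional) code
  parts <- strsplit(sub("^\t*", "", x), "\t", fixed = TRUE)
  name <- vapply(parts, `[`, character(1), 1)
  code <- vapply(parts, function(p) if (length(p) > 1) p[2] else NA_character_,
                 character(1))

  current <- rep(NA_character_, max_depth)  # latest name seen at each level
  rows <- vector("list", length(x))
  for (i in seq_along(x)) {
    d <- depth[i] + 1
    current[d] <- name[i]
    # A new entry at level d invalidates everything nested below it
    if (d < max_depth) current[(d + 1):max_depth] <- NA_character_
    rows[[i]] <- c(current, code[i])
  }
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c(paste0("LEVEL_", seq_len(max_depth)), "TAXONOMY")
  out[!is.na(out$TAXONOMY), ]  # keep only rows that carry a taxonomy code
}

res <- build_hierarchy(sample_lines)
print(res)
```

The top-level provider types have no code, so they are dropped at the end. The group row therefore keeps an empty `LEVEL_3`, which matches the desired layout.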