I have asked this same question before, but I am still having trouble with it.
Suppose I have a dataset A like this:
**Name**
Liver cell carcinoma
Stomach, unspecified
Malignant neoplasm of rectum
Lumbar and other intervertebral disc disorders with radiculopathy
Bronchus or lung, unspecified
Cerebral infarction, unspecified
Pneumonia, unspecified
Headache
Spinal stenosis, lumbar region
Other specified intervertebral disc displacement
Sigmoid colon
Calculus of ureter
Colon, unspecified
Concussion, without open intracranial wound
Malignant neoplasm of thyroid gland
Breast, unspecified
Other and unspecified cirrhosis of liver
Chronic viral hepatitis B without delta- agent
Dizziness and giddiness
Tension-type headache
Malignant neoplasm of stomach, unspecified, unspecified
Cervical disc disorder with radiculopathy
Malignant neoplasm of bronchus or lung, unspecified, unspecified side
Chest pain, unspecified
Gastroenteritis and colitis of unspecified origin
Bronchiectasis
Concussion
Body of stomach
Acute tubulo-interstitial nephritis
Traumatic subdural haemorrhage, without open intracranial wound
Abnormal findings on diagnostic imaging of lung
Angina pectoris, unspecified
Other disorders of lung
Ascending colon
Essential(primary) hypertension
Pyloric antrum
Intrahepatic bile duct carcinoma
Cervix uteri, unspecified
Gastro-oesophageal reflux disease with oesophagitis
Liver
Fracture of nasal bone, closed
Malignant neoplasm of rectosigmoid junction
Open wound of scalp
Other cerebral infarction
Cerebral aneurysm, nonruptured
Malignant neoplasm of kidney, except renal pelvis
Malignant neoplasm of prostate
Unspecified abdominal pain
And dataset B looks like this:
Part Key
Abdominal abdomen
Abdominal abdominal
Other acute myeloblastic leukaemia
Abdominal adrenal
Head allergic rhinitis
Head Alzheimer's
Abdominal ampulla
Abdominal aneurysm
Chest angina
Abdominal antrum
Chest aorta
Abdominal appendicitis
Head arteries
Abdominal ascites
Chest asthma
Abdominal back
other b-cell lymphoma
Abdominal bile duct
Abdominal biliary tract
Abdominal bladder
Head brain
Chest breast
Chest Bronchiectasis
Chest bronchitis
Chest bronchopneumonia
Chest bronchus
Abdominal C64
Abdominal caecum
Abdominal cardia
Head cavity
Head cerebral
Chest cerebrovascular
Head cerebrovascular
Abdominal cervical
Abdominal cervix
Other chemotherapy session for neoplasm
Chest chest
Abdominal cholangitis
Abdominal cholecystitis
Chest circulatorycomplications
Abdominal colon
Head concussion
other connective and soft tissue, unspecified
Head convulsions
Chest Cough
Lung cough
I ran the following code:
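result <- A %>%
  # replace each lowercased name with the key from B that it contains
  # (the name is left unchanged when no key matches)
  mutate(key = gsub(paste0(".*(", paste(B$key, collapse = "|"), ").*"), "\\1", tolower(NAME))) %>%
  left_join(B)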
The result contains some duplicate rows.
What is the best code to create the dataset I want? I would like my result table to look like this:
Name Key Part
Liver cell carcinoma liver Abdominal
Stomach, unspecified stomach Abdominal
Answer 0 (score: 0)
Using the data posted here and staying within the dplyr world, you can apply the distinct function:
tmp %>%
  mutate(key = gsub(paste0(".*(", paste(tmp2$key, collapse = "|"), ").*"), "\\1", tolower(Disease_name))) %>%
  left_join(tmp2) %>%
  # distinct() drops rows that are identical in every column
  distinct()
Joining, by = "key"
Disease_name key parts
1 (J189)Pneumonia, unspecified pneumonia Chest
2 (R51)Headache headache Head
3 (M4806)Spinal stenosis, lumbar region spinal Abdominal
4 (M512)Other specified intervertebral disc displacement intervertebral Abdominal
5 (C187)Sigmoid colon colon Abdominal
6 (N201)Calculus of ureter ureter Abdominal
7 (C189)Colon, unspecified colon Abdominal
8 (S0600)Concussion, without open intracranial wound concussion Head
9 (C73)Malignant neoplasm of thyroid gland thyroid Neck
10 (C509)Breast, unspecified breast Chest
11 (K746)Other and unspecified cirrhosis of liver liver Abdominal
12 (B181)Chronic viral hepatitis B without delta- agent hepatitis Abdominal
13 (R42)Dizziness and giddiness giddiness Head
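
Note that distinct() only removes rows that are identical in every column. In dataset B from the question, some keys are listed under more than one part (for example cough appears under both Chest and Lung), and some keys are capitalized (Cough, Bronchiectasis), so they can never match the lowercased names. The following is a minimal sketch of one way to handle both cases, not a definitive solution. The miniature tables A and B, the helper name B_unique, and the choice to keep the first Part for a repeated key are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical miniature tables (not the asker's real data).
A <- data.frame(
  Name = c("Liver cell carcinoma", "Pneumonia, unspecified",
           "Cough", "Unknown condition"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Part = c("Abdominal", "Chest", "Chest", "Lung", "Chest"),
  key  = c("liver", "pneumonia", "Cough", "cough", "pneumonia"),
  stringsAsFactors = FALSE
)

# Lowercase the keys so they can match tolower(Name), then keep one row per key.
# Assumption: when a key is listed under several parts, the first one wins.
B_unique <- B %>%
  mutate(key = tolower(key)) %>%
  distinct(key, .keep_all = TRUE)

# Build the alternation pattern once from the de-duplicated keys.
pattern <- paste0(".*(", paste(B_unique$key, collapse = "|"), ").*")

result <- A %>%
  # sub() returns the lowercased name unchanged when no key matches,
  # so such rows simply get NA for Part after the join.
  mutate(key = sub(pattern, "\\1", tolower(Name))) %>%
  left_join(B_unique, by = "key")

print(result)
```

Because B_unique has exactly one row per key, the join cannot add rows, so result has the same number of rows as A. If a key really should map to several parts, you would need to decide which part to keep (or collapse them into one string) before joining.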