Question

I am parsing all of the clinical trial information from the following XML: https://clinicaltrials.gov/ct2/show/NCT00446316?term=imatimib&rank=1?displayxml=true

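In the non-XML version (to see it, just remove ?displayxml=true from the URL above), there is a list with numbered items and hyphenated sub-levels (specifically in the eligibility criteria). In the XML version the hyphens are still there, but there is no way to get the sub-levels as actual sub-levels.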

The list entries in question are these:

'7. Adequate organ function including the following:', 
'- Adequate bone marrow reserve:', 
'- Total white blood cell count (WBC) > 3.0 x 109/L', 
'- Platelet count >100 x 109/L', '- Hemoglobin >8 g/dL', 
'- Hepatic:', '- Bilirubin: = 1.25 times the upper limit of normal (ULN)', 
'- Alanine transaminase (ALT): = <5 times the ULN', 
'- Aspartate transaminase (AST): = <5 times the ULN', 
'- Renal: Serum creatinine =< 1.5 times the ULN, or creatinine clearance 
=>60mL/minute as calculated by the standard Cockcroft Gault formula.'

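These should all be part of the same entry (number 7), but they are not. Is there a way to programmatically

a) detect that those sub-headings exist, and

b) make them part of the same entry (to some extent; it doesn't have to be perfect)?

Thanks very much.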
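For illustration, here is a minimal sketch of one way this could be approached, assuming the criteria have already been extracted into a flat list of strings like the one above. The function name `group_criteria` and the dictionary keys are hypothetical, not part of any library. It treats a line starting with a number and a period as a top-level entry and attaches every following hyphen-prefixed line to it. Since every sub-level carries the same single hyphen in the flattened list, this only recovers one level of nesting.

```python
import re

# Top-level entry, e.g. "7. Adequate organ function including the following:"
NUMBERED = re.compile(r"^(\d+)\.\s+(.*)$")
# Hyphen-prefixed sub-item, e.g. "- Hepatic:"
HYPHEN = re.compile(r"^-\s+(.*)$")


def group_criteria(lines):
    """Attach hyphen-prefixed lines to the numbered entry that precedes them."""
    entries = []
    for raw in lines:
        # Collapse embedded newlines and repeated spaces from the XML text.
        line = " ".join(raw.split())
        if not line:
            continue
        numbered = NUMBERED.match(line)
        hyphen = HYPHEN.match(line)
        if numbered:
            entries.append({
                "number": int(numbered.group(1)),
                "text": numbered.group(2),
                "children": [],
            })
        elif hyphen and entries:
            # (a) a sub-item was detected, (b) keep it with its parent entry.
            entries[-1]["children"].append(hyphen.group(1))
        elif entries:
            # Neither numbered nor hyphenated: assume it continues the
            # previous sub-item, or the entry text if there is none yet.
            last = entries[-1]
            if last["children"]:
                last["children"][-1] += " " + line
            else:
                last["text"] += " " + line
        else:
            # Text before the first numbered entry is kept unnumbered.
            entries.append({"number": None, "text": line, "children": []})
    return entries


sample = [
    "7. Adequate organ function including the following:",
    "- Adequate bone marrow reserve:",
    "- Platelet count >100 x 109/L",
    "- Hepatic:",
]
for entry in group_criteria(sample):
    print(entry["number"], entry["text"], entry["children"])
```

A sub-item ending in a colon (such as "- Hepatic:") could be used as a hint that a deeper level starts there, but that is only a heuristic: an item like "- Renal: Serum creatinine ..." would break it. If the raw text inside the XML element still has leading whitespace before each hyphen, the indentation width might be a more reliable signal of depth, though that is an assumption that would need checking against the actual file.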

Python: getting sub-values from a flattened list from XML

0 answers: