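Extracting a string from an HTML file

Asked: 2015-02-05 14:40:31

Tags: html ios html-parsing

Given this part of an HTML file, I am looking for a way to extract the text that starts at "Metronidazole ..." and runs to the end of the "INDICATIONS & USAGE" section.

Any suggestions?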

<div class="Section" data-sectionCode="34067-9">
<a name="section-4"></a>
<p></p>
<h1>
<span class="Bold">INDICATIONS &amp; USAGE
</span>
</h1>
<p class="First">Metronidazole vaginal gel USP, 0.75% is indicated in the treatment of bacterial vaginosis (formerly referred to as <span class="Italics">Haemophilus</span> vaginitis, <span class="Italics">Gardnerella</span> vaginitis, nonspecific vaginitis, <span class="Italics">Corynebacterium</span> vaginitis, or anaerobic vaginosis).</p>
<dl>
<dt></dt>
<dd>
<p class="First">
<span class="Bold">NOTE:</span> For purposes of this indication, a clinical diagnosis of bacterial vaginosis is usually defined by the presence of a homogeneous vaginal discharge that (a) has a pH of greater than 4.5, (b) emits a &ldquo;fishy&rdquo; amine odor when mixed with a 10% KOH solution, and (c) contains clue cells on microscopic examination. Gram&rsquo;s stain results consistent with a diagnosis of bacterial vaginosis include (a) markedly reduced or absent <span class="Italics">Lactobacillus</span> morphology, (b) predominance of <span class="Italics">Gardnerella</span> morphotype, and (c) absent or few white blood cells.</p>
</dd>
</dl>
<p>Other pathogens commonly associated with vulvovaginitis, e.g., <span class="Italics">Trichomonas vaginalis</span>, <span class="Italics">Chlamydia trachomatis</span>, <span class="Italics">N</span>. <span class="Italics">gonorrhoeae</span>, <span class="Italics">Candida albicans</span>, and <span class="Italics">Herpes simplex</span> virus should be ruled out.</p>
</div>

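INDICATIONS & USAGE

Metronidazole vaginal gel USP, 0.75% is indicated in the treatment of bacterial vaginosis (formerly referred to as Haemophilus vaginitis, Gardnerella vaginitis, nonspecific vaginitis, Corynebacterium vaginitis, or anaerobic vaginosis).

NOTE: For purposes of this indication, a clinical diagnosis of bacterial vaginosis is usually defined by the presence of a homogeneous vaginal discharge that (a) has a pH of greater than 4.5, (b) emits a “fishy” amine odor when mixed with a 10% KOH solution, and (c) contains clue cells on microscopic examination. Gram’s stain results consistent with a diagnosis of bacterial vaginosis include (a) markedly reduced or absent Lactobacillus morphology, (b) predominance of Gardnerella morphotype, and (c) absent or few white blood cells.

Other pathogens commonly associated with vulvovaginitis, e.g., Trichomonas vaginalis, Chlamydia trachomatis, N. gonorrhoeae, Candida albicans, and Herpes simplex virus should be ruled out.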

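1 answer:

Answer 0 (score: 0)

You can use some old-school tricks:

  1. First convert the NSString to a character array (the source array).
  2. Create an empty destination character array.
  3. Start copying characters from the source array into the destination array.
  4. When you hit a '<' character (the start of an HTML tag), stop adding characters to the destination array until you find a '>' character (the end of the tag).
  5. Or you can use this other trick: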

    // Assumes `str` is an NSString * that holds the HTML.
    NSString *startTag = @"<";
    NSString *endTag = @">";
    NSString *replacementString = @"";

    // Remove one tag per iteration while both '<' and '>' are still present.
    while ([str rangeOfString:startTag].length != 0 && [str rangeOfString:endTag].length != 0)
    {
        NSRange range1 = [str rangeOfString:startTag];
        NSRange range2 = [str rangeOfString:endTag];

        // Stop if the first '>' comes before the first '<'.
        if (range1.location > range2.location)
            break;

        // Range that covers the whole tag, from '<' through '>'.
        NSRange newRange;
        newRange.location = range1.location;
        newRange.length = range2.location - range1.location + range2.length;
        str = [str stringByReplacingCharactersInRange:newRange withString:replacementString];
    }
    

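You can then use the rangeOfString: method to locate the "INDICATIONS & USAGE" heading; the text you want comes right after that range. Note that removing the tags does not decode HTML entities, so in the stripped string the heading still reads "INDICATIONS &amp; USAGE".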
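
Steps 1 to 4 of the list above can be sketched roughly as follows. This is only an illustration, written in plain C so that it also compiles inside an Objective-C file; the function names `strip_tags` and `text_after` are made up for this example and are not part of any framework. It assumes the HTML is available as a UTF-8 C string (for instance from `-[NSString UTF8String]`) and, like the loop above, it does not decode entities such as `&amp;`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into a newly allocated buffer, skipping
   everything from a '<' through the next '>'. The caller frees the result.
   Returns NULL if the allocation fails. */
char *strip_tags(const char *src)
{
    assert(src != NULL);

    size_t len = strlen(src);
    char *dst = malloc(len + 1);    /* the output is never longer than the input */
    if (dst == NULL)
        return NULL;

    size_t out = 0;
    int in_tag = 0;
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '<') {
            in_tag = 1;             /* tag starts: stop copying */
        } else if (c == '>' && in_tag) {
            in_tag = 0;             /* tag ends: resume copying */
        } else if (!in_tag) {
            dst[out++] = c;         /* ordinary text outside any tag */
        }
    }
    dst[out] = '\0';
    return dst;
}

/* Hypothetical helper: return a pointer to the text that follows the first
   occurrence of heading in text, with leading whitespace skipped, or NULL
   if the heading is not found. The pointer refers into text itself. */
const char *text_after(const char *text, const char *heading)
{
    const char *hit = strstr(text, heading);
    if (hit == NULL)
        return NULL;

    hit += strlen(heading);
    while (*hit == ' ' || *hit == '\t' || *hit == '\n' || *hit == '\r')
        hit++;
    return hit;
}
```

Calling `strip_tags` on the section and then `text_after(stripped, "INDICATIONS &amp; USAGE")` should give a pointer to the text beginning at "Metronidazole ...". Scanning byte by byte is safe for UTF-8 input because '<' and '>' are ASCII and never occur inside a multi-byte sequence. For anything beyond a simple fragment like this one, a real HTML parser is likely to be more robust than either approach.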