How can I sort the content of an HTML file according to the browser view?

Time: 2019-05-20 16:37:35

Tags: html css

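We need to process a book of more than 1,500 pages, written in QuarkXPress 2018, with a Python pipeline so that its data can be used in a triple store. Unfortunately, QuarkXPress does not offer an export format that is useful for our purpose. The best option available is the HTML export.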

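This works well for the "normal" text in the document. Everything else, such as tables, images, or special text blocks, always ends up at the end of the HTML document. The view in the browser still matches the original book, because all elements are absolutely positioned relative to one another.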

[Image: browser view (left) next to the HTML document (right), with matching blocks color-coded]

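The image above shows the browser view on the left and the HTML document on the right. Matching colors mark the same lines. As you can see, the green, orange, yellow, blue, and purple blocks sit inside the red block (red block = normal text), which reproduces the original layout of the book.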

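To work around the problem, we tried importing the HTML document into the Python pipeline, importing the same page from the PDF document, and ordering the HTML tags by their position in the PDF document using simple string matching. This works reasonably well, but of course it does not work for images, and the error rate is high. Because the meta information in the tags (titles, subtitles, tables, etc.) is also important to us and is only available in the HTML document, we cannot use the PDF document alone.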
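For illustration, the string-matching step could look roughly like the sketch below. It is not our actual pipeline code: the helper names `best_match` and `order_by_pdf`, the 0.8 similarity threshold, and the assumption that the PDF page is already available as a list of text lines are all made up for this example. It uses only `difflib` from the standard library.

```python
from difflib import SequenceMatcher


def best_match(fragment, pdf_lines, threshold=0.8):
    """Return (index, ratio) of the PDF line most similar to fragment, or None."""
    # Normalise whitespace and case so layout differences do not hurt the score.
    needle = " ".join(fragment.split()).lower()
    best_index, best_ratio = None, 0.0
    for index, line in enumerate(pdf_lines):
        candidate = " ".join(line.split()).lower()
        ratio = SequenceMatcher(None, needle, candidate).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    if best_ratio < threshold:  # hypothetical cut-off, needs tuning
        return None
    return best_index, best_ratio


def order_by_pdf(fragments, pdf_lines, threshold=0.8):
    """Sort HTML text fragments by the PDF line they match best.

    Returns (ordered_fragments, unmatched_fragments).
    """
    matched, unmatched = [], []
    for fragment in fragments:
        hit = best_match(fragment, pdf_lines, threshold)
        if hit is None:
            unmatched.append(fragment)
        else:
            matched.append((hit[0], fragment))
    matched.sort(key=lambda pair: pair[0])
    return [fragment for _, fragment in matched], unmatched
```

Anything without text, such as an image, lands in the unmatched list, which is exactly the weakness described above.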

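Since the browser displays the data in the correct order, there must be a way to compute that order. Is there a tool that sorts HTML tags in the order in which they are displayed in the browser?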
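To make the idea concrete, here is a rough sketch of computing such an order directly from the exported coordinates, using only the Python standard library. It rests on several simplifying assumptions: every element is offset from its direct parent, `left`/`top` come either from the inline `style` or from a hand-supplied `id_positions` dict (standing in for the `#boxN` rules of the stylesheet), only the `scale(0.05)` of `.QxpTextBox` is modelled, and table flow and `.QxpVertTextBox` are ignored. The names `VisualOrderParser`, `visual_order`, and `id_positions` are invented for this example. A real layout engine, for instance a headless browser queried through the DOM method `getBoundingClientRect`, would likely be more reliable.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "col", "meta", "link", "hr", "input"}
SKIP_TAGS = {"script", "style", "title"}
STYLE_RE = re.compile(r"(?:^|;)\s*(left|top)\s*:\s*(-?\d+(?:\.\d+)?)px")
# Assumption: scale factor copied by hand from the exported CSS class.
CLASS_SCALE = {"QxpTextBox": 0.05}


class VisualOrderParser(HTMLParser):
    def __init__(self, id_positions=None):
        super().__init__()
        self.id_positions = id_positions or {}
        self.stack = [(0.0, 0.0, 1.0)]  # (x, y, scale) of each open element
        self.skip_depth = 0
        self.items = []  # (y, x, text)

    def _place(self, attrs):
        """Compute the page position and child scale of a new element."""
        parent_x, parent_y, parent_scale = self.stack[-1]
        left, top = self.id_positions.get(attrs.get("id"), (0.0, 0.0))
        for name, value in STYLE_RE.findall(attrs.get("style") or ""):
            if name == "left":
                left = float(value)
            else:
                top = float(value)
        scale = parent_scale
        for cls in (attrs.get("class") or "").split():
            scale *= CLASS_SCALE.get(cls, 1.0)
        # The element's own offset is scaled by its ancestors, not by itself.
        return parent_x + parent_scale * left, parent_y + parent_scale * top, scale

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        frame = self._place(attrs)
        if tag in VOID_TAGS:
            if tag == "img":
                self.items.append((frame[1], frame[0], "[img %s]" % attrs.get("src", "")))
            return
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.stack.append(frame)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self.skip_depth:
            x, y, _ = self.stack[-1]
            self.items.append((y, x, text))


def visual_order(html, id_positions=None):
    """Return text and image items sorted top to bottom, then left to right."""
    parser = VisualOrderParser(id_positions)
    parser.feed(html)
    parser.close()
    return [text for _, _, text in sorted(parser.items, key=lambda item: (item[0], item[1]))]
```

Called as `visual_order(page_html, {"box2": (27.85, 43.243)})`, this would place an anchored image between the text lines that surround it on the page instead of at the end, as long as the assumptions above hold.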

Below you will also find a simpler HTML and CSS file that shows the problem.

.para-NoStyle-127 {
  font-family: 'Arial', 'ArialMT', 'Helvetica', 'sans-serif';
  font-size: 140px;
  font-weight: normal;
  text-decoration: none;
  -webkit-font-kerning: Normal;
  font-kerning: Normal;
  -webkit-font-variant-ligatures: no-common-ligatures;
  font-variant-ligatures: no-common-ligatures;
  -webkit-font-feature-settings: "rlig" 0, "liga" 0, "clig" 0, "calt" 0, "locl" 0, "ccmp" 0, "mark" 0, "mkmk" 0;
  font-feature-settings: "rlig" 0, "liga" 0, "clig" 0, "calt" 0, "locl" 0, "ccmp" 0, "mark" 0, "mkmk" 0;
  color: #222021;
  position: absolute;
  left: 0px;
  top: 0px;
}

.QxpTextBox {
  position: absolute;
  left: 0px;
  top: 0px;
  white-space: nowrap;
  width: 100%;
  height: 100%;
  line-height: 1;
  transform-origin: 0% 0%;
  -webkit-transform-origin: 0% 0%;
  transform: scale(0.05, 0.05);
  -webkit-transform: scale(0.05, 0.05);
}

.QxpVertTextBox {
  position: absolute;
  white-space: nowrap;
  width: 100%;
  height: 100%;
  top: 0px;
  right: 0px;
  line-height: 1;
  transform-origin: 100% 0%;
  -webkit-transform-origin: 100% 0%;
  transform: scale(0.05, 0.05);
  -webkit-transform: scale(0.05, 0.05);
}
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
  <title>
    237-276_DIV_CH
  </title>
  <meta content="width=369,user-scalable=no" name="viewport" />
  <meta content="QuarkXPress 14.0.1" name="generator" />
  <meta content="text/html;CHARSET=utf-8" http-equiv="Content-Type" />
  <link href="ProjectCSS.css" rel="stylesheet" type="text/css" />
  <link href="../assets/common.css" rel="stylesheet" type="text/css" />
  <style type="text/css">
    .panandzoom {
      -webkit-transform-origin: 0px 0px;
      -webkit-transition-property: -webkit-transform;
      -webkit-transition-timing-function: ease;
    }
    
    a:link {
      color: blue;
      text-decoration: underline;
    }
    
    a:active {
      color: red
    }
    
    a:visited {
      color: purple
    }
    
    sub,
    sup {
      line-height: 0;
    }
    /* body style */
    
    body {
      background-color: white;
      color: #222021;
    }
    
    #box1 {
      position: absolute;
      left: 339.661px;
      top: 575.433px;
      -webkit-transform: scale(1);
      width: 28px;
      height: 15px;
    }
    
    #box1_Props {
      padding: 2px;
      width: 24.346px;
      height: 10.74px;
    }
    
    #box2 {
      position: absolute;
      left: 27.85px;
      top: 43.243px;
      -webkit-transform: scale(1);
      width: 312px;
      height: 532px;
    }
    
    #anchbox3 {
      position: absolute;
      -webkit-transform: scale(1);
      width: 285px;
      height: 45px;
      display: inline-block;
    }
    
    #anchbox3_Props {
      background-color: #E1ECF5;
      width: 280.819px;
      height: 41.304px;
      padding: 2px;
    }
    
    #anchbox4 {
      position: absolute;
      width: 252px;
      height: 44px;
      display: inline-block;
    }
    
    #anchbox5 {
      position: absolute;
      width: 289px;
      height: 91px;
      display: block;
      page-break-inside: initial !important;
    }
    
    #box6 {
      position: relative;
      -webkit-transform: scale(1);
      width: 289px;
      height: 11px;
    }
    
    #box7 {
      position: relative;
      -webkit-transform: scale(1);
      width: 41px;
      height: 19px;
    }
    
    #box8 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 19px;
    }
    
    #box9 {
      position: relative;
      -webkit-transform: scale(1);
      width: 41px;
      height: 20px;
    }
    
    #box10 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 20px;
    }
    
    #box11 {
      position: relative;
      -webkit-transform: scale(1);
      width: 41px;
      height: 27px;
    }
    
    #box12 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 27px;
    }
    
    #box13 {
      position: relative;
      -webkit-transform: scale(1);
      width: 41px;
      height: 13px;
    }
    
    #box14 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 13px;
    }
    
    #anchbox24 {
      position: absolute;
      width: 289px;
      height: 123px;
      display: block;
      page-break-inside: initial !important;
    }
    
    #box25 {
      position: relative;
      -webkit-transform: scale(1);
      width: 289px;
      height: 11px;
    }
    
    #box26 {
      position: relative;
      -webkit-transform: scale(1);
      width: 40px;
      height: 36px;
    }
    
    #box27 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 36px;
    }
    
    #box28 {
      position: relative;
      -webkit-transform: scale(1);
      width: 40px;
      height: 27px;
    }
    
    #box29 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 27px;
    }
    
    #box30 {
      position: relative;
      -webkit-transform: scale(1);
      width: 40px;
      height: 19px;
    }
    
    #box31 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 19px;
    }
    
    #box32 {
      position: relative;
      -webkit-transform: scale(1);
      width: 40px;
      height: 27px;
    }
    
    #box33 {
      position: relative;
      -webkit-transform: scale(1);
      width: 248px;
      height: 27px;
    }
    
    #box43 {
      position: absolute;
      left: 342.496px;
      top: 104.882px;
      width: 91px;
      height: 44px;
    }
    
    #box44 {
      position: absolute;
      left: 27.85px;
      top: 19.843px;
      -webkit-transform: scale(1);
      width: 312px;
      height: 20px;
    }
    
    #box44_Props {
      border-width: 0.2px;
      border-style: solid;
      border-color: #222021;
      width: 308.411px;
      height: 16.341px;
      padding: 1.5px;
    }
    
    a.nostyle {
      text-decoration: none;
      color: inherit;
    }
    
    .page {
      position: relative;
      overflow: hidden;
      width: 369px;
      height: 595px;
    }
  </style>
</head>

<body style="margin:0%;">
  <div class="page" id="section1">
    <div id="box1">
      <div id="box1_Props">
        <!-- bg -->
      </div>
      <div class="QxpTextBox" style="left:2px;top:2px;">
        <span class="char-Normal-Local-22" style="top:-18.73px;">173</span>
      </div>
    </div>
    <div id="box2">
      <div class="QxpTextBox">
        <div style="position:absolute;left:42.52px;top:0.00px;width:6194px;height:220px;background-color:#005BAA;">
          <!-- 
        Rule -->
        </div>
        <span class="char-Titulo_0_principal-Local-80" style="left:42.52px;top:10.90px;">Herzinsuffizienz (HII)</span>
        <span class="char-Titulo_0_principal-Local-80" style="left:5716.00px;top:10.90px;">[I50.9]</span>
        <div style="position:absolute;left:0.00px;top:191.19px;width:6236px;height:20px;background-color:#005BAA;z-index:-1;">
          <!-- 
        Rule -->
        </div>
        <span class="para-NoStyle-127" style="left:623.62px;top:951.69px;">-</span>
        <span class="para-NoStyle-127" style="left:793.70px;top:951.69px;">Inzidenz in der Schweiz: 20’000 neue Fälle/Jahr</span>
        <span class="para-NoStyle-127" style="left:623.62px;top:1112.68px;">-</span>
        <span class="para-NoStyle-127" style="left:793.70px;top:1112.68px;">Mortalität der akuten HI: &gt; 50 %/12 Mt. (ohne Therapie der Grundkrankheit)</span>
        <span class="para-NoStyle-127" style="left:453.54px;top:1273.68px;">•</span>
        <span class="para-NoStyle-127" style="left:623.62px;top:1273.68px;">Der wichtigste Faktor, welcher zur Verschlimmerung der HI beiträgt ist die neurohumorale </span>
        <span class="para-NoStyle-127" style="left:623.62px;top:1434.68px;">Aktivierung (d.h. Aktivierung des Sympathikus und des RAAS)!</span>

        <div style="position:absolute;left:0.00px;top:2710.48px;width:6236px;height:200px;background-color:#646364;">
          <!-- 
        Rule -->
        </div>
        <span class="para-NoStyle-127" style="top:2750.93px;">Klas:</span>
        <span class="para-NoStyle-176" style="left:510.24px;top:2750.93px;">«Zeitliche» Klassifizierung</span>

        <span class="para-NoStyle-127" style="left:453.54px;top:2968.62px;">1.</span>
        <span class="para-NoStyle-127" style="left:623.62px;top:2968.62px;">Akute Herzinsuffizienz</span>
        <span class="para-NoStyle-127" style="left:453.54px;top:3129.62px;">2.</span>
        <span class="para-NoStyle-127" style="left:623.62px;top:3129.62px;">Akute Dekompensation einer Herzinsuffizienz</span>
        <span class="para-NoStyle-127" style="left:453.54px;top:3290.62px;">3.</span>
        <span class="para-NoStyle-127" style="left:623.62px;top:3290.62px;">Chronische Herzinsuffizienz</span>

        <span class="para-NoStyle-115" style="left:453.54px;top:7378.32px;">Tabelle: Klassifikation der Herzinsuffizienz nach der NYHA.</span>

        <div style="position:absolute;left:283.46px;top:7635.24px;width:5953px;height:180px;background-color:#646364;">
          <!-- 
        Rule -->
        </div>
        <span class="para-NoStyle-176" style="left:510.24px;top:7655.68px;">ACC/AHA Klassifikation</span>
        <span class="para-NoStyle-327" style="left:2099.73px;top:7661.29px;font-size:91px;"> </span>
        <span class="para-NoStyle-195" style="left:2125.01px;top:7672.61px;">[</span>
        <span class="para-NoStyle-195" style="left:2158.35px;top:7672.61px;">ACC/AHA Guidelines. JACC 2001;38:2101</span>
        <span class="para-NoStyle-195" style="left:4432.96px;top:7672.61px;">]</span>


        <span class="para-NoStyle-115" style="left:453.54px;top:10337.87px;">Tabelle: ACC/AHA Klassifikation der HI.</span>
      </div>
      <div style="position:absolute;left:23.68px;top:81.61px;">
        <div id="anchbox3">
          <div id="anchbox3_Props">
            <!-- bg -->
          </div>
          <div class="QxpTextBox" style="left:2px;top:2px;">
            <span class="para-NoStyle-9" style="top:-5.43px;">Für die PRAXIS</span>
            <div style="position:absolute;left:0.00px;top:137.76px;width:1027px;height:20px;background-color:#005BAA;z-index:-1;">
              <!-- 
                Rule -->
            </div>
            <span class="para-NoStyle-18" style="top:183.91px;">Es ist essentiell, die URSACHE des Syndroms «Herzinsuffizienz» zu suchen!</span>
            <span class="para-NoStyle-18" style="top:344.91px;">Die häufigsten Ursachen der HI in den westlichen Ländern sind:</span>
            <span class="para-NoStyle-18" style="top:505.91px;">•</span>
            <span class="para-NoStyle-18" style="left:170.08px;top:505.91px;">Koronare Herzkrankheit (KHK) →  i.d.R. systolische Dysfunktion </span>
            <span class="para-NoStyle-18" style="top:666.91px;">•</span>
            <span class="para-NoStyle-18" style="left:170.08px;top:666.91px;">Arterielle Hypertonie (AHT) →  i.d.R. diastolische Dysfunktion</span>
          </div>
        </div>
      </div>
      <div style="position:absolute;left:31.82px;top:174.41px;">
        <div id="anchbox4">
          <img alt="77.png" class="qxpASDImage" height="45" src="assets/77.png" width="253" />
        </div>
      </div>
      <div style="position:absolute;left:22.68px;top:391.66px;">
        <div id="anchbox24">
          <table class="table1_1">
            <colgroup>
              <col style="width:14.07%;" />
              <col style="width:85.93%;" />
            </colgroup>
            <tbody>
              <tr>
                <td class="td1_1" colspan="2">
                  <div id="box25">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-36" style="top:-7.62px;">ACC/AHA Klassifikation der Herzinsuffizienz</span>
                    </div>
                  </div>
                </td>

              </tr>
              <tr>
                <td class="td1_11">
                  <div id="box26">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-36" style="top:-16.61px;">Grad A</span>
                    </div>
                  </div>
                </td>
                <td class="td1_12">
                  <div id="box27">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-127" style="top:-16.61px;">•</span>
                      <span class="para-NoStyle-127" style="left:49.01px;top:-16.61px;">   </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:-16.61px;">Patienten mit hohem Risiko, eine HI zu entwickeln (z.B. art. Hypertonie, </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:144.39px;">KHK, Diabetes mellitus, Alkoholabusus, Kokainabusus u.a.).</span>
                      <span class="para-NoStyle-127" style="top:305.39px;">•</span>
                      <span class="para-NoStyle-127" style="left:49.01px;top:305.39px;">   </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:305.39px;">Keine strukturellen oder funktionellen Myokard-, Perikard- oder </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:466.39px;">Klappenabnormitäten. Keine Symptome.</span>
                    </div>
                  </div>
                </td>

              </tr>
              <tr>
                <td class="td1_6">
                  <div id="box28">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-36" style="top:-16.61px;">Grad B</span>
                    </div>
                  </div>
                </td>
                <td class="td1_7">
                  <div id="box29">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-127" style="top:-16.61px;">•</span>
                      <span class="para-NoStyle-127" style="left:49.01px;top:-16.61px;">   </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:-16.61px;">Patienten mit struktureller Herzkrankheit, welche aber keine Symptome </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:144.39px;">oder Befunde einer HI aufweisen (z.B. linksventrikuläre Hypertrophie oder </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:305.39px;">Dilatation, Status nach Myokardinfarkt u.a.).</span>
                    </div>
                  </div>
                </td>

              </tr>
              <tr>
                <td class="td1_2">
                  <div id="box30">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-36" style="top:-16.61px;">Grad C</span>
                    </div>
                  </div>
                </td>
                <td class="td1_3">
                  <div id="box31">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-127" style="top:-16.61px;">•</span>
                      <span class="para-NoStyle-127" style="left:49.01px;top:-16.61px;">   </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:-16.61px;">Patienten mit aktuellen oder vorgängigen HI-Symptomen, welche einer </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:144.39px;">strukturellen Herzkrankheit zuzuordnen sind.</span>
                    </div>
                  </div>
                </td>

              </tr>
              <tr>
                <td class="td1_6">
                  <div id="box32">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-36" style="top:-16.61px;">Grad D</span>
                    </div>
                  </div>
                </td>
                <td class="td1_7">
                  <div id="box33">
                    <div class="QxpTextBox" style="left:2px;top:2px;">
                      <span class="para-NoStyle-127" style="top:-16.61px;">•</span>
                      <span class="para-NoStyle-127" style="left:49.01px;top:-16.61px;">   </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:-16.61px;">Patienten mit fortgeschrittener struktureller Herzkrankheit und schweren </span>
                      <span class="para-NoStyle-127" style="left:4659.85px;top:-16.61px;">         </span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:144.39px;">HI-Symptomen, trotz max. medikamentöser Therapie (inkl. Spezialisten-ein-</span>
                      <span class="para-NoStyle-127" style="left:170.08px;top:305.39px;">griffen).</span>
                    </div>
                  </div>
                </td>

              </tr>
            </tbody>
          </table>
        </div>
      </div>
    </div>
    <div id="box43">
      <img alt="NORME_onglet.png" class="qxpASDImage" height="44" src="assets/NORME_onglet.png" width="91" />
    </div>
    <div id="box44">
      <div id="box44_Props">
        <!-- bg -->
      </div>
      <div class="QxpTextBox" style="left:1.7px;top:1.7px;">
        <span class="para-NoStyle-127" style="top:-2.40px;">Der Update 2018 des Kapitels HERZINSUFFIZIENZ kann wie folgt eingesehen werden:</span>
        <span class="char-Normal-Local-374" style="top:168.89px;"></span>
        <span class="para-NoStyle-36" style="left:131.80px;top:164.61px;"> http://www.investimed.ch/HERZINSUFFIZIENZ_2018.pdf</span>
      </div>
    </div>
  </div>
  <script src="../assets/PressRunWidgets.js" type="text/javascript">
    /* Script */
  </script>
</body>

</html>

0 answers:

No answers