This is my XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><pdftsExtract><page number="0"><block height="10.425598" width="121.31714" xpos="75.384" ypos="695.5"><text>This is a test document.</text></block><text>
</text><block height="63.34558" width="462.63947" xpos="72.024" ypos="616.3"><text><italic>Portable Document Format </italic>(PDF) is a file format used to represent documents in a manner
independent pdf application software, hardware, and operating systems.Each PDF file
encapsulates a complete description of a fixed-layout flat document, including the text,
fonts, graphics, and other information needed to display it. <bold>In 1991, Adobe Systems co-
founder John Warnock outlined a system called "Camelot" that evolved into PDF.</bold></text></block><text>
</text><block height="89.31" width="466.7436" xpos="72.024" ypos="508.87"><text>While Adobe Systems made the PDF specification available free of charge in 1993, PDF remained a
proprietary format, controlled by Adobe, until it was officially released as an open standard on July
1,2008, and published by the International Organization for Standardization as ISO 32000-1:2008. In
2008, Adobe published a Public Patent <bold>License to ISO 32000-1 granting royalty-free rights for all
patents owned by Adobe that are necessary to make, use, sell and distribute PDF compliant
implementations.</bold></text></block><text>
</text><block height="41.76004" type="table" width="478.87598" xpos="66.62401" ypos="451.50998"><block height="13.920044" width="159.62599" xpos="66.62401" ypos="479.34998"><block height="8.279999" width="26.727844" xpos="72.024" ypos="482.71"><text>Name</text></block></block><text> </text><block height="13.920044" width="159.62" xpos="226.25" ypos="479.34998"><block height="8.279999" width="35.868988" xpos="231.65" ypos="482.71"><text>Address</text></block></block><text> </text><block height="13.920044" width="159.63" xpos="385.87" ypos="479.34998"><block height="8.279999" width="31.651733" xpos="391.27" ypos="482.71"><text>Mobile</text></block></block><text>
</text><block height="13.919983" width="159.62599" xpos="66.62401" ypos="465.43"><block height="8.279999" width="24.243843" xpos="72.024" ypos="468.79"><text>Richa</text></block></block><text> </text><block height="13.919983" width="159.62" xpos="226.25" ypos="465.43"><block height="8.279999" width="44.347687" xpos="231.65" ypos="468.79"><text>Velachery</text></block></block><text> </text><block height="13.919983" width="159.63" xpos="385.87" ypos="465.43"><block height="8.279999" width="50.198975" xpos="391.27" ypos="468.79"><text>123456789</text></block></block><text>
</text><block height="13.920013" width="159.62599" xpos="66.62401" ypos="451.50998"><block height="8.279999" width="38.88288" xpos="72.024" ypos="454.87"><text>Bhuvana</text></block></block><text> </text><block height="13.920013" width="159.62" xpos="226.25" ypos="451.50998"><block height="8.279999" width="36.49826" xpos="231.65" ypos="454.87"><text>Chennai</text></block></block><text> </text><block height="13.920013" width="159.63" xpos="385.87" ypos="451.50998"><block height="8.279999" width="50.198975" xpos="391.27" ypos="454.87"><text>987654321</text></block></block></block></page></pdftsExtract>
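I want the following HTML, with the bold, italic, and table formatting preserved:
This is a test document.
Portable Document Format (PDF) is a file format used to represent documents in a manner independent pdf application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it. In 1991, Adobe Systems co-founder John Warnock outlined a system called "Camelot" that evolved into PDF.
While Adobe Systems made the PDF specification available free of charge in 1993, PDF remained a proprietary format, controlled by Adobe, until it was officially released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell and distribute PDF compliant implementations.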
Name     Address    Mobile
Richa    Velachery  123456789
Bhuvana  Chennai    987654321
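
One possible way to get from the XML to that HTML, sketched in Python with only the standard library. This is a rough sketch under some assumptions, not a definitive converter: the function names (`inline_html`, `block_html`, `convert`) are made up for illustration, `<bold>`/`<italic>` are assumed to map to `<b>`/`<i>`, every non-table `<block>` is treated as one paragraph, and table cells are assumed to belong to the same row when they share the same `ypos` value. `SAMPLE` is a shortened stand-in for the XML above.

```python
import xml.etree.ElementTree as ET
from html import escape
from itertools import groupby

# Assumed mapping from the extractor's inline tags to HTML tags.
INLINE_TAGS = {"bold": "b", "italic": "i"}


def inline_html(node):
    """Render the mixed content of a node, turning bold/italic into b/i."""
    parts = [escape(node.text or "", quote=False)]
    for child in node:
        tag = INLINE_TAGS.get(child.tag)
        inner = inline_html(child)
        parts.append(f"<{tag}>{inner}</{tag}>" if tag else inner)
        parts.append(escape(child.tail or "", quote=False))
    return "".join(parts)


def normalize(markup):
    """Collapse the line breaks and runs of spaces left by PDF extraction."""
    return " ".join(markup.split())


def block_html(block):
    """Render one top-level block as either a <table> or a <p>."""
    if block.get("type") == "table":
        rows = []
        # Consecutive cell blocks with the same ypos are taken as one row.
        for _, cells in groupby(block.findall("block"), key=lambda c: c.get("ypos")):
            tds = "".join(
                "<td>" + normalize("".join(inline_html(t) for t in cell.iter("text"))) + "</td>"
                for cell in cells
            )
            rows.append(f"<tr>{tds}</tr>")
        return "<table>" + "".join(rows) + "</table>"
    body = "".join(inline_html(t) for t in block.findall("text"))
    return f"<p>{normalize(body)}</p>"


def convert(xml_text):
    """Convert a pdftsExtract document into a string of HTML fragments."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        block_html(block)
        for page in root.iter("page")
        for block in page.findall("block")
    )


# Shortened stand-in for the real document, with the same nesting.
SAMPLE = (
    '<pdftsExtract><page number="0">'
    '<block><text><italic>Portable Document Format </italic>(PDF) is a format. '
    '<bold>It evolved\nfrom "Camelot".</bold></text></block><text>\n</text>'
    '<block type="table">'
    '<block ypos="479.3"><block><text>Name</text></block></block><text> </text>'
    '<block ypos="479.3"><block><text>Mobile</text></block></block><text>\n</text>'
    '<block ypos="465.4"><block><text>Richa</text></block></block><text> </text>'
    '<block ypos="465.4"><block><text>123456789</text></block></block>'
    '</block></page></pdftsExtract>'
)

html_out = convert(SAMPLE)
print(html_out)
```

The whitespace-only `<text>` elements between blocks are skipped because only `<block>` children are visited. Words hyphenated across a line break (such as "co-" / "founder" in the second paragraph) are not rejoined by this sketch, and the first table row is emitted as ordinary `<td>` cells rather than `<th>` headers.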