Data extraction from an HTML string

Time: 2018-03-28 12:29:19

Tags: html regex parsing extract

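I have been looking for a way to extract some information from the HTML I receive in email bodies. Before extracting the data, I already clean the HTML down to the minimal basic markup, with no attributes, styles, or blank lines.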

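I have seen some mail parsers that use a GUI to select the fields I need to extract by creating a new template. I also noticed that when the HTML changes slightly, they are smart enough to still extract the data as before.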

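My question is how these sites create a template through a GUI (by selecting the text I need). Also, is there any open-source project or library that could help me?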

Example: I need to extract the booking number, PNR, dates, and so on. Creating the template through a GUI is preferred.

<table>
    <tbody>
        <tr>
            <td>Booking No.: 5903154789</td>
        </tr>
    </tbody>
</table>


<table>
    <tbody>
        <tr>
            <td>Download the Trip.com app to track your flight status and check booking details on the move.</td>
        </tr>
        <tr>
            <td>FAQs</td>
        </tr>
        <tr>
            <td>How can I refund my flight ticket?</td>
        </tr>
        <tr>
            <td>If you need to refund your flight ticket after the ticket has been issued, please sign in and select My bookings, then Flights, click on the relevant order number to open the booking details page, click the Refund button to apply for a ticket refund according to the website instructions. A cancellation fee might apply which depends on the policy of the airline. If you reserved your flight ticket as a guest, you can search your booking through the email address which you used for your booking and apply for a ticket refund according to the website instructions.</td>
        </tr>
        <tr>
            <td>How can I change my flight ticket?</td>
        </tr>
        <tr>
            <td>If you need to modify your ticket after it has been issued, please contact one of our Trip.com customer service representatives. A change fee might apply which is dependent on the policy of the airline.</td>
        </tr>
        <tr>
            <td>How can I check the flight status?</td>
        </tr>
        <tr>
            <td>You can check the flight status through "Get Flight Status" in the "Flights tools" at the bottom of the homepage of Flights. You can also download our Trip.com App to check your flight's status by clicking the button "Flight Status" on the homepage.</td>
        </tr>
        <tr>
            <td>Contact Us</td>
        </tr>
        <tr>
            <td>United States : 833 896 0077 24/7</td>
        </tr>
        <tr>
            <td>China : 400 828 8966 24/7</td>
        </tr>
        <tr>
            <td>Other Locations : +86 21 3210 4669 24/7</td>
        </tr>
        <tr>
            <td>Great deals with reliable service</td>
        </tr>
        <tr>
            <td>Thank you for choosing Trip.comCustomer Service Department</td>
        </tr>
        <tr>
            <td>Do not forward this mail as it contains your personal information and booking details.</td>
        </tr>
        <tr>
            <td>Copyright © 1999-2018 Trip.com All rights reserved</td>
        </tr>
        <tr>
            <td>Using Trip.com's website means that you agree with Trip.com's Privacy Policy.</td>
        </tr>
    </tbody>
</table>


<table>
    <tbody>
        <tr>
            <td>Flight Booking Confirmed</td>
        </tr>
        <tr>
            <td>
                <strong>Dear Customer</strong>,
                <p>Your flight booking has been confirmed and your tickets have been issued.</p>
                <p>If you'd like to change or cancel your booking, the Trip.com app makes it easy.</p>
                <p>You will find your itinerary and e-receipt attached. We advise you print out your itinerary and take it with you to ensure your trip goes as smoothly as possible.</p>
            </td>
        </tr>
        <tr>
            <td>
                <table>
                    <tbody>
                        <tr>
                            <td>Booking No.</td>
                            <td>5903154789</td>
                        </tr>
                        <tr>
                            <td>Booked On</td>
                            <td>25 Mar 2018 12:32</td>
                        </tr>
                        <tr>
                            <td>Airline Booking Reference</td>
                            <td>C9LHJQ</td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
        <tr>
            <td>
                <strong>Flight Details</strong>(DPS - SIN)
            </td>
        </tr>
        <tr>
            <td>Bali - SingaporeScoot · TR281</td>
        </tr>
        <tr>
            <td>
                <table>
                    <tbody>
                        <tr>
                            <td>3 May 2018 10:50</td>
                            <td>DPS</td>
                            <td>Ngurah Rai Airport I</td>
                        </tr>
                        <tr>
                            <td>3 May 2018 13:25</td>
                            <td>SIN</td>
                            <td>Changi Airport T2</td>
                        </tr>
                        <tr>
                            <td>
                                <strong>Baggage Allowance</strong>
                                <p>
                                    <strong>[FREE]</strong>No free baggage allowance.Please contact airline for detailed baggage regulations.
                                </p>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
        <tr>
            <td>Passenger</td>
        </tr>
        <tr>
            <td>
                <table>
                    <tbody>
                        <tr>
                            <td>Name</td>
                            <td>Ticket Number</td>
                        </tr>
                        <tr>
                            <td>SOMANATH/MAMATHA</td>
                            <td>C9LHJQ</td>
                        </tr>
                        <tr>
                            <td>YADARANGI/SOMANATH</td>
                            <td>C9LHJQ</td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
        <tr>
            <td>Click here to view date change and cancellation policies.</td>
        </tr>
        <tr>
            <td>For more information, please check the attachments or view your booking in more detail on the Trip.com website or app.</td>
        </tr>
        <tr>
            <td>
                <table>
                    <tbody>
                        <tr>
                            <td>Important information</td>
                        </tr>
                        <tr>
                            <td>•</td>
                            <td>All departure/arrival times and dates are in local time.</td>
                        </tr>
                        <tr>
                            <td>•</td>
                            <td>Tickets must be used in the sequence set out in the itinerary.</td>
                        </tr>
                        <tr>
                            <td>•</td>
                            <td>Please arrive at the airport at least 2 hours before departure to ensure you have enough time to check in.</td>
                        </tr>
                        <tr>
                            <td>•</td>
                            <td>Your ID must be valid for at least 6 months beyond the date you complete your itinerary.</td>
                        </tr>
                        <tr>
                            <td>•</td>
                            <td>A transit visa may be required if you need to transfer in a third country. We recommend you confirm visa details with the embassy of the relevant country.</td>
                        </tr>
                        <tr>
                            <td>•</td>
                            <td>If you have only booked a one-way ticket and are travelling on a short-term business/tourism visa, we recommend you purchase a return ticket as soon as possible. Failure to do so may result in denial of check-in, entry, or exit.</td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
    </tbody>
</table>


<table>
    <tbody>
        <tr>
            <td>Booking No.</td>
            <td>5903154789</td>
        </tr>
        <tr>
            <td>Booked On</td>
            <td>25 Mar 2018 12:32</td>
        </tr>
        <tr>
            <td>Airline Booking Reference</td>
            <td>C9LHJQ</td>
        </tr>
    </tbody>
</table>


<table>
    <tbody>
        <tr>
            <td>3 May 2018 10:50</td>
            <td>DPS</td>
            <td>Ngurah Rai Airport I</td>
        </tr>
        <tr>
            <td>3 May 2018 13:25</td>
            <td>SIN</td>
            <td>Changi Airport T2</td>
        </tr>
        <tr>
            <td>
                <strong>Baggage Allowance</strong>
                <p>
                    <strong>[FREE]</strong>No free baggage allowance.Please contact airline for detailed baggage regulations.
                </p>
            </td>
        </tr>
    </tbody>
</table>


<table>
    <tbody>
        <tr>
            <td>Name</td>
            <td>Ticket Number</td>
        </tr>
        <tr>
            <td>SOMANATH/MAMATHA</td>
            <td>C9LHJQ</td>
        </tr>
        <tr>
            <td>YADARANGI/SOMANATH</td>
            <td>C9LHJQ</td>
        </tr>
    </tbody>
</table>


<table>
    <tbody>
        <tr>
            <td>Important information</td>
        </tr>
        <tr>
            <td>•</td>
            <td>All departure/arrival times and dates are in local time.</td>
        </tr>
        <tr>
            <td>•</td>
            <td>Tickets must be used in the sequence set out in the itinerary.</td>
        </tr>
        <tr>
            <td>•</td>
            <td>Please arrive at the airport at least 2 hours before departure to ensure you have enough time to check in.</td>
        </tr>
        <tr>
            <td>•</td>
            <td>Your ID must be valid for at least 6 months beyond the date you complete your itinerary.</td>
        </tr>
        <tr>
            <td>•</td>
            <td>A transit visa may be required if you need to transfer in a third country. We recommend you confirm visa details with the embassy of the relevant country.</td>
        </tr>
        <tr>
            <td>•</td>
            <td>If you have only booked a one-way ticket and are travelling on a short-term business/tourism visa, we recommend you purchase a return ticket as soon as possible. Failure to do so may result in denial of check-in, entry, or exit.</td>
        </tr>
    </tbody>
</table>
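
For the two-cell rows in the sample above (for example Booking No. next to 5903154789), one possible starting point is to read every table row and keep the rows that hold exactly two cells as label/value pairs. The following is only a minimal sketch in Python using the standard library `html.parser` module; the names `RowCollector` and `label_value_pairs` are made up for illustration, and the sketch assumes the HTML has already been cleaned as described in the question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td>, grouped by the <tr> that contains it."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._open_rows = []   # one cell list per currently open <tr> (handles nesting)
        self._open_cells = []  # one text buffer per currently open <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._open_rows.append([])
        elif tag == "td":
            self._open_cells.append([])

    def handle_endtag(self, tag):
        if tag == "td" and self._open_cells:
            # Join the text nodes of this cell and collapse runs of whitespace.
            text = " ".join("".join(self._open_cells.pop()).split())
            if self._open_rows:
                self._open_rows[-1].append(text)
        elif tag == "tr" and self._open_rows:
            self.rows.append(self._open_rows.pop())

    def handle_data(self, data):
        # Text belongs to the innermost open cell only.
        if self._open_cells:
            self._open_cells[-1].append(data)


def label_value_pairs(html):
    """Return {label: value} for every row that has exactly two cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2 and row[0]}


sample = """
<table><tbody>
<tr><td>Booking No.</td><td>5903154789</td></tr>
<tr><td>Booked On</td><td>25 Mar 2018 12:32</td></tr>
<tr><td>Airline Booking Reference</td><td>C9LHJQ</td></tr>
</tbody></table>
"""

pairs = label_value_pairs(sample)
print(pairs["Booking No."])
print(pairs["Airline Booking Reference"])
```

On the full email this would also return pairs that are not wanted, such as the bullet rows and the passenger table header, so the caller would still need to pick the labels it cares about.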

The online parsers are:

https://mailparser.io/

https://parser.zapier.com/

https://parseur.com/

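Edit: Currently I create templates by hand with imangazaliev/didom (PHP), pointing at the exact node elements to get the data, but this is too hard for so many templates, so I am looking for alternatives.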
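
One way to make hand-written templates cheaper is to store each template as data rather than as node paths: a field is described by the label text that precedes it and by the shape of its value, and the HTML is flattened to plain text before matching. This is not a description of how the services listed above work; it is only a sketch of one plausible approach, written in Python with the standard library. `TRIP_TEMPLATE`, `flatten`, and `apply_template` are hypothetical names, and the value patterns are assumptions based on the sample email.

```python
import re
from html.parser import HTMLParser


class TextFlattener(HTMLParser):
    """Reduce HTML to its visible text, one chunk per non-empty text node."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.chunks.append(text)


def flatten(html):
    parser = TextFlattener()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical template: field name -> (label text, pattern the value must match).
TRIP_TEMPLATE = {
    "booking_no": ("Booking No.", r"\d{6,}"),
    "pnr": ("Airline Booking Reference", r"[A-Z0-9]{6}"),
    "booked_on": ("Booked On", r"\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}"),
}


def apply_template(template, html):
    """Find each field's value right after its label in the flattened text."""
    text = flatten(html)
    result = {}
    for field, (label, value_pattern) in template.items():
        # Allow an optional colon and any whitespace (including a cell
        # boundary, which flatten() turns into a newline) after the label.
        pattern = re.escape(label) + r"\s*:?\s*(" + value_pattern + ")"
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result


one_cell = "<table><tr><td>Booking No.: 5903154789</td></tr></table>"
two_cells = "<table><tr><td>Booking No.</td><td>5903154789</td></tr></table>"

print(apply_template(TRIP_TEMPLATE, one_cell)["booking_no"])
print(apply_template(TRIP_TEMPLATE, two_cells)["booking_no"])
```

Because matching happens on text rather than on node positions, both layouts of the booking number in the sample yield the same value, and a field that is missing comes back as `None` instead of raising an error. A GUI could, in principle, generate such entries from the text a user selects, but that part is outside this sketch.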

0 answers:

No answers