String to String[] of words

Asked: 2013-04-16 14:16:03

Tags: java arrays string

I need to split a String and get a String[] of words. I tried this:

String[] plain = plainText.split(" ,;<>/[(!)*=]");

But in my case this does not work. After the split, the array plain still holds only one value, which is the entire contents of plainText. My string looks like this:

<table class="content" border="0" cellpadding="0" cellspacing="0" style="width:540px;" bgcolor="#ffffff">
            <tr>
                <td align="left" valign="top">
                    <font color="#666666" face="Arial, Verdana" size="1">
                    eBay Inc.<br />
                    2145 Hamilton Avenue<br />
                    San Jose, California 95125<br /><br />

                    Designated trademarks and brands are the property of their respective owners. eBay and the eBay logo are trademarks of eBay Inc.
                    <br /><br />

                    <strong>&copy; 2013 eBay Inc. All Rights Reserved</strong><br /><br />


                    eBay Inc. sent this e-mail to you at maximkr@gmail.com because you opted in to the eBay Deals Daily Alert campaign by signing up at ebay.com/deals.<br /><br />


                    Pricing: We compared the selling price for the featured Deals items on eBay to the List Price for the item. The List price is the price (excluding shipping and handling fees) the seller of the item has provided at which the same item, or one that is nearly identical to it, is being offered for sale or has been offered for sale in the recent past. The price may be the seller's own price elsewhere or another seller's price. The "% off" simply signifies the calculated percentage difference between seller-provided List Price and the seller's price for the eBay Deals item. If you have any questions related to the pricing and/or discount offered in eBay Deals, please contact the seller. All items subject to availability.<br /><br />

                    If you wish to unsubscribe from eBay Deals email alerts, please <a href="http://dailydeal.ebay.com/unsubscribe.jsp?s=4IwA&i=883690252203">click here</a>.
                    Please note that you are only opting out of the eBay Deals email alerts. If you are an eBay customer and wish to change your other eBay Notification Preferences, please log in to My eBay by <a href="http://l.deals.ebay.com/u.d?R4GrxGghJ4SpZccF_r3SS=21801">clicking here</a>. Please note that it may take up to 10 days to process changes to your eBay Notification Preferences. <br /><br />

                    Visit our <a href="http://l.deals.ebay.com/u.d?f4GrxGghJ4SpZccF_r3Sf=21811">Privacy Policy</a> and <a href="http://l.deals.ebay.com/u.d?KYGrxGghJ4SpZccF_r3SY=21821">User Agreement</a> if you have any questions.<br /><br />

                    </font>
                </td>

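This is part of a parsed email. So how can I convert this text into an array of words?

4 answers:

Answer 0 (score: 3)

The regex is wrong: some of its characters are regex metacharacters (for example [, ( and *) that must be escaped to be used as literal split delimiters, and the whole set of characters must be enclosed in [] to form a character class. As written, the pattern only matches the literal text " ,;<>/" followed by one of the characters (!)*=, which never occurs in your input, so nothing is split: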

String[] plain = plainText.split("[ ,;<>/\\[\\(!\\)\\*=\\]]");

See the Java regex documentation for details.
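To make the difference concrete, here is a minimal sketch that runs both patterns on one shortened line of the question's input. The class name SplitDelimiters, the helper splitWith, and the trailing + on the corrected pattern (which merges runs of delimiters so no empty strings appear between them) are assumptions added for illustration, not part of the original answer.

```java
import java.util.Arrays;

class SplitDelimiters {
    // The question's pattern: the literal text " ,;<>/" followed by ONE character from [(!)*=].
    static final String ORIGINAL = " ,;<>/[(!)*=]";
    // All delimiters inside a single character class; "+" treats a run of delimiters as one.
    static final String FIXED = "[ ,;<>/\\[(!)*=\\]]+";

    // Hypothetical helper: thin wrapper around String.split so both patterns can be compared.
    static String[] splitWith(String text, String regex) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        String sample = "San Jose, California 95125<br />"; // shortened line from the question
        // The original pattern never matches, so the whole string comes back as one element.
        System.out.println(Arrays.toString(splitWith(sample, ORIGINAL)));
        // The corrected pattern yields the words, plus "br" left over from the tag.
        System.out.println(Arrays.toString(splitWith(sample, FIXED)));
    }
}
```

Note that the tag name br survives as a "word", which is one reason splitting raw HTML on punctuation is only a rough approach.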

Edit: to follow up on CPerkins's comment, you could also use this regex:

String[] plain = plainText.split("[\\s^\\W]+");

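It splits on runs of whitespace characters and non-word characters, which I think is what you want. Note that inside a character class a ^ that is not the first character is just a literal caret, and \W already covers both whitespace and the caret, so the shorter pattern "\\W+" behaves the same way.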
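As a rough sketch of what this kind of split produces, the following compares the pattern from the edit with the plain \W+ form. The class name NonWordSplit, both method names, and the sample sentence are assumptions made up for this illustration.

```java
import java.util.Arrays;

class NonWordSplit {
    // Pattern from the answer: whitespace, a literal caret, or any non-word character.
    static String[] wordsFromAnswerPattern(String text) {
        return text.split("[\\s^\\W]+");
    }

    // Shorter form: \W matches anything that is not an ASCII letter, digit, or underscore.
    static String[] wordsFromNonWordPattern(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        String sample = "eBay Inc. sent this e-mail"; // hypothetical shortened input
        System.out.println(Arrays.toString(wordsFromAnswerPattern(sample)));
        System.out.println(Arrays.toString(wordsFromNonWordPattern(sample)));
    }
}
```

Both calls return the same array. Hyphenated words such as e-mail are broken into two pieces, because the hyphen is a non-word character.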

NB: the above is only a direct answer to your question; there are better ways to read/parse HTML.

Answer 1 (score: 0)

You can use the Scanner class and read the words with a construct like

while(scanner.hasNext()){}

Link: Scanner
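A minimal sketch of that loop, assuming the text is already in a String. The class name ScannerWords, the method words, and the choice of "\\W+" as the delimiter are assumptions for illustration; by default Scanner separates tokens on whitespace only, which would leave punctuation and tag fragments attached to the words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerWords {
    static String[] words(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Assumption: treat runs of non-word characters as the token separator.
            scanner.useDelimiter("\\W+");
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Shortened line from the question's input.
        for (String word : words("2145 Hamilton Avenue<br />")) {
            System.out.println(word);
        }
    }
}
```

A Scanner can also wrap a file or stream, so this form may be preferable when the email is too large to hold comfortably in a single String.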

Answer 2 (score: 0)

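    // Strip HTML tags (reluctant match from < to the nearest >).
    String noTags = htmlString.replaceAll("\\<.*?\\>", "");
    // Collapse whitespace and common punctuation into single spaces; trim so no empty first element appears.
    String clearTxt = noTags.replaceAll("[ \t\n.,!;\\(\\)]+", " ").trim();
    // Split on the single spaces that remain.
    String[] words = clearTxt.split(" ");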
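A self-contained variant of the same idea can be sketched as follows. It deliberately differs from the snippet above in two ways, both assumptions of this sketch: tags are replaced with a space rather than removed, so words on either side of a tag cannot run together, and named entities such as &copy; are dropped instead of decoded. The class name HtmlWords and the method words are made up for illustration.

```java
import java.util.Arrays;

class HtmlWords {
    static String[] words(String html) {
        // Replace each tag with a space so neighbouring words stay separate.
        String noTags = html.replaceAll("<[^>]*>", " ");
        // Assumption: named entities such as &copy; are simply discarded, not decoded.
        String noEntities = noTags.replaceAll("&[a-zA-Z]+;", " ");
        String trimmed = noEntities.trim();
        // Splitting an empty string would give one empty element, so return an empty array instead.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Split on runs of whitespace and the punctuation used in the snippet above.
        return trimmed.split("[\\s.,!;()]+");
    }

    public static void main(String[] args) {
        String line = "<strong>&copy; 2013 eBay Inc. All Rights Reserved</strong><br /><br />";
        System.out.println(Arrays.toString(words(line)));
    }
}
```

Regex-based tag stripping like this only copes with simple markup; for anything beyond that, a real HTML parser is the safer choice.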

Answer 3 (score: 0)

How about some variant of Apache StringUtils.split?