Question

I have some tds that contain link text or link images, as shown below. When I get the td text with BeautifulSoup, it looks like this:

    u'Language pack\n      \t            \t      \t\t: English'

    u'SUSE Linux Enterprise Server for x86 5731SLX\n\t \t\t    \t\t\t\t\n\t\t\t\t\n\t\t        \t\t\t    \t\t\t\t\t\xa0\xa0\nCustomize'

How can I turn these into strings like u'Language pack English' and u'SUSE Linux Enterprise Server for x86 5731SLX Customize'? I tried, but it didn't work.

The cells look like this:

    Language pack
                                        : English

    <td>
            SUSE Linux Enterprise Server for x86 5731SLX
    <br></br><img width="16" height="16" border="0" align="middle" src="//www.ibm.com/i/v14/icons/fw_bold.gif" title="Link icon" alt="Link icon"></img><a href="flowAction.wss?_eventId=customize&contextId=createProductContext_153180107005186045039023158181160233057201186025_5731SLX_2&_flowExecutionKey=_cC224C0EA-DCAC-4303-DDFD-32594E21C48B_k4D15950D-5056-DE43-BCBE-C73C228B3270"> … </a>
    </td>

Answer 1

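You can use re.sub(r'\s+', ' ', inputstring) to replace each run of extra whitespace with a single space.

The regular expression \s+ matches one or more (+) whitespace characters (\s). In Python 2, pass flags=re.UNICODE so that \s also matches the non-breaking spaces (\xa0) in the second string; in Python 3 this is already the default for str patterns.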
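Applied to the two strings from the question under Python 3, where `\s` in a `str` pattern is Unicode-aware and therefore also matches `\xa0`, the pattern gives the results below. This is a minimal sketch: the helper name `collapse_whitespace` and the trailing `.strip()` are assumptions of this example, not part of the answer. The `re.ASCII` call only shows what happens when `\s` is restricted to ASCII whitespace. Note that the colon in the first string is kept; removing it would need a separate step.

```python
import re

# The two strings quoted in the question, as Python 3 str literals.
language = 'Language pack\n      \t            \t      \t\t: English'
suse = ('SUSE Linux Enterprise Server for x86 5731SLX\n\t \t\t    \t\t\t\t\n'
        '\t\t\t\t\n\t\t        \t\t\t    \t\t\t\t\t\xa0\xa0\nCustomize')


def collapse_whitespace(text, flags=0):
    """Hypothetical helper: replace every whitespace run with one space, trim the ends."""
    return re.sub(r'\s+', ' ', text, flags=flags).strip()


print(collapse_whitespace(language))  # Language pack : English
print(collapse_whitespace(suse))      # SUSE Linux Enterprise Server for x86 5731SLX Customize

# With re.ASCII, \s no longer matches the non-breaking spaces,
# so they survive between two ordinary spaces.
print(repr(collapse_whitespace(suse, flags=re.ASCII)))
```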

Example:

    >>> import re
    >>> inputstring = u'hello    world! this\n\n\t\t   \tis a \ntest'
    >>> re.sub(r'\s+', ' ', inputstring)
    u'hello world! this is a test'
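Putting it together with BeautifulSoup, here is a minimal sketch assuming bs4 4.x with the built-in "html.parser" backend and a simplified, hypothetical version of the second cell (the real href is dropped and "Customize" is assumed as the link text). Besides the regex approach, it shows an alternative that the answer does not mention: `get_text(" ", strip=True)` strips each text node and joins the pieces with a space.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, simplified version of the second <td> from the question.
html = """<table><tr><td>
            SUSE Linux Enterprise Server for x86 5731SLX
<br/><img src="icon.gif" alt="Link icon"/><a href="#">Customize</a>
</td></tr></table>"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# get_text() concatenates every text node, so the source indentation survives.
raw = td.get_text()

# Option 1: the answer's approach, collapsing whitespace runs afterwards.
cleaned = re.sub(r"\s+", " ", raw).strip()

# Option 2: let BeautifulSoup strip each text node and join them with a space.
joined = td.get_text(" ", strip=True)

print(repr(raw))
print(cleaned)  # SUSE Linux Enterprise Server for x86 5731SLX Customize
print(joined)   # SUSE Linux Enterprise Server for x86 5731SLX Customize
```

The two options can differ when a single text node contains internal whitespace runs: `strip=True` only trims the ends of each node, so text like the first cell (one node with tabs before ": English") still needs the regex pass.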

How to handle link text or link images when getting strings with BeautifulSoup in Python?

1 个答案: