I have a list of values that I extracted from a website using BeautifulSoup. It looks like this:
tables_values1 = soup.find_all('td',attrs={'class':'x1'})
print(tables_values1)
Output: [123value1, 123value2, 123value3] (note that there are no ' or " quotes)
I'm trying to cut off the first x characters using the following (which I also found on Stack Exchange):
tables_values = [x[2:] for x in tables_values1]
However, this returns:
TypeError: unhashable type: 'slice'
Can anyone help figure out why this is happening and how to fix it? Thanks a lot!
Edit: Now please tell me whether this is a valid list!
Edit 3: Printing the exact repr, as requested below:
[<td class="views-field views-field-field-category-value-2018">136 </td>, <td class="views-field views-field-field-category-value-2018">SFD </td>, <td class="views-field views-field-field-category-value-2018">136 </td>, <td class="views-field views-field-field-category-value-2018">$33,657,146 </td>, <td class="views-field views-field-field-category-value-2018">9.7 </td>, <td class="views-field views-field-field-category-value-2018">$33,657,146 </td>, <td class="views-field views-field-field-category-value-2018">61 </td>, <td class="views-field views-field-field-category-value-2018">34 </td>, <td class="views-field views-field-field-category-value-2018">5 </td>, <td class="views-field views-field-field-category-value-2018">61 </td>, <td class="views-field views-field-field-category-value-2018">34 </td>, <td class="views-field views-field-field-category-value-2018">5 </td>, <td class="views-field views-field-field-category-value-2018">5 </td>, <td class="views-field views-field-field-category-value-2018">95 </td>]
<td class="views-field views-field-field-category-value-2018">136 </td>
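Answer 0: (score: 1)
The items in that list are BeautifulSoup Tag objects, not strings, and you're trying to slice them as if they were strings. Indexing a Tag looks up one of its HTML attributes (as in x['class']), so the slice ends up being used as a dictionary key, which is where the error comes from. You should really work with them as tags rather than trying to do string manipulation; for example, if you're trying to get the text between the tags, that would be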
contents = [x.string for x in tables_values1]
where the string attribute is a helper that returns a tag's single string child, if it has one.
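As a rough illustration of working with the tags rather than slicing them, here is a minimal sketch. The markup is made up to resemble the repr printed in the question, and the names `html`, `cells`, `texts`, `trimmed` and `mixed` are only placeholders. It uses `get_text(strip=True)` to drop the trailing space and then slices the resulting plain strings:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the repr shown in the question.
html = (
    "<table><tr>"
    '<td class="views-field views-field-field-category-value-2018">136 </td>'
    '<td class="views-field views-field-field-category-value-2018">SFD </td>'
    '<td class="views-field views-field-field-category-value-2018">$33,657,146 </td>'
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .string returns the single string child, including the trailing space.
first_raw = cells[0].string

# get_text(strip=True) returns the text with surrounding whitespace removed.
texts = [cell.get_text(strip=True) for cell in cells]

# These are ordinary strings now, so slicing works as expected.
trimmed = [t[1:] for t in texts]

# .string is None when a tag has more than one child.
mixed = BeautifulSoup("<td>136 <b>x</b></td>", "html.parser").td
print(texts, trimmed, mixed.string)
```

If a cell may contain nested tags, `get_text()` is probably the safer choice, since `.string` would give `None` there.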
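If you really want to do the job with string manipulation instead of going through the BeautifulSoup interface, you can convert the tag objects to strings, which will include the <td class="..."></td> part: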
strings = [str(x) for x in tables_values1]
Then you can slice the strings however you like.
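To make the difference concrete, here is a small sketch with made-up markup that reuses the x1 class and the 123value1 values from the question (the `html` variable and the slice offsets are only for illustration). It shows that slicing the str() of a tag cuts into the markup itself, whereas slicing the tag's text cuts into the value:

```python
from bs4 import BeautifulSoup

# Hypothetical markup based on the simplified output in the question.
html = '<table><tr><td class="x1">123value1</td><td class="x1">123value2</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
tables_values1 = soup.find_all("td", attrs={"class": "x1"})

# str() gives the full markup of each tag, including <td ...> and </td>.
strings = [str(x) for x in tables_values1]

# Slicing these strings therefore starts inside the opening tag.
markup_prefixes = [s[:3] for s in strings]

# Slicing the text instead drops the first three characters of the value.
values = [x.get_text()[3:] for x in tables_values1]
print(strings, markup_prefixes, values)
```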