Question

Is this the best way to split the title and author?

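I need to split the book title and the author out of a block of text and put them into a list of tuples, but I'm having trouble wrapping my head around it. Here is a sample of the text: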

The Coddling of the American Mind: How Good Intentions and Bad Ideas Are Setting Up a Generation for Failure
by Greg Lukianoff & Jonathan Haidt

The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)
by Don Miguel Ruiz

This is the Python script that uses BeautifulSoup to extract the title and author from the HTML.

list_of_books = []
result = url_connection(url_list[3])  # test against a single page
x = result.find_all("h3", {"class": "book-title"})
for a in x:
    # remove newlines, then split the title from the author on "by"
    list_of_books.append(tuple(a.text.replace('\n', '').split('by')))

The result I get is correct:

[('The Coddling of the American Mind: How Good Intentions and Bad Ideas Are Setting Up a Generation for Failure', 'Greg Lukianoff & Jonathan Haidt'), ('The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)', 'Don Miguel Ruiz')]

However, if the title itself contains "by", my code breaks. What is the best way to handle this?
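
To make the failure concrete, here is a minimal sketch that runs the same replace-and-split steps on a plain string. The title and author are made up for illustration and are not from the actual page.

```python
# Made-up sample (an assumption, not real page data): the title contains "by".
text = "Made by Hand\nby Jane Doe"

# Same steps as in the script above: drop newlines, then split on "by".
parts = tuple(text.replace("\n", "").split("by"))

# The title is cut at its own "by", so the tuple has three pieces, not two.
print(len(parts))
print(parts)
```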

Answer 1

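Use rsplit(' by ', 1) instead of split('by').

This searches for ' by ' starting from the end of the string and stops after a single split.

I put a space before by in case the authors are something like Dan Jacoby and John Doe, where "by" appears inside a name.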
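
A minimal sketch of this approach, assuming the text of each element has the form "Title", a line break or space, "by", then the author. The helper name split_title_author and the sample strings are made up for illustration. Whitespace is collapsed to single spaces first so that ' by ' is present even when "by" starts a new line.

```python
def split_title_author(text):
    """Split 'Title by Author' text into a (title, author) tuple."""
    # Collapse newlines and repeated whitespace into single spaces.
    flat = " ".join(text.split())
    # Split at most once, searching from the right, on " by ".
    parts = flat.rsplit(" by ", 1)
    if len(parts) != 2:
        # No " by " separator found: keep the text as the title.
        return flat, ""
    return parts[0], parts[1]


# Made-up samples (assumptions, not real page data).
samples = [
    "Made by Hand\nby Jane Doe",               # "by" inside the title
    "Some Title\nby Dan Jacoby and John Doe",  # "by" inside an author name
]
list_of_books = [split_title_author(s) for s in samples]
print(list_of_books)
```

This would still split in the wrong place if an author's name contained " by " as a separate word, so treat it as a heuristic, not a guarantee.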
