I want to get all the text from a web page, so I am trying to use the html2text module together with the urllib.request module:
import urllib.request
import html2text
request_url = urllib.request.urlopen('https://dev.to/justdevasur/let-s-perform-google-search-with-python-2gpi')
u=request_url.read()
print(html2text.html2text(u))
print('Done')
But I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rauna\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\html2text\__init__.py", line 947, in html2text
    return h.handle(html)
  File "C:\Users\rauna\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\html2text\__init__.py", line 142, in handle
    self.feed(data)
  File "C:\Users\rauna\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\html2text\__init__.py", line 138, in feed
    data = data.replace("</' + 'script>", "</ignore>")
TypeError: a bytes-like object is required, not 'str'
Answer 0 (score: 0)
As the error suggests, the problem is a bytes/str mismatch: request_url.read() returns a bytes object, but html2text calls data.replace() with str arguments, so it needs a str. You should therefore decode the response first, for example with request_url.read().decode('utf-8'), and pass the resulting string to html2text.html2text().
That is not the only issue, though: html2text also seems to have compatibility problems with Python 3. See, for example, this question.
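The wording of the TypeError is easy to misread, so here is a small standard-library sketch of the same mismatch. The sample bytes are made up for illustration and stand in for what read() returns; no network access or html2text is involved.

```python
# Made-up sample standing in for the bytes that urlopen(...).read() returns.
raw = b"<p>Hello</p>"

try:
    # The same kind of call that fails in the traceback:
    # str arguments passed to replace() on a bytes object.
    raw.replace("<p>", "")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding first gives a str, and str.replace() accepts str arguments.
text = raw.decode("utf-8")
cleaned = text.replace("<p>", "")

print(error_message)
print(cleaned)
```

The message complains about the str arguments, but the practical fix is on the other side: turn the bytes into a str before handing them over.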
So I would suggest a different approach, for example:
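import urllib.error
import urllib.request
import html2text
url = 'https://dev.to/justdevasur/let-s-perform-google-search-with-python-2gpi'
try:
    html = urllib.request.urlopen(url).read().decode('utf-8')  # bytes -> str
    print(html2text.html2text(html))
except urllib.error.HTTPError as err:
    print(err.code)  # HTTP status code if the server rejects the request
print('Done')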
Prints: 403
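A 403 means the server refused the request. One common cause, which is only an assumption for this site and not something confirmed above, is that some servers reject urllib's default User-Agent header. The sketch below sends an explicit header and decodes the body with the charset the server declares. The helper names (build_request, decode_body, page_to_text) and the User-Agent string are hypothetical and chosen purely for illustration.

```python
import urllib.request
from typing import Optional

# Placeholder value (an assumption); any browser-like string can be tried here.
USER_AGENT = "Mozilla/5.0 (compatible; example-script)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Turn raw response bytes into str, falling back to UTF-8."""
    return raw.decode(charset or "utf-8", errors="replace")


def page_to_text(url: str) -> str:
    """Fetch a page and convert its HTML to text with html2text."""
    # Third-party import kept local so the helpers above work without it.
    import html2text

    with urllib.request.urlopen(build_request(url)) as response:
        charset = response.headers.get_content_charset()
        html = decode_body(response.read(), charset)
    return html2text.html2text(html)
```

Calling print(page_to_text(url)) with the URL from the question would then fetch and convert the page. If the server still answers 403, the header is probably not the reason and the site may simply not allow scripted access.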