
The "html2text" module does not work when used with the "urllib.request" module

How to fix "html2text" not working when used together with "urllib.request"

I want to get all the text of a web page, so I am trying to use the html2text module together with the urllib.request module:

import urllib.request 
import html2text
request_url = urllib.request.urlopen('https://dev.to/justdevasur/let-s-perform-google-search-with-python-2gpi') 
u=request_url.read()
print(html2text.html2text(u))
print('Done')

But I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rauna\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\python38\site-packages\html2text\__init__.py", line 947, in html2text
    return h.handle(html)
  File "C:\Users\rauna\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\python38\site-packages\html2text\__init__.py", line 142, in handle
    self.feed(data)
  File "C:\Users\rauna\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\python38\site-packages\html2text\__init__.py", line 138, in feed
    data = data.replace("</' + 'script>","</ignore>")
TypeError: a bytes-like object is required, not 'str'

Solution

The error occurs because read() returns a bytes object, while html2text works on str. Inside feed, html2text calls data.replace(...) with str arguments, and that call fails when data is bytes. Decode the response body to a str before passing it to html2text.html2text(), for example with .decode('utf-8'), assuming the page is UTF-8 encoded.
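To see why the traceback ends in that TypeError, here is a minimal sketch that reproduces the same kind of failure using only the standard library, with no network access and without html2text itself. The sample bytes and the helper name strip_script_close are made up for illustration; the replace() call is only similar in kind to the one shown in the traceback.

```python
# Stand-in for the bytes that urlopen(...).read() would return (hypothetical sample).
raw = "<p>Hello</p><script>var x;</script>".encode("utf-8")


def strip_script_close(data):
    # replace() is called with str arguments, as in the traceback.
    # This works on str input but raises TypeError on bytes input.
    return data.replace("</" + "script>", "</ignore>")


# Passing bytes reproduces the failure.
try:
    strip_script_close(raw)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding to str first avoids it (assuming the content is UTF-8).
text = raw.decode("utf-8")
fixed = strip_script_close(text)
print(fixed)
```

The same reasoning applies to the real call: decoding the result of read() before handing it to html2text.html2text() gives the library the str it expects.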

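However, this may not be enough on its own: the call may still raise an error, and html2text appears to have compatibility problems with Python 3. See, for example, this question.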

I would therefore suggest an approach like the following:

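import urllib.request
import html2text

request_url = urllib.request.urlopen('https://dev.to/justdevasur/let-s-perform-google-search-with-python-2gpi')
# read() returns bytes; decode to str before passing it to html2text
html = request_url.read().decode('utf-8')
print(html2text.html2text(html))
print('Done')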

Prints: 403
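A 403 means the server refused the request. One common cause, which is an assumption here and not something the post confirms, is that the site rejects the default User-Agent sent by urllib. The sketch below shows one way to send a custom User-Agent and to decode the body using the charset the server declares. The helper names build_request, decode_body and fetch_html, and the User-Agent string, are hypothetical.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (example)"):
    # Attach a custom User-Agent header; the value is only an illustrative placeholder.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def decode_body(raw, charset=None):
    # Decode the bytes body to str, falling back to UTF-8 when no charset is declared.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url):
    # Fetch the page and return its HTML as a str.
    # urlopen raises urllib.error.HTTPError if the server still answers with 403.
    with urllib.request.urlopen(build_request(url)) as response:
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

With these helpers, the conversion would look like html2text.html2text(fetch_html(url)). Whether a given site accepts the request still depends on that site's own rules.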
