How do I read a .net file downloaded with urllib3? (Updated)

I am downloading the file airports.net from GitHub with urllib3 and reading it as a graph object with networkx.read_pajek, as follows:

import urllib3
import networkx as nx


http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET',url)
G = nx.read_pajek(f.data(),encoding = 'UTF-8')
print(G)

This raises an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-7728c1228755> in <module>
     13 url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
     14 f = http.request('GET',url)
---> 15 G = nx.read_pajek(f.data(),encoding = 'UTF-8')
     16 print(G)
     17 

TypeError: 'bytes' object is not callable

Could you please elaborate on how to do this?

Update: if I change f.data() to f.data, I get a new error:

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-e96ad6eb1bfb> in <module>()
      6 url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
      7 f = http.request('GET',url)
----> 8 G = nx.read_pajek(f.data,encoding = 'UTF-8')
      9 print(G)

<decorator-gen-781> in read_pajek(path,encoding)

4 frames
/usr/local/lib/python3.6/dist-packages/networkx/readwrite/pajek.py in <genexpr>(.0)
    159     for format information.
    160     """
--> 161     lines = (line.decode(encoding) for line in path)
    162     return parse_pajek(lines)
    163 

AttributeError: 'int' object has no attribute 'decode'

Solution

As the error message suggests, and as the urllib3 docs confirm, HTTPResponse.data is an attribute of type bytes, not a method. So you need f.data rather than f.data() to retrieve the value.
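The first error can be reproduced without any network access. The sketch below uses a plain bytes value as a stand-in for f.data; the name payload and its contents are made up for illustration.

```python
# Stand-in for the value urllib3 exposes as HTTPResponse.data (illustrative only)
payload = b"*Vertices 2\n1 \"A\"\n2 \"B\"\n"

# A bytes object is a plain value, not a function, so it cannot be called
print(callable(payload))  # False

try:
    payload()  # effectively what f.data() does
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'bytes' object is not callable
```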

Update

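Regarding the AttributeError: as the networkx docs confirm, read_pajek expects its first argument to be a file name or an open file object, not the raw data. When it is handed a bytes value it iterates over it byte by byte, and each item is an int, which is why the traceback ends in "'int' object has no attribute 'decode'". So you can dump the bytes to a file and pass that file's path as the argument, or wrap them in a file-like object. There are a few options: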

Just use a hard-coded file name. This is arguably the simplest option and needs no extra imports.

import urllib3
import networkx as nx

FILE_NAME = "/tmp/test.net"

http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET',url)

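# Write the raw bytes unchanged so the file holds exactly what was downloaded,
# independent of the platform's default text encoding
with open(FILE_NAME,"wb") as fh:
    fh.write(f.data)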

G = nx.read_pajek(FILE_NAME,encoding='UTF-8')
print(f"G='{G}',G.size={G.size()}")

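Use the tempfile standard-library module to manage the file for you (i.e. it gives the file a random name and deletes it once it is no longer in use).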

import tempfile

import urllib3
import networkx as nx

http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET',url)

with tempfile.NamedTemporaryFile() as fh:
    fh.write(f.data)
    fh.flush()  # make sure the buffered bytes reach the file before it is read by name
    G = nx.read_pajek(fh.name,encoding='UTF-8')

print(f"G='{G}',G.size={G.size()}")
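One caveat with the tempfile approach: writes to the file object are buffered, so anything that opens the file by name may see incomplete data unless the buffer is flushed first. The following is a minimal sketch of that effect, using a made-up payload as a stand-in for f.data and only the standard library.

```python
import os
import tempfile

payload = b"*Vertices 1\n1 \"A\"\n"  # stand-in for f.data (illustrative only)

with tempfile.NamedTemporaryFile() as fh:
    fh.write(payload)
    # Small writes usually still sit in the file object's buffer at this point
    size_before = os.path.getsize(fh.name)
    fh.flush()  # push the buffered bytes to the file
    size_after = os.path.getsize(fh.name)

print(size_before, size_after)
```

Note also that, according to the Python documentation, a NamedTemporaryFile can be reopened by name while it is still open on Unix but not on Windows, so this option is platform dependent.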

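Use io.BytesIO or io.StringIO ("in-memory files"). These create an object that lives in memory (RAM) but has an API similar to a regular file stored on disk. Accessing data in RAM is much faster than accessing the disk, which makes this useful for performance reasons. You cannot always use it, because RAM is limited, but in your case the data is already in memory, so dumping it to disk only for networkx to read it back into memory would be a waste of time. Since read_pajek decodes each line itself (see line.decode(encoding) in the traceback above), io.BytesIO is the one that fits here. You probably will not notice the difference in this particular case, since you seem to download a single, fairly small file only once, but it may come in handy in the future.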

import io

import urllib3
import networkx as nx

http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET',url)

data = io.BytesIO(f.data)

G = nx.read_pajek(data,encoding = 'UTF-8')
print(f"G='{G}',G.size={G.size()}")
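To see why passing the raw bytes failed while wrapping them in io.BytesIO works, the sketch below mimics the per-line decoding shown in the traceback (line.decode(encoding) for line in path). The payload is a made-up stand-in for f.data, and no network or networkx is needed.

```python
import io

payload = b"*Vertices 2\n1 \"A\"\n2 \"B\"\n"  # stand-in for f.data (illustrative only)

# Iterating over bytes yields integers, one per byte, and int has no .decode()
first_item = next(iter(payload))
print(type(first_item).__name__)  # int

# Iterating over a BytesIO yields whole lines as bytes, which can be decoded
lines = [line.decode("UTF-8") for line in io.BytesIO(payload)]
print(lines[0])
```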
