Reading a .gz file from Google Cloud Storage with Python in Jupyter

I am trying to read a .gz file from Google Cloud Storage using Python in a Jupyter notebook.

My first attempt fails with this error:

TypeError: can't concat str to bytes

from google.cloud import storage
import pandas as pd
from io import StringIO

client = storage.Client()
bucket = client.get_bucket("nttcomware")
blob = bucket.get_blob(f"test.csv.gz")
df = pd.read_csv(s,compression='gzip',float_precision="high")
df.head()

My second attempt fails with a different error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

from google.cloud import storage
import pandas as pd
from io import StringIO

client = storage.Client()
bucket = client.get_bucket("nttcomware")
blob = bucket.get_blob(f"test.csv.gz")
bt = blob.download_as_string()
s = str(bt,"utf-8")
s = StringIO(s)
df = pd.read_csv(s,float_precision="high")
df.head()
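
The byte 0x8b at position 1 in the second error is consistent with the gzip magic number (0x1f 0x8b), which suggests the downloaded bytes are still compressed when they are decoded as UTF-8. The sketch below reproduces that failure without touching Cloud Storage. The in-memory `payload` stands in for the downloaded blob, and `looks_gzipped` and `try_decode` are hypothetical helper names, not part of any library.

```python
import gzip

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(raw: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


def try_decode(raw: bytes):
    """Decode as UTF-8; return the exception instead of raising it."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


# Stand-in for the downloaded blob: a small CSV compressed in memory.
payload = gzip.compress("a,b\n1,2\n".encode("utf-8"))

result = try_decode(payload)
print(looks_gzipped(payload))  # the data is still compressed
print(result)                  # the same kind of UnicodeDecodeError as above
```

Checking the first two bytes before decoding is a cheap way to tell whether a decompression step is still missing.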

Any suggestions would be appreciated.

Solution

Luckily, I managed to solve it myself. I hope this helps others.

import gzip
import io

import pandas as pd
from google.cloud import storage

client = storage.Client()

# get the bucket
bucket = client.get_bucket("nttcomware")

# get the blob object
blob_name = "test.csv.gz"
blob = bucket.get_blob(blob_name)

# download_as_string() returns bytes despite its name; wrap them in a
# BytesIO object. The data is still gzip-compressed at this point.
data = io.BytesIO(blob.download_as_string())

# decompress the gzip stream
with gzip.open(data) as gz:
    # decompressed content, still bytes
    raw = gz.read()

# decode the bytes as UTF-8 and wrap the text in a StringIO object
s = io.StringIO(raw.decode("utf-8"))

df = pd.read_csv(s, float_precision="high")
df.head()
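
The decompress-then-decode steps above can also be folded into one small helper. This is only a sketch, assuming the whole file fits in memory and is UTF-8 encoded. `gzip_bytes_to_text` is a hypothetical name, and the in-memory `sample` stands in for the bytes returned by `blob.download_as_string()`.

```python
import gzip
import io


def gzip_bytes_to_text(raw: bytes, encoding: str = "utf-8") -> io.StringIO:
    """Decompress gzip bytes and return the decoded text as a StringIO."""
    return io.StringIO(gzip.decompress(raw).decode(encoding))


# Stand-in for blob.download_as_string(): a small CSV gzipped in memory.
sample = gzip.compress("id,value\n1,0.1\n2,0.2\n".encode("utf-8"))

text = gzip_bytes_to_text(sample)
print(text.getvalue())
```

The returned StringIO can be passed to pd.read_csv in place of `s`. Another option that may be worth trying is to skip the manual step and hand the compressed bytes to pandas directly, for example pd.read_csv(io.BytesIO(raw), compression="gzip"), since read_csv accepts binary file-like objects together with an explicit compression argument.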
