How do I download all files in a folder from a GCS bucket using the Python client API?

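How can I use the Python client API to download all files in a folder of a GCS (Google Cloud Storage) bucket? The files are of types such as .docx and .pdf.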

Solution

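Create the client from a downloaded credentials file; see the documentation at https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication. That documentation tells you to export the file location, but I personally prefer the method used below because it allows different credentials within the same application.

IMHO, separating what each service account can access improves security considerably. It is also useful when dealing with different projects in the same app.

Note that you also have to grant the service account the Storage Object Viewer role, or a role with more permissions.
For security reasons, always grant the least privilege that is needed (see https://cloud.google.com/storage/docs/access-control/iam-roles).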
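
As a minimal sketch of the idea above, the helper below builds one client per credentials file, so that each part of an application only uses the service account it needs. The name `build_clients`, the labels, and the file paths are hypothetical. The `factory` parameter is an assumption added here so the helper can be tried without real credentials; when it is omitted, the helper falls back to `storage.Client.from_service_account_json`.

```python
from typing import Callable, Dict, Optional


def build_clients(credential_files: Dict[str, str],
                  factory: Optional[Callable[[str], object]] = None) -> Dict[str, object]:
    """Return one client per label, each created from its own credentials file."""
    if factory is None:
        # Imported lazily so the sketch can be tried with a stand-in factory
        # even where google-cloud-storage is not installed.
        from google.cloud import storage
        factory = storage.Client.from_service_account_json
    return {label: factory(path) for label, path in credential_files.items()}


# Hypothetical labels and paths; a stand-in factory replaces the real client here.
clients = build_clients(
    {'reports': 'path/to/reports-credentials.json',
     'archive': 'path/to/archive-credentials.json'},
    factory=lambda path: f'client for {path}',
)
print(clients['reports'])
```

With real credentials files, dropping the `factory` argument would give one `storage.Client` per label, and each call site can then pick the client whose service account has just the access it needs.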

requirements.txt

google-cloud-storage

main.py

from google.cloud import storage
from os import makedirs


# Use a downloaded credentials file to create the client, see
# https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication
# Those docs tell you to export the file location, but I personally
# prefer the method used below as it allows for different credentials
# within the same application.
# IMHO separating what each service account can access improves
# security considerably. It's also useful when dealing with different
# projects in the same app.
#
# Note that you'll also have to give the service account the
# "Storage Object Viewer" role, or one with more permissions.
# Always use the least privilege needed, for security reasons:
# https://cloud.google.com/storage/docs/access-control/iam-roles


cred_json_file_path = 'path/to/file/credentials.json'
client = storage.Client.from_service_account_json(cred_json_file_path)


def download_blob(bucket: storage.Bucket,remotefile: str,localpath: str='.'):
    """downloads the object named remotefile to localpath, keeping its folder structure"""
    localrelativepath = '/'.join(remotefile.split('/')[:-1])
    totalpath = f'{localpath}/{localrelativepath}'
    filename = f'{localpath}/{remotefile}'
    makedirs(totalpath,exist_ok=True)
    print(f'Current file details:\n  remote file: {remotefile}\n  local file:  {filename}\n')
    blob = storage.Blob(remotefile,bucket)
    blob.download_to_filename(filename,client=client)


def download_blob_list(bucketname: str,bloblist: list,localpath: str='.'):
    """downloads a list of blobs to localpath"""
    bucket = storage.Bucket(client,name=bucketname)
    for blob in bloblist:
        download_blob(bucket,blob,localpath)


def list_blobs(bucketname: str, remotepath: str = None, filetypes: list = None) -> list:
    """returns a list of blob names filtered by remotepath and filetypes
    remotepath and filetypes are optional"""
    result = []
    # str.endswith accepts a tuple, so a name is added at most once
    # even if it matches more than one of the given suffixes
    suffixes = tuple(filetypes or ())
    for blob in client.list_blobs(bucketname, prefix=remotepath):
        name = str(blob.name)
        # skip "folder" names
        if name.endswith('/'):
            continue
        # an empty filter means: return every file
        if not suffixes or name.endswith(suffixes):
            result.append(name)
    return result


bucketname = 'bucketnamegoeshere'
foldername = 'foldernamegoeshere'
filetypes = ['.pdf', '.docx']  # list of extensions to return
bloblist = list_blobs(bucketname,remotepath=foldername,filetypes=filetypes)

# I'm just using the bucketname for localpath for download location.
# should work with any path
download_blob_list(bucketname,bloblist,localpath=bucketname)
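
The filtering and local-path steps can be checked without a bucket or credentials. The sketch below is an illustration only: `select_names` and `local_target` are hypothetical helpers that mirror what `list_blobs` and `download_blob` do with object names, and the sample names are made up. It uses `pathlib.PurePosixPath` in place of the string splitting above, which is a swapped-in technique and not part of the original answer.

```python
from pathlib import PurePosixPath


def select_names(names, filetypes=None):
    """Keep names that are not "folder" placeholders and match a wanted suffix."""
    suffixes = tuple(filetypes or ())
    return [n for n in names
            if not n.endswith('/') and (not suffixes or n.endswith(suffixes))]


def local_target(remotefile, localpath='.'):
    """Return (directory, file path) where remotefile would be written locally."""
    target = PurePosixPath(localpath) / remotefile
    return str(target.parent), str(target)


# Made-up object names, as a bucket listing with prefix 'docs' might return them.
names = ['docs/', 'docs/a.pdf', 'docs/sub/b.docx', 'docs/c.txt']
selected = select_names(names, ['.pdf', '.docx'])
print(selected)
for name in selected:
    print(local_target(name, 'bucketnamegoeshere'))
```

The directory part is what would be passed to `makedirs(..., exist_ok=True)` before the download, so nested "folders" in the bucket end up as nested directories on disk.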
