How to download all the files in a folder from a GCS bucket using the Python client API?
How do I download all the files in a folder (files such as .docx and .pdf) from a GCS cloud storage bucket using the Python client API?
Solution
Use a downloaded credentials file to create the client; see the documentation. The docs tell you to export the file location, but I personally prefer the method used below, as it allows for different credentials within the same application.
IMHO, separating what each service account can access increases security tenfold. It's also useful when dealing with different projects in the same app.
Note that you'll also have to grant the service account the "Storage Object Viewer" permission, or one with more permissions.
For security reasons, always use the least privilege needed.
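As a small sketch of the two authentication options mentioned above (the file paths and service-account names here are placeholders, not real key files):

```python
import os

# Option A (the documented default): export the key-file location; any
# storage.Client() created afterwards picks these credentials up implicitly.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/file/credentials.json'
# client = storage.Client()  # would now authenticate with credentials.json

# Option B (used in the answer below): load key files explicitly, so clients
# for different service accounts can coexist in the same process:
# viewer = storage.Client.from_service_account_json('path/to/viewer-sa.json')
# admin = storage.Client.from_service_account_json('path/to/admin-sa.json')

print(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])
```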
requirements.txt
google-cloud-storage
main.py
from google.cloud import storage
from os import makedirs
# use a downloaded credentials file to create the client, see
# https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication
# these docs tell you to export the file location, but I personally
# prefer the method used below as it allows for different credentials
# within the same application.
# IMHO separation of what each service account can access increases
# security by tenfold. It's also useful when dealing with different
# projects in the same app.
#
# Note that you'll also have to give the service account the
# permission "Storage Object Viewer", or one with more permissions.
# Always use the least needed due to security considerations:
# https://cloud.google.com/storage/docs/access-control/iam-roles
cred_json_file_path = 'path/to/file/credentials.json'
client = storage.Client.from_service_account_json(cred_json_file_path)
def download_blob(bucket: storage.Bucket, remotefile: str, localpath: str = '.'):
    """Downloads remotefile from the bucket to localpath."""
    localrelativepath = '/'.join(remotefile.split('/')[:-1])
    totalpath = f'{localpath}/{localrelativepath}'
    filename = f'{localpath}/{remotefile}'
    makedirs(totalpath, exist_ok=True)
    print(f'Current file details:\n remote file: {remotefile}\n local file: {filename}\n')
    blob = storage.Blob(remotefile, bucket)
    blob.download_to_filename(filename, client=client)
def download_blob_list(bucketname: str, bloblist: list, localpath: str = '.'):
    """Downloads a list of blobs to localpath."""
    bucket = storage.Bucket(client, name=bucketname)
    for blob in bloblist:
        download_blob(bucket, blob, localpath)
def list_blobs(bucketname: str, remotepath: str = None, filetypes: list = None) -> list:
    """Returns a list of blob names filtered by remotepath and filetypes.

    remotepath and filetypes are optional."""
    result = []
    blobs = list(client.list_blobs(bucketname, prefix=remotepath))
    for blob in blobs:
        name = str(blob.name)
        # skip "folder" names
        if not name.endswith('/'):
            # do we need to filter file types?
            if filetypes:
                for filetype in filetypes:
                    if name.endswith(filetype):
                        result.append(name)
            else:
                result.append(name)
    return result
bucketname = 'bucketnamegoeshere'
foldername = 'foldernamegoeshere'
filetypes = ['.pdf', '.docx']  # list of extensions to return
bloblist = list_blobs(bucketname, remotepath=foldername, filetypes=filetypes)
# I'm just using the bucketname as the download location (localpath);
# it should work with any path.
download_blob_list(bucketname, bloblist, localpath=bucketname)
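The string handling above can be sanity-checked without touching GCS at all. The helpers below mirror the suffix filter in list_blobs and the local-directory computation in download_blob as pure functions; the file names are made up for illustration:

```python
def filter_names(names, filetypes=()):
    """Mirror of the list_blobs filter: skip "folder" placeholder objects,
    keep everything when no filetypes are given."""
    result = []
    for name in names:
        if name.endswith('/'):  # objects ending in '/' represent folders
            continue
        if not filetypes or name.endswith(tuple(filetypes)):
            result.append(name)
    return result

def local_dir_for(remotefile, localpath='.'):
    """Mirror of the directory computed in download_blob before makedirs."""
    localrelativepath = '/'.join(remotefile.split('/')[:-1])
    return f'{localpath}/{localrelativepath}'

names = ['docs/', 'docs/a.pdf', 'docs/b.docx', 'docs/c.txt']
print(filter_names(names, ['.pdf', '.docx']))   # ['docs/a.pdf', 'docs/b.docx']
print(local_dir_for('docs/a.pdf', 'mybucket'))  # mybucket/docs
```

This also shows why the trailing-slash check matters: listing a bucket with a "folder" prefix returns the zero-byte placeholder object as well, and trying to download it as a file would fail.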