How to download files from URLs and zip the downloaded files in Python
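I am trying to write a script that downloads files from URLs. Once all the files are downloaded, I zip them.
Right now I can download a file when the URL itself contains the file name, for example: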
https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx
My code works like this.
First: I create a folder with a unique name. The folder name is passed in by the calling function.
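import os

def createFolder(folder_name, parent_dir):
    # Build <parent_dir>/<folder_name>; folder_name may be an int, hence str()
    directory = folder_name
    path = os.path.join(parent_dir, str(directory))
    try:
        os.mkdir(path)
        return path
    except OSError as error:
        # Raised e.g. when the folder already exists or parent_dir is missing
        print(error)
        return None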
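As a side note on the folder step, `os.mkdir` raises an error when the folder already exists or when the parent directory is missing, so running the script twice with the same ID ends in "error while creating folder". A minimal sketch of one alternative, assuming it is acceptable to reuse an existing folder; the name `ensure_folder` is hypothetical and not part of the original script:

```python
import os
import tempfile


def ensure_folder(folder_name, parent_dir):
    """Create parent_dir/folder_name if needed and return its path."""
    path = os.path.join(parent_dir, str(folder_name))
    # exist_ok=True: no error when the folder is already there;
    # missing parent folders are created as well.
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory instead of D:/A/scrappers/tmp.
    with tempfile.TemporaryDirectory() as parent:
        print(ensure_folder(1234, parent))
        print(ensure_folder(1234, parent))  # the second call does not fail
```

Reusing a folder means files left over from an earlier run would also be present, so this only fits if that is acceptable.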
Second: I download all the files into this folder. The folder path, which is the one the function above just created, comes from the caller.
import os
import requests

def download_file(url, folder_path, filename_to_be_download=''):
    try:
        if filename_to_be_download == "":
            return None
        else:
            file_path = os.path.join(folder_path, filename_to_be_download)
            # stream=True so the body is read in chunks instead of all at once
            with requests.get(url, stream=True) as req:
                with open(file_path, 'wb') as f:
                    for chunk in req.iter_content(chunk_size=2024):
                        if chunk:
                            f.write(chunk)
            return file_path
    except Exception as e:
        # print(e)
        return None
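The chunked write in `download_file` can be separated from the HTTP call, which makes it easy to try without a network connection. A minimal sketch under that assumption; `save_chunks` is a hypothetical helper name, and the commented lines show how it might be fed from a streamed `requests` response:

```python
import os
import tempfile


def save_chunks(chunks, file_path):
    """Write an iterable of byte chunks to file_path; return bytes written."""
    written = 0
    with open(file_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# With requests, the chunks would come from a streamed response, e.g.:
#   with requests.get(url, stream=True, timeout=30) as resp:
#       resp.raise_for_status()
#       save_chunks(resp.iter_content(chunk_size=8192), file_path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.bin")
        print(save_chunks([b"abc", b"", b"de"], target))
```

Calling `raise_for_status()` before writing avoids saving an error page under the target file name.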
Third: I loop over all the downloaded files in the uniquely named folder and create a zip from them.
import os
import zipfile
from zipfile import ZipFile

import requests

def run():
    # for local
    parent_dir = "D:/A/scrappers/tmp"
    # create Folder with unique name '1234' inside the parent_directory
    opportunity_Id = 1234
    folder_created_path = createFolder(folder_name=opportunity_Id, parent_dir=parent_dir)
    all_Urls = [
        'https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx',
        'https://procurement-notices.undp.org/view_file.cfm?doc_id=257280',
    ]
    if folder_created_path:
        # we created folder, we store all files in it
        all_files_path = []
        for eachUrl in all_Urls:
            downloadUrl = eachUrl
            req = requests.get(downloadUrl)
            if req.status_code == 200:
                # file name = everything after the last '/' in the URL
                filename = req.url[downloadUrl.rfind('/') + 1:]
                # adding file path to all_files_path[] list. file just downloaded successfully
                downloaded_file_path = download_file(downloadUrl, folder_created_path, filename_to_be_download=filename)
                if downloaded_file_path:
                    all_files_path.append(downloaded_file_path)
                else:
                    print("file not downloaded")
            else:
                print("status code is not 200")
        # loop through all files that created and create zip
        if len(all_files_path) > 0:
            # writing files to a zipfile
            with ZipFile(os.path.join(parent_dir, f"{opportunity_Id}.zip"), 'w', compression=zipfile.ZIP_DEFLATED) as zip_file:
                # writing each file one by one
                for file in all_files_path:
                    zip_file.write(file)
        else:
            print("no files to zip them")
    else:
        print("error while creating folder")
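One detail of the zipping step: `ZipFile.write(file)` with a single argument stores each file under its on-disk path (minus the drive letter), so the archive would contain the whole folder hierarchy. A minimal sketch of passing `arcname` to keep only the base names, assuming no two downloaded files share a name; `zip_files` is a hypothetical helper name:

```python
import os
import tempfile
import zipfile


def zip_files(file_paths, zip_path):
    """Zip the given files, storing only their base names in the archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in file_paths:
            # arcname sets the name inside the archive; without it the
            # path on disk is used.
            archive.write(file_path, arcname=os.path.basename(file_path))
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "a.txt")
        with open(sample, "w") as f:
            f.write("hello")
        archive_path = zip_files([sample], os.path.join(tmp, "1234.zip"))
        with zipfile.ZipFile(archive_path) as archive:
            print(archive.namelist())
```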
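My script works for the first URL in the all_Urls list, but not for the second one. I noticed that the second URL has no file name in it, yet if I open it in a browser the file downloads automatically. How can I download files from URLs like this and zip them together with my other files?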
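For the second URL, slicing after the last `/` gives `view_file.cfm?doc_id=257280`, and `?` is not allowed in Windows file names, which is a likely reason the write fails there. Browsers usually take the name from the `Content-Disposition` response header instead. A minimal sketch of that approach, assuming the server sends such a header (not verified for this URL) and falling back to the URL path otherwise; `filename_from_response` and the `default` value are hypothetical:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_response(url, content_disposition=None, default="download"):
    """Pick a file name: Content-Disposition first, then the URL path."""
    name = None
    if content_disposition:
        # Let the stdlib email parser handle quoting and RFC 2231 encoding.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # Use only the path, so "?doc_id=..." never ends up in the name.
        name = unquote(urlsplit(url).path)
    # Keep just the last path component to avoid writing outside the folder.
    name = posixpath.basename(name.replace("\\", "/"))
    return name or default


if __name__ == "__main__":
    url = "https://procurement-notices.undp.org/view_file.cfm?doc_id=257280"
    # The header value below is made up for the demo.
    print(filename_from_response(url, 'attachment; filename="notice.pdf"'))
    print(filename_from_response(url))
```

With `requests`, the inputs would be `resp.url` and `resp.headers.get("Content-Disposition")`, and the result can be passed as `filename_to_be_download`.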