
Download files from URLs and zip the downloaded files in Python


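I am trying to write a script that downloads files from URLs. Once all the files are downloaded, I zip them.

Right now, for example, I can download a file when the URL itself contains the file name: https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx

My code works like this.

First: I create a folder with a unique name. The folder name is passed in by the caller.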

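import os
import zipfile

import requests


def createFolder(folder_name, parent_dir):
    # Build parent_dir/folder_name; folder_name may be an int, hence str().
    path = os.path.join(parent_dir, str(folder_name))

    try:
        os.mkdir(path)
        return path
    except OSError as error:
        # Also raised when the folder already exists.
        print(error)

    return None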
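
One thing to note about the function above: `os.mkdir` raises an error when the folder already exists, so a second run with the same ID returns None. A minimal sketch of a variant that tolerates an existing folder is shown here; the name `ensure_folder` is hypothetical and not part of the original script.

```python
import os
import tempfile


def ensure_folder(folder_name, parent_dir):
    """Create parent_dir/folder_name if it is missing and return its path.

    Hypothetical helper: unlike os.mkdir, os.makedirs(exist_ok=True)
    does not raise when the folder is already there.
    """
    path = os.path.join(parent_dir, str(folder_name))
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real project path.
    with tempfile.TemporaryDirectory() as parent:
        first = ensure_folder(1234, parent)
        second = ensure_folder(1234, parent)  # same call again, no error
        print(first == second, os.path.isdir(first))
```

Whether reusing an existing folder is acceptable depends on the script; if leftover files from an earlier run must not end up in the zip, the original fail-on-exists behavior may be the safer choice.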

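Second: I download all the files into this folder. The caller passes in the folder path, which is the path of the folder the function above just created.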

def download_file(url, folder_path, filename_to_be_download=''):
    # Without a target file name there is nowhere to save the download.
    if filename_to_be_download == "":
        return None

    file_path = os.path.join(folder_path, filename_to_be_download)
    try:
        # stream=True lets iter_content() fetch the body in chunks.
        with requests.get(url, stream=True) as req:
            with open(file_path, 'wb') as f:
                for chunk in req.iter_content(chunk_size=2024):
                    if chunk:
                        f.write(chunk)
        return file_path
    except Exception as e:
        # Report the failure (e.g. an invalid file name) instead of hiding it.
        print(e)
        return None

Third: I loop over all the downloaded files in the uniquely named folder and create a zip of them.

def run():
    # for local
    parent_dir = "D:/A/scrappers/tmp"

    # create Folder with unique name '1234' inside the parent_directory
    opportunity_Id = 1234
    folder_created_path = createFolder(folder_name=opportunity_Id,parent_dir=parent_dir)
    all_Urls = ['https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx','https://procurement-notices.undp.org/view_file.cfm?doc_id=257280']

    if folder_created_path :
        # we created folder,we store all files in it
        all_files_path = []
        for eachUrl in all_Urls :
            downloadUrl = eachUrl
            req = requests.get(downloadUrl)
            if req.status_code == 200 :
                # everything after the last '/', including any query string
                filename = req.url[downloadUrl.rfind('/') + 1 :]

                # adding file path to all_files_path[] list. file just downloaded successfully
                downloaded_file_path = download_file(downloadUrl,folder_created_path,filename_to_be_download=filename)
                if downloaded_file_path :
                    all_files_path.append(downloaded_file_path)
                else :
                    print("file not downloaded")
            else :
                print("status code is not 200")

        # loop through all files that created and create zip
        if len(all_files_path) > 0 :
            # writing files to a zipfile
            with zipfile.ZipFile(os.path.join(parent_dir,f"{opportunity_Id}.zip"),'w',compression=zipfile.ZIP_DEFLATED) as zf :
                # writing each file one by one
                for file in all_files_path :
                    zf.write(file)
        else :
            print("no files to zip them")
    else :
        print("error while creating folder")

 

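The script works for the first URL in the all_Urls list, but not for the second. I noticed that the second URL does not contain a file name, yet if I open it in a browser the file downloads automatically. How can I download files from URLs like this and zip them together with my other files?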
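
One possible approach, sketched under assumptions: when a URL has no file name in its path, servers often supply one in the `Content-Disposition` response header, which is what browsers use to name the download. Whether this particular server sends that header has not been verified here. The helper name `filename_from_headers` and the `download.bin` default are hypothetical. Note also that the slice used in the script yields `view_file.cfm?doc_id=257280` for the second URL, and `?` is not allowed in Windows file names, which is a likely reason the `open()` call fails.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlparse


def filename_from_headers(url, content_disposition=None, default="download.bin"):
    """Choose a file name for a download (hypothetical helper).

    Order of preference: the filename parameter of a Content-Disposition
    header, then the last segment of the URL path (query string excluded),
    then the given default.
    """
    if content_disposition:
        # Let the standard library parse the header parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Drop any directory part the server may have included.
            return os.path.basename(name)

    # urlparse(...).path leaves out the "?doc_id=..." query string.
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default


if __name__ == "__main__":
    print(filename_from_headers(
        "https://procurement-notices.undp.org/view_file.cfm?doc_id=257280",
        'attachment; filename="report.pdf"',  # made-up header value
    ))
```

With `requests`, the header value can be read as `resp.headers.get("Content-Disposition")` and the result passed to `download_file` as `filename_to_be_download`. If the header is absent, the fallback for the second URL is `view_file.cfm`, which is at least a valid file name even though its extension does not describe the content.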

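
For the zipping step, a small sketch that may be useful: `ZipFile.write(path)` with no second argument stores the file under its full directory path inside the archive, while passing `arcname` stores only the base name. The function name `zip_files` is hypothetical, and flat names are an assumption about the desired archive layout.

```python
import os
import tempfile
import zipfile


def zip_files(file_paths, zip_path):
    """Write each file into zip_path under its base name (hypothetical helper)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in file_paths:
            # arcname controls the name stored inside the archive.
            archive.write(file_path, arcname=os.path.basename(file_path))
    return zip_path


if __name__ == "__main__":
    # Build a throwaway file, zip it, and list the archive contents.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "a.txt")
        with open(sample, "w") as f:
            f.write("hello")
        out = zip_files([sample], os.path.join(tmp, "out.zip"))
        with zipfile.ZipFile(out) as archive:
            print(archive.namelist())
```

If two downloads could share a base name, flat names would collide inside the archive, so in that case keeping a relative path per file would be the safer layout.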
