UnicodeEncodeError when transferring ".eml" files to Google Cloud Platform (gsutil v4.6.1 on Linux)


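When I use the gsutil cp command to transfer files from a Linux system to Google Cloud Platform, it fails on some old ".eml" files while processing their contents (not just the file names!). These files contain non-English characters that are not encoded in Unicode.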

The command attempted was:

gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/

The error message is:

UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)

gsutil rsync gives a very similar error. Position 22881 (0x5961) is near the end of the multipart email file. A hex dump of the file contents at that location follows:

00005960: 20a8 43a4 d1b3 a320 5961 686f 6f21 a95f   .C.... Yahoo!._
00005970: bcaf 203e 2020 7777 772e 7961 686f 6f2e  .. >  www.yahoo.
00005980: 636f 6d2e 7477 0d0a                      com.tw..

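At position 0x5961 we see the byte 0xa8, which is the source of the problem named in the error message. For some reason, gsutil is trying to encode the text. Opening the file in a terminal that supports Chinese characters, we see: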

< 每天都 Yahoo!奇摩 >  www.yahoo.com.tw

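The first Chinese character, "每", is 0xa843 in Big-5 encoding. A simple workaround is to rename the file extension to something other than ".eml", for example ".eml.bak", so that gsutil does not process the file contents. Unfortunately, during a bulk transfer it is hard to know in advance which files contain such non-English text, and the whole process may stop many times.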
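Since the difficulty is not knowing in advance which files are affected, one option is to scan the directory tree before running gsutil. The following is a minimal sketch, not part of gsutil: the function name find_non_ascii_eml is hypothetical, and it assumes that any byte above 0x7F in an ".eml" file may trigger the same failure as the 0xa8 byte above.

```python
import os


def find_non_ascii_eml(root):
    """Yield (path, offset) for each .eml file under root that contains a
    byte outside the ASCII range. offset is the position of the first
    such byte, comparable to the position in the gsutil error message.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".eml"):
                continue
            path = os.path.join(dirpath, name)
            # Read as raw bytes so that no decoding is attempted here.
            with open(path, "rb") as handle:
                data = handle.read()
            for offset, byte in enumerate(data):
                if byte > 0x7F:
                    yield path, offset
                    break
```

Looping over find_non_ascii_eml("/home/darsenlu/Home/mail") and printing each path with hex(offset) would list the files that could be renamed or handled separately before the bulk transfer.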

The full error message follows:

darsenlu@devmodel:~/Home$ gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/
Copying file:///home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml [Content-Type=message/rfc822]...
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil.py", line 122, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 444, in main
    user_project=user_project)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 780, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 639, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1124, in RunCommand
    seek_ahead_iterator=seek_ahead_iterator)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1525, in Apply
    arg_checker, should_return_results, fail_on_error)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1596, in _SequentialApply
    worker_thread.PerformTask(task, self)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2316, in PerformTask
    results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 709, in _CopyFuncWrapper
    preserve_posix=cls.preserve_posix_attrs)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 924, in CopyFunc
    preserve_posix=preserve_posix)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 3957, in PerformCopy
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2250, in _UploadFileToObject
    parallel_composite_upload, logger)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2066, in _DelegateUploadFileToObject
    elapsed_time, uploaded_object = upload_delegate()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2227, in CallNonResumableUpload
    gzip_encoded=gzip_encoded_file)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 1762, in _UploadFileToObjectNonResumable
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 388, in UploadObject
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1712, line 1534, in _UploadObject
    global_params=global_params)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 1182, in Insert
    upload=upload, upload_config=upload_config)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 703, in _RunMethod
    download)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 679, in PrepareHttpRequest
    upload.ConfigureRequest(upload_config, http_request, url_builder)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 763, in ConfigureRequest
    self.__ConfigureMultipartRequest(http_request)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 823, in __ConfigureMultipartRequest
    g.flatten(msg_root, unixfrom=False)
  File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.6/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.6/email/generator.py", line 272, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib/python3.6/email/generator.py", line 361, in _handle_message
    payload = self._encode(payload)
  File "/usr/lib/python3.6/email/generator.py", line 412, in _encode
    return s.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)
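The last traceback frame calls s.encode('ascii') on a string that contains '\udca8'. A lone surrogate in the range U+DC80 to U+DCFF is what Python's surrogateescape error handler produces for an undecodable byte, so a plausible reading is that the file content was decoded that way before the email generator tried to re-encode it. The _handle_message frame, together with the reported Content-Type=message/rfc822, may also explain why renaming the extension avoids the failure. The sketch below reproduces the failure mode outside gsutil, using the bytes from the hex dump. The decoding step is an assumption about what happens upstream, not code taken from gsutil or apitools.

```python
# Bytes copied from the hex dump, offsets 0x5960 to 0x5987.
raw = bytes.fromhex(
    "20a843a4d1b3a3205961686f6f21a95f"
    "bcaf203e20207777772e7961686f6f2e"
    "636f6d2e74770d0a"
)

# Assumption: the content reaches the generator as a str decoded with
# errors="surrogateescape", so byte 0xa8 becomes the lone surrogate U+DCA8.
text = raw.decode("ascii", errors="surrogateescape")
print(repr(text[1]))  # character corresponding to the byte at 0x5961

try:
    text.encode("ascii")  # same call as the final traceback frame
except UnicodeEncodeError as exc:
    print(exc)

# Encoding with the same error handler round-trips to the original bytes.
restored = text.encode("ascii", errors="surrogateescape")
print(restored == raw)

# Decoded as Big-5, the bytes give the text shown in the terminal.
readable = restored.decode("big5", errors="replace")
```

Because the surrogateescape decoding maps each byte to exactly one character, the character position in the error message (22881) coincides with the byte offset in the file (0x5961).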

The Linux system is Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-76-generic x86_64).

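Solution

I replaced your string with Chinese characters and was able to reproduce your error. Updating to gsutil 4.62 fixed it for me. For reference, see the merged PR and the issue tracker entry.

Update the Cloud SDK by running: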

gcloud components update
