How to resolve UnicodeEncodeError when transferring ".eml" files to Google Cloud Platform with gsutil v4.6.1 on Linux
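When I use the gsutil cp command to transfer files from a Linux system to Google Cloud Platform, it fails on some old ".eml" files whose contents (not just their file names!) contain non-English characters that are not Unicode-encoded.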
The command I tried was:
gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/
The error message was:
UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)
gsutil rsync gives a very similar error. Position 22881 (0x5961) is near the end of the multipart email source file. Here is a hex dump of that part of the file:
00005960: 20a8 43a4 d1b3 a320 5961 686f 6f21 a95f .C.... Yahoo!._
00005970: bcaf 203e 2020 7777 772e 7961 686f 6f2e .. > www.yahoo.
00005980: 636f 6d2e 7477 0d0a com.tw..
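At position 0x5961 we see the byte 0xa8, which the error message identifies as the source of the problem. The '\udca8' in the message is how Python represents that undecodable byte: as the lone surrogate U+DC00 + 0xA8. For some reason, gsutil tries to encode the text as ASCII. Opening the file in a terminal that supports Chinese characters, we see: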
< 每天都 Yahoo!奇摩 > www.yahoo.com.tw
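The dump can be checked in Python. Decoding the bytes from offset 0x5960 onward as Big-5 should give back the line the terminal shows, which confirms that the file is Big-5 rather than UTF-8. This is only a verification sketch: the byte string is copied from the hex dump, and the name DUMP is illustrative.

```python
# Bytes copied from the hex dump above (file offsets 0x5960-0x5987).
DUMP = bytes.fromhex(
    "20a8 43a4 d1b3 a320 5961 686f 6f21 a95f"
    "bcaf 203e 2020 7777 772e 7961 686f 6f2e"
    "636f 6d2e 7477 0d0a"
)

# Decoded as Big-5, the bytes form readable Chinese text.
text = DUMP.decode("big5")
print(repr(text))

# Index of the first byte outside the ASCII range, mapped back to a file offset.
first_non_ascii = next(i for i, b in enumerate(DUMP) if b > 0x7F)
print(hex(0x5960 + first_non_ascii))
```

The second print maps the first non-ASCII byte back to its file offset, which should match the 0x5961 (22881) reported in the error.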
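In Big-5, the first Chinese character, "每", is 0xa843. A simple workaround is to change the file extension from ".eml" to something else, e.g. ".eml.bak", so that gsutil does not process the file contents. Unfortunately, in a bulk transfer it is hard to know in advance which files contain such non-English characters, so the whole process may stop many times.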
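One way to find such files before a bulk transfer is to scan the tree first. The sketch below rests on an assumption: that gsutil picks the Content-Type from the file extension through Python's mimetypes module. The log line "[Content-Type=message/rfc822]" is consistent with that. find_risky_files, guessed_type and is_ascii are hypothetical helpers, not part of gsutil. They flag files whose extension maps to message/rfc822 and whose bytes are not pure ASCII. The same lookup also suggests why renaming to ".eml.bak" helps: that suffix no longer maps to message/rfc822.

```python
import mimetypes
import os

# A MimeTypes() instance built from Python's defaults; on current Python this
# avoids picking up extra mappings from /etc/mime.types.
# Assumption: gsutil's own Content-Type guess behaves like this lookup.
_MIME = mimetypes.MimeTypes()


def guessed_type(path):
    """Return the MIME type guessed from the file name alone (or None)."""
    return _MIME.guess_type(path)[0]


def is_ascii(path, chunk_size=1 << 16):
    """Return True if every byte in the file is below 0x80."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            if not chunk.isascii():
                return False


def find_risky_files(root):
    """Yield files under root typed as message/rfc822 that contain non-ASCII bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if guessed_type(path) == "message/rfc822" and not is_ascii(path):
                yield path
```

For example, list(find_risky_files("/home/darsenlu/Home/mail")) would list the files to rename or handle separately before running gsutil cp or gsutil rsync.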
Here is the full error message:
darsenlu@devmodel:~/Home$ gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/
Copying file:///home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml [Content-Type=message/rfc822]...
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil.py", line 122, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 444, in main
    user_project=user_project)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 780, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 639, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1124, in RunCommand
    seek_ahead_iterator=seek_ahead_iterator)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1525, in Apply
    arg_checker, should_return_results, fail_on_error)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1596, in _SequentialApply
    worker_thread.PerformTask(task, self)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2316, in PerformTask
    results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 709, in _CopyFuncWrapper
    preserve_posix=cls.preserve_posix_attrs)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 924, in CopyFunc
    preserve_posix=preserve_posix)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 3957, in PerformCopy
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2250, in _UploadFileToObject
    parallel_composite_upload, logger)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2066, in _DelegateUploadFileToObject
    elapsed_time, uploaded_object = upload_delegate()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2227, in CallNonResumableUpload
    gzip_encoded=gzip_encoded_file)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 1762, in _UploadFileToObjectNonResumable
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 388, in UploadObject
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1712, line 1534, in _UploadObject
    global_params=global_params)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 1182, in Insert
    upload=upload, upload_config=upload_config)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 703, in _RunMethod
    download)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 679, in PrepareHttpRequest
    upload.ConfigureRequest(upload_config, http_request, url_builder)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 763, in ConfigureRequest
    self.__ConfigureMultipartRequest(http_request)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 823, in __ConfigureMultipartRequest
    g.flatten(msg_root, unixfrom=False)
  File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.6/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.6/email/generator.py", line 272, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib/python3.6/email/generator.py", line 361, in _handle_message
    payload = self._encode(payload)
  File "/usr/lib/python3.6/email/generator.py", line 412, in _encode
    return s.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)
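The last frames can be reproduced with the standard library alone. This sketch mirrors the shape of the traceback, not apitools' exact code. It assumes the file body reached the email generator as a str in which the byte 0xa8 had already become the lone surrogate U+DCA8, for example through a UTF-8 decode with the surrogateescape error handler. The traceback does not show where that decode happened. Because the part is typed message/rfc822, BytesGenerator dispatches it to _handle_message, and a str payload there goes through _encode, which is s.encode('ascii').

```python
import io
from email.generator import BytesGenerator
from email.mime.nonmultipart import MIMENonMultipart


def reproduce():
    """Flatten a message/rfc822 part whose str payload holds a surrogate-escaped byte."""
    # A tiny "email" ending in a space plus the Big-5 bytes for the first character.
    raw = b"Subject: test\r\n\r\n \xa8\x43"

    # Assumption: the body was decoded like this somewhere upstream, which turns
    # the undecodable byte 0xa8 into the lone surrogate U+DCA8.
    body = raw.decode("utf-8", "surrogateescape")

    part = MIMENonMultipart("message", "rfc822")
    part.set_payload(body)  # str payload without a charset is stored as-is

    buf = io.BytesIO()
    try:
        # message/rfc822 is routed to Generator._handle_message, which passes
        # the str payload to BytesGenerator._encode, i.e. s.encode('ascii').
        BytesGenerator(buf, mangle_from_=False).flatten(part, unixfrom=False)
    except UnicodeEncodeError as exc:
        return exc
    return None


err = reproduce()
print(err)
```

As in the real traceback, the reported position is simply the index of the surrogate in the payload string. In the original file every earlier byte is ASCII, so that index equals the byte offset 22881.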
The Linux system is Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-76-generic x86_64).
Solution
I replaced your string with Chinese characters and was able to reproduce your error. Updating to gsutil 4.62 fixed it. See the merged PR and the issue tracker entry for reference.
Update the Cloud SDK by running:
gcloud components update
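For scripted bulk transfers, it may help to stop early if an older gsutil is installed. This is a minimal sketch under two assumptions: that `gsutil version` prints a line of the form "gsutil version: 4.62", and that 4.62 is the first release with the fix, as reported in the answer. parse_gsutil_version, installed_gsutil_is_fixed and MIN_FIXED are hypothetical names.

```python
import re
import subprocess

# Per the answer above, the error no longer occurred after updating to gsutil 4.62.
MIN_FIXED = (4, 62)


def parse_gsutil_version(output):
    """Extract (major, minor) from text such as 'gsutil version: 4.61'."""
    match = re.search(r"gsutil version:\s*(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'gsutil version: X.Y' line found")
    return int(match.group(1)), int(match.group(2))


def installed_gsutil_is_fixed():
    """Run `gsutil version` (gsutil must be on PATH) and compare with MIN_FIXED."""
    result = subprocess.run(
        ["gsutil", "version"], capture_output=True, text=True, check=True
    )
    return parse_gsutil_version(result.stdout) >= MIN_FIXED
```

Comparing (major, minor) tuples avoids the string-comparison trap where "4.7" would sort after "4.62".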