How to solve a ResetException when uploading a large file (~1.5 TB) to S3 from Java via an InputStream
I have an application running in Java. I have a large file that I encrypt and upload to S3. Since the file is too large to hold in memory, I use a PipedInputStream and PipedOutputStream for the encryption. I wrap the PipedInputStream in a BufferedInputStream and pass that to the S3 PutObjectRequest. I have calculated the size of the encrypted object and set it on the ObjectMetadata. Here are some code snippets:
PipedInputStream pis = new PipedInputStream(uploadFileInfo.getPout(), MAX_BUFFER_SIZE);
BufferedInputStream bis = new BufferedInputStream(pis, MAX_BUFFER_SIZE);
LOG.info("Is mark supported? " + bis.markSupported());
PutObjectRequest putObjectRequest = new PutObjectRequest(uploadFileInfo.getS3TargetBucket(), uploadFileInfo.getS3TargetobjectKey() + ".encrypted", bis, Metadata);
// Set read limit to more than the expected stream size, i.e. 20 MB
// https://github.com/aws/aws-sdk-java/issues/427
LOG.info("set read limit to " + (MAX_BUFFER_SIZE + 1));
putObjectRequest.getRequestClientOptions().setReadLimit(MAX_BUFFER_SIZE + 1);
Upload upload = transferManager.upload(putObjectRequest);
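For context, the encrypting side that feeds the pipe looks roughly like the sketch below; cipher and openSource() here are placeholders standing in for my actual encryption setup, not real code from my application:

import java.io.*;
import javax.crypto.CipherOutputStream;

// Simplified sketch of the producer thread: encrypt the source and write
// it into the PipedOutputStream that feeds pis above. The piped data is
// transient, which is why the upload side cannot simply re-read a file.
PipedOutputStream pout = uploadFileInfo.getPout();
new Thread(() -> {
    try (OutputStream encrypted = new CipherOutputStream(pout, cipher); // cipher: a configured javax.crypto.Cipher
         InputStream source = openSource()) {                           // openSource(): placeholder for the real source
        byte[] buf = new byte[8192];
        int n;
        while ((n = source.read(buf)) != -1) {
            encrypted.write(buf, 0, n);
        }
    } catch (IOException e) {
        LOG.error("Encryption pipe failed", e);
    }
}).start();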
My stack trace shows that it is the reset() call on the BufferedInputStream that throws:
[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Exception from S3 transfer
com.amazonaws.ResetException: The request to the service failed with a retryable reason, but resetting the request input stream has failed. See exception.getExtraInfo or debug-level logging for the original failure that caused this retry.; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1423)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1240)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3734)
at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3719)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:258)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:189)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:121)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:143)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1421)
... 22 more
[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Reset exception caught ==> If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
com.amazonaws.ResetException: The request to the service failed with a retryable reason, but resetting the request input stream has failed. See exception.getExtraInfo or debug-level logging for the original failure that caused this retry.; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1423)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1240)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3734)
at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3719)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:258)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:189)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:121)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:143)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1421)
Note that I did set the readLimit to MAX_BUFFER_SIZE + 1, following the AWS reliability best-practice tip. Has anyone run into this before? Side note: because the file is encrypted as it is streamed, I have to use an InputStream rather than a File or FileInputStream. I also don't have permission to write to the local disk.
Solution
I think you're misinterpreting the recommendation. Quoting the link you provided, with emphasis added:

For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset will always work for 100,000 bytes or less. Be aware that this might cause some streams to buffer that number of bytes into memory.
As I interpret it, this setting configures the client's ability to buffer the stream content locally, for the case where the source stream does not itself support mark/reset. That is consistent with the documentation for setReadLimit:

Used to enable mark-and-reset for non-mark-and-resettable non-file input streams
In other words, it is used to buffer the entire source stream on the client, not to specify how large the chunks sent from the source stream are. In your case I believe it is being ignored, because (1) you are not buffering the entire stream, and (2) the stream you pass does implement mark/reset itself.
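As a side note, the underlying failure is easy to reproduce. Here is a minimal standalone demo (my own illustration, plain JDK, nothing from your code): BufferedInputStream silently invalidates its mark once more bytes than the mark limit have been read since mark(), and the next reset() throws exactly the "Resetting to invalid mark" from your stack trace.

import java.io.*;

public class MarkLimitDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024];
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 16);
        in.mark(16);            // promise to re-read at most 16 bytes
        in.read(new byte[512]); // read well past the mark limit, so the mark is dropped
        in.reset();             // throws java.io.IOException: Resetting to invalid mark
    }
}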
Multipart uploads, which is what TransferManager does in your example, break the input stream into chunks of at least 5 MB (the actual chunk size depends on the declared size of the stream; for a 1.5 TB file it is around 158 MiB). Each chunk is uploaded with the UploadPart API call, which attempts to send the entire chunk at once. If a part fails for a retryable reason, the client tries to reset the stream to the start of that chunk.
You could probably make this work by setting the BufferedInputStream's read limit large enough to hold a single part. The calculation the transfer manager uses is the declared file size divided by 10,000 (the maximum number of parts in a multipart upload), so again about 158 MiB. To be safe, I would use 200 MiB, since I'm sure you will have even larger files; see the sketch below.
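Here is a minimal sketch of that change, reusing the MAX_BUFFER_SIZE, pis, and request setup from your snippets; the READ_LIMIT constant is my own name:

// READ_LIMIT: 200 MiB plus one byte, so mark/reset covers a full part.
// Note that BufferedInputStream grows its buffer up to the mark limit
// as needed, so budget roughly 200 MiB of heap for a retried part.
final int READ_LIMIT = 200 * 1024 * 1024 + 1;

BufferedInputStream bis = new BufferedInputStream(pis, MAX_BUFFER_SIZE);
PutObjectRequest putObjectRequest = new PutObjectRequest(
        uploadFileInfo.getS3TargetBucket(),
        uploadFileInfo.getS3TargetobjectKey() + ".encrypted",
        bis, Metadata);
putObjectRequest.getRequestClientOptions().setReadLimit(READ_LIMIT);
Upload upload = transferManager.upload(putObjectRequest);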
If it were me, though, I would probably use the low-level multipart upload API directly. In my opinion, the main benefit of TransferManager is that, when given a file, it can use multiple threads to upload parts concurrently; with a stream you have to process each part sequentially anyway.
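A sketch of that approach with the v1 SDK follows; the s3 client, bucket, key, and partSize names are mine, and readFully is a small helper, not an SDK method. Because each part is read fully into a byte array, a retry can simply resend the array, and no stream reset is ever involved:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;
import java.io.*;
import java.util.*;

static void uploadViaMultipart(AmazonS3 s3, String bucket, String key,
                               InputStream in, int partSize) throws IOException {
    InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
            new InitiateMultipartUploadRequest(bucket, key));
    List<PartETag> partETags = new ArrayList<>();
    try {
        byte[] part = new byte[partSize];   // e.g. 200 MiB
        int partNumber = 1;
        int filled;
        while ((filled = readFully(in, part)) > 0) {
            UploadPartRequest request = new UploadPartRequest()
                    .withBucketName(bucket)
                    .withKey(key)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber++)
                    .withInputStream(new ByteArrayInputStream(part, 0, filled))
                    .withPartSize(filled);
            partETags.add(s3.uploadPart(request).getPartETag());
        }
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                bucket, key, init.getUploadId(), partETags));
    } catch (RuntimeException | IOException e) {
        // Abort so S3 does not keep billing for the orphaned parts.
        s3.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, init.getUploadId()));
        throw e;
    }
}

// Fill buf from in, looping until the buffer is full or the stream ends.
static int readFully(InputStream in, byte[] buf) throws IOException {
    int total = 0;
    int n;
    while (total < buf.length && (n = in.read(buf, total, buf.length - total)) != -1) {
        total += n;
    }
    return total;
}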
Actually, if it were me, I would seriously rethink uploading a single 1.5 TB file at all. Yes, you can do it. But I can't imagine you ever downloading the entire file each time you want to read it; instead, I expect you would download a byte range. In that case, you might find it just as easy to deal with 1,500 files of 1 GiB each.
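For reference, a ranged read with the v1 SDK looks like the following; again, s3, bucket, and key are placeholder names of mine:

// Fetch only the first MiB of the object; withRange takes an inclusive
// byte range, so there is no need to download all 1.5 TB.
GetObjectRequest rangeRequest = new GetObjectRequest(bucket, key)
        .withRange(0, 1024 * 1024 - 1);
try (S3Object object = s3.getObject(rangeRequest);
     InputStream content = object.getObjectContent()) {
    // decrypt / process just this range...
}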