Using AWS_PROFILE in pandas.read_parquet

I am testing this locally with a ~/.aws/config file that looks like this:

[profile a] 
...
[profile b]
...

I also have the AWS_PROFILE environment variable set to "a".
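Before debugging the pandas call, it can help to confirm which profiles the config file declares and which one the environment currently selects. The sketch below uses only the standard library. The helper names `configured_profiles` and `active_profile` and the region values in the sample text are hypothetical, made up for illustration; the real file contents are elided above.

```python
import configparser
import os


def configured_profiles(config_text):
    """Return the profile names declared in AWS config-file text."""
    # interpolation=None so that "%" characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    names = []
    for section in parser.sections():
        if section.startswith("profile "):
            # Named profiles are written as "[profile NAME]".
            names.append(section[len("profile "):].strip())
        elif section == "default":
            names.append("default")
    return names


def active_profile(environ=os.environ):
    """Return the profile named by AWS_PROFILE, or "default" if it is unset."""
    return environ.get("AWS_PROFILE", "default")


# Hypothetical sample; the region values are placeholders.
sample = """
[profile a]
region = us-east-1

[profile b]
region = eu-west-1
"""

print(configured_profiles(sample))
print(active_profile({"AWS_PROFILE": "a"}))
```

To inspect the real file, pass the text of `~/.aws/config` (for example via `pathlib.Path.home() / ".aws" / "config"`) instead of the sample string.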

I want to use pandas to read a file that is accessible through profile b.

I can access it through s3fs like this:

import pandas as pd
import s3fs

# Download the object using profile "b", then read the local copy.
fs = s3fs.S3FileSystem(profile="b")
fs.get("BUCKET/FILE.parquet", "FILE.parquet")
pd.read_parquet("FILE.parquet")

However, if I try to pass the profile to pd.read_parquet through storage_options, I get PermissionError: Forbidden:

pd.read_parquet(
    "s3://BUCKET/FILE.parquet",
    storage_options={"profile": "b"},
)

The full traceback is below:

Traceback (most recent call last):
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/s3fs/core.py", line 233, in _call_s3
    out = await method(**additional_kwargs)
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/aiobotocore/client.py", line 154, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
    return impl.read(
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
    return self.api.parquet.read_table(
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pyarrow/parquet.py", line 1672, in read_table
    dataset = _ParquetDatasetV2(
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pyarrow/parquet.py", line 1504, in __init__
    if filesystem.get_file_info(path_or_paths).is_file:
  File "pyarrow/_fs.pyx", line 438, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1004, in pyarrow._fs._cb_get_file_info
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pyarrow/fs.py", line 226, in get_file_info
    info = self.fs.info(path)
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/fsspec/asyn.py", line 72, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in sync
    raise result[0]
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/fsspec/asyn.py", line 20, in _runner
    result[0] = await coro
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/s3fs/core.py", line 911, in _info
    out = await self._call_s3(
  File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/s3fs/core.py", line 252, in _call_s3
    raise translate_boto_error(err)
PermissionError: Forbidden
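The 403 is raised by a HeadObject call issued through s3fs, which would be consistent with the request being signed with credentials that cannot read the object (for example, those of profile "a"). One way to test that hypothesis is to switch AWS_PROFILE temporarily and retry the read without storage_options. The context manager below is a minimal standard-library sketch; the name `aws_profile` is hypothetical. It assumes the credentials are resolved from the environment when the S3 session is created, and a filesystem instance that fsspec has already cached in the same process may keep its earlier credentials, so a fresh interpreter gives the cleanest result.

```python
import os
from contextlib import contextmanager


@contextmanager
def aws_profile(name):
    """Temporarily set AWS_PROFILE, restoring the previous value on exit."""
    previous = os.environ.get("AWS_PROFILE")
    os.environ["AWS_PROFILE"] = name
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before, so unset it again.
            os.environ.pop("AWS_PROFILE", None)
        else:
            os.environ["AWS_PROFILE"] = previous


# Intended use (not executed here, since it needs AWS access):
#
#     with aws_profile("b"):
#         df = pd.read_parquet("s3://BUCKET/FILE.parquet")
```

If the read succeeds inside the `with` block, that would suggest the "profile" entry in storage_options is not reaching the S3 client in this setup.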

Note: there is an older, somewhat related question, but it did not help: How to read parquet file from s3 using dask with specific AWS profile
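A possible workaround, given that s3fs.S3FileSystem(profile="b") does work in the fs.get example, is to open the object through that filesystem and pass the file handle to pandas, which avoids the local copy. This is a sketch rather than a confirmed fix for the storage_options behaviour. The function name `read_parquet_with_profile` and its `fs` and `reader` parameters are hypothetical; the two parameters exist only so the function can be tried offline with stand-ins.

```python
def read_parquet_with_profile(path, profile, fs=None, reader=None):
    """Read a Parquet object from S3 using a named AWS profile.

    `fs` and `reader` are optional stand-ins for testing; by default an
    s3fs filesystem and pandas.read_parquet are used.
    """
    if fs is None:
        import s3fs  # imported lazily so the sketch loads without s3fs

        fs = s3fs.S3FileSystem(profile=profile)
    if reader is None:
        import pandas as pd

        reader = pd.read_parquet
    # Open the object as a binary file and let the reader consume the handle.
    with fs.open(path, "rb") as handle:
        return reader(handle)


# Intended use (not executed here, since it needs AWS access):
#
#     df = read_parquet_with_profile("BUCKET/FILE.parquet", "b")
```

Because the filesystem is constructed explicitly, the profile choice here does not depend on AWS_PROFILE or on how pandas forwards storage_options.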
