How to create a dask DataFrame from delayed dask arrays
I have a list of delayed dask arrays stored in dask_arr_ls that I want to convert into a dask DataFrame. Here is the skeleton of my pipeline:
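import numpy as np
import dask.array as da
import dask.dataframe as dd
from dask import delayed

n_devices = 3  # hypothetical number of devices

def simulate_device_data(num_id):
    # create data for an unknown number of timestamps
    # (hypothetical stand-in: random row count, integer time step and coordinates)
    unknown_qty = np.random.randint(1, 10)
    data_ls = [[num_id, t, np.random.uniform(-180, 180), np.random.uniform(-90, 90)]
               for t in range(unknown_qty)]
    device_arr = np.stack(data_ls)
    device_dask_arr = da.from_array(device_arr, chunks=device_arr.size)
    return device_dask_arr

dask_arr_ls = []
for i_device in range(n_devices):
    i_dask_arr = delayed(simulate_device_data)(i_device)
    dask_arr_ls.append(i_dask_arr)
dask_arr_ls = [da.from_delayed(i_dask_arr, shape=(np.nan, 4), dtype=float)
               for i_dask_arr in dask_arr_ls]
ddf = dd.concat([dd.from_dask_array(i_darr) for i_darr in dask_arr_ls])
ddf.columns = ["num_id", "t", "lon", "lat"]
ddf.compute()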
Calling compute() produces the following error message:
ValueError: DataFrame constructor not properly called!
What am I doing wrong?
Solution
I never figured out what was wrong with the code above. I suspect I misused delayed somehow. I modified my pipeline as follows to get it working:
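import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask import delayed
n_devices = 3  # hypothetical number of devices

def simulate_device_data(num_id):
    # create data for an unknown number of timestamps (hypothetical random stand-in;
    # "name" is made up because its source is not shown, but the meta below expects it)
    unknown_qty = np.random.randint(1, 10)
    device_df = pd.DataFrame({
        "name": f"device_{num_id}",
        "num_id": num_id,
        "t": pd.date_range("2020-01-01", periods=unknown_qty, freq="min"),
        "lon": np.random.uniform(-180, 180, unknown_qty),
        "lat": np.random.uniform(-90, 90, unknown_qty),
    })
    return device_df

df_ls = []
for i_device in range(n_devices):
    i_df = delayed(simulate_device_data)(i_device)
    df_ls.append(i_df)
# empty frame telling dask the column names and dtypes of every partition
archetype_df = pd.DataFrame(None, columns=["name", "num_id", "t", "lon", "lat"])
archetype_df = archetype_df.astype({"name": "object", "num_id": "int64", "t": "datetime64[ns]", "lon": "float64", "lat": "float64"}, copy=False)
ddf = dd.from_delayed(df_ls, meta=archetype_df)
ddf.compute()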