Error when running a Python program in parallel

How to fix an error in a Python program that processes work in parallel

def bagging_and_trees_growth(samples,network,tree_num):
    trees = []
    
    for i in range(tree_num):
        bootstrap_samples = bagging(samples)
        a_tree = tree_growth(network,bootstrap_samples)
        trees.append(a_tree)
        
    return trees
    
def agiled_random_forest(samples,size,processes=39):
   
    rforest = []
        
    #job_server = pp.Server(processes=processes)
    threadPool = ThreadPool(processes=processes)
     
    depfun = (find_best_split,stopping_condition,purity_gain,Gini_index,find_neighbors,tree_growth,bagging)
    dep_modules = ('networkx','numpy','math','random','sys','pNGF')
    
    
    tree_num_of_each_task = int(size/processes)
    
    #jobs = [job_server.submit(bagging_and_trees_growth,(samples,tree_num_of_each_task),depfun,dep_modules) for x in range(processes)]
   
    jobs = [threadPool.apply_async(bagging_and_trees_growth,dep_modules) for x in range(processes)]
    
    
    for job in jobs:
        rforest += job.get()
    
    threadPool.close()  # ThreadPool has no destroy(); that was pp's Server API
    threadPool.join()
    return rforest

It fails with an error about a mapping and a tuple:

TypeError: bagging_and_trees_growth() argument after ** must be a mapping,not tuple

How can I fix this error, given that the pp module doesn't work in Python 3?
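A likely cause, inferred from the message rather than confirmed by the code shown: apply_async has the signature apply_async(func, args=(), kwds={}) and calls func(*args, **kwds), so its third positional parameter must be a dict. Carrying over the pp-style argument list (function, args tuple, then depfun) puts a tuple in the kwds slot. A minimal sketch that reproduces the message, using a hypothetical work function in place of bagging_and_trees_growth:

```python
from multiprocessing.pool import ThreadPool


def work(samples, tree_num):
    # hypothetical stand-in for bagging_and_trees_growth
    return [samples] * tree_num


error_message = ""
with ThreadPool(processes=2) as pool:
    # pp-style call: the third positional argument of apply_async is kwds,
    # so this tuple gets unpacked with ** inside the worker and fails
    bad = pool.apply_async(work, ("data", 2), ("dep_a", "dep_b"))
    try:
        bad.get()  # get() re-raises the TypeError from the worker thread
    except TypeError as exc:
        # contains "argument after ** must be a mapping, not tuple"
        error_message = str(exc)

    # correct call: all positional arguments go in a single args tuple
    good = pool.apply_async(work, ("data", 2))
    result = good.get()  # ["data", "data"]
```

ThreadPool has no equivalent of pp's depfuncs and modules parameters; threads share the caller's globals and imports, so those arguments can simply be dropped.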

Solution

You're probably looking for something like this.

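The idea here is that bagging_and_trees_growth no longer contains its own loop over jobs; instead we rely on the thread pool (or, preferably, a process pool because of the GIL, but that's up to you) to distribute the work efficiently.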

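Since the order in which the jobs run clearly makes no difference here, imap_unordered will be the fastest high-level construct. apply_async would also work, but it takes more bookkeeping.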
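For comparison, a minimal sketch of the apply_async variant. bagging and tree_growth below are hypothetical stand-ins (the asker's real implementations aren't shown), and grow_one_tree and random_forest_apply_async are illustrative names. The extra work is keeping the list of AsyncResult objects and calling get() on each one yourself:

```python
import random
from multiprocessing.pool import ThreadPool


def bagging(samples):
    # hypothetical stand-in: bootstrap resample (draw with replacement)
    return [random.choice(samples) for _ in samples]


def tree_growth(network, bootstrap_samples):
    # hypothetical stand-in: a "tree" is just a tuple of its inputs here
    return (network, sorted(bootstrap_samples))


def grow_one_tree(samples, network):
    return tree_growth(network, bagging(samples))


def random_forest_apply_async(samples, network, size, processes=4):
    with ThreadPool(processes=processes) as pool:
        # one job per tree; arguments go in the args tuple, not in kwds
        jobs = [pool.apply_async(grow_one_tree, (samples, network))
                for _ in range(size)]
        # get() blocks until each job finishes and re-raises worker errors
        return [job.get() for job in jobs]


forest = random_forest_apply_async([1, 2, 3, 4], "net", size=10)
```

Results come back in submission order here, which the task doesn't need; that is why imap_unordered is the simpler fit.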

import itertools
import multiprocessing.pool


def bagging_and_trees_growth(job):
    samples,network = job  # unpack the job tuple
    bootstrap_samples = bagging(samples)
    a_tree = tree_growth(network,bootstrap_samples)
    return a_tree


def agiled_random_forest(samples,network,size,processes=39):
    rforest = []
    with multiprocessing.pool.ThreadPool(processes=processes) as pool:
        # to use imap_unordered (the fastest high-level pool operation),
        # we need to pack each job into a single object; since all we need
        # here is 2 parameters, let's use a tuple.
        # set up an iterator that yields the same job `size` times
        job_gen = itertools.repeat((samples,network),size)
        # do the work in parallel
        for result in pool.imap_unordered(bagging_and_trees_growth,job_gen):
            # could do something else with the result here;
            # in fact this could all just be `rforest = list(pool.imap...)`
            # in the simple case
            rforest.append(result)
    return rforest
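To try the answer's structure end to end, here is a self-contained sketch. bagging and tree_growth are again hypothetical stand-ins for the asker's functions; the pool logic is the same imap_unordered pattern, collapsed into list(...) as the comment in the answer suggests:

```python
import itertools
import random
from multiprocessing.pool import ThreadPool


def bagging(samples):
    # hypothetical stand-in for the asker's bagging(): bootstrap resample
    return [random.choice(samples) for _ in samples]


def tree_growth(network, bootstrap_samples):
    # hypothetical stand-in for the asker's tree_growth()
    return {"network": network, "samples": bootstrap_samples}


def bagging_and_trees_growth(job):
    samples, network = job  # unpack the job tuple
    return tree_growth(network, bagging(samples))


def agiled_random_forest(samples, network, size, processes=4):
    with ThreadPool(processes=processes) as pool:
        # one identical job per tree; results arrive in completion order
        job_gen = itertools.repeat((samples, network), size)
        return list(pool.imap_unordered(bagging_and_trees_growth, job_gen))


forest = agiled_random_forest(list(range(10)), "net", size=100, processes=39)
print(len(forest))  # 100
```

Submitting one job per tree also avoids a quirk of the original version: int(size/processes) with 100 trees and 39 workers gives 2 trees per task, so only 78 trees would be built. If you switch to multiprocessing.Pool for CPU-bound work, it offers the same imap_unordered method, but the worker function and its arguments must be picklable and defined at module top level, and on platforms that spawn processes the pool should be created under an `if __name__ == "__main__":` guard.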
