
How do I correctly parallelize a class method?

I have a class that represents a custom object, a report extracted from FactSet:

import pandas as pd
import pyodbc
import sys
import multiprocessing as mp

class FactSetReshapedobject(object):
    """
    Represents an abstract FactSet report file.
    """
    def __init__(self,file_path):
        """
        Initializes object with given file_path

        Parameters
        ----------
        file_path : string
            path to excel file.

        Returns
        -------
        None.

        """
        self.file_path = file_path
        self.data_frame = None
        # target data_frame with new column names
        self.target_data_frame = pd.DataFrame(columns=['Date','Ticker','Company','Ending Price','Port. Weight'])
        
    def file_reader(self,skip_n_rows,skip_n_footer):
        """
        Reads excel file from path

        Parameters
        ----------
        skip_n_rows : integer
            rows to skip at the beginning of the file.
        skip_n_footer : integer
            rows to skip at the end of the file.

        Returns
        -------
        None.

        """
        # pd.read_excel already returns a DataFrame, so no extra wrapping is needed
        self.data_frame = pd.read_excel(self.file_path, skiprows=skip_n_rows, skipfooter=skip_n_footer)
        
    def replace_unnamed_columns(self):
        """
        Renames the columns of self.data_frame: 'Unnamed: 0' becomes 'Ticker' and 'Unnamed: 1' becomes 'Company'.
        Each remaining 'Unnamed' column takes the name of the preceding (date) column.
        
        Returns
        -------
        None.
        """
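        # build the new names in a list and assign them once,
        # instead of mutating self.data_frame.columns.values in place
        new_columns = []
        for k in self.data_frame.columns:
            if k == 'Unnamed: 0':
                new_columns.append('Ticker')
            elif k == 'Unnamed: 1':
                new_columns.append('Company')
            elif 'Unnamed' in str(k) and new_columns:
                # reuse the name of the preceding column
                new_columns.append(new_columns[-1])
            else:
                new_columns.append(k)
        self.data_frame.columns = new_columns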
    
    def append_values_into_target_data_frame(self):
        """
        Appends values into target_data_frame,which is new data frame created on the basis of original data

        Returns
        -------
        None.

        """
        # iterate through the old data frame, starting with the 2nd row; append values under each column
        for index, row in self.data_frame[1:].iterrows():
            for i in range(2, len(self.data_frame.columns), 2):
                # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
                new_row = pd.DataFrame([{'Date': self.data_frame.columns[i], 'Ticker': row.iloc[0], 'Company': row.iloc[1], 'Ending Price': row.iloc[i], 'Port. Weight': row.iloc[i + 1]}])
                self.target_data_frame = pd.concat([self.target_data_frame, new_row])

Originally, when pulled into a data frame, it looks like this:

[Image: the raw data frame]

It is very ugly, so I built a few methods to reshape it so that I can work with it further and get the desired structure:

[Image: the desired structure]

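However, the main method, append_values_into_target_data_frame, is inefficient on larger volumes because it iterates row by row and appends values into the desired structure. I don't know how to do this efficiently and was considering running it on multiple cores. But this is completely uncharted territory for me, and I don't know how to do it in this case. I tried the following, but it raises TypeError: 'module' object is not callable when called on a class instance: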

def data_shaper(self):
    pararellism = 3
    # self.mapper = mp.Pool()
    self.p=mp.pool(pararellism)
    self.p.map(self.append_values_into_target_data_frame) 
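
On the TypeError itself: `mp.pool` is a submodule of `multiprocessing`, whereas the pool factory is `mp.Pool` with a capital P, and `Pool.map` also needs an iterable of work items as its second argument. Below is a minimal sketch of how the reshape could be split across processes, assuming the column layout used by the class (Ticker, Company, then one price/weight pair of columns per date). The helper names `reshape_chunk`, `split_rows`, and `reshape_parallel` are hypothetical and not part of the original class.

```python
import multiprocessing as mp

import pandas as pd

TARGET_COLUMNS = ['Date', 'Ticker', 'Company', 'Ending Price', 'Port. Weight']


def reshape_chunk(chunk):
    """Reshape one slice of the wide frame into long-format rows (hypothetical helper)."""
    columns = list(chunk.columns)
    records = []
    # name=None yields plain tuples, so duplicate column names are not a problem
    for row in chunk.itertuples(index=False, name=None):
        # columns 0 and 1 hold Ticker and Company; the rest are (price, weight) pairs
        for i in range(2, len(columns) - 1, 2):
            records.append({
                'Date': columns[i],
                'Ticker': row[0],
                'Company': row[1],
                'Ending Price': row[i],
                'Port. Weight': row[i + 1],
            })
    # build the frame once per chunk instead of appending row by row
    return pd.DataFrame(records, columns=TARGET_COLUMNS)


def split_rows(frame, n_chunks):
    """Split a frame into at most n_chunks row slices of roughly equal size."""
    size = max(1, -(-len(frame) // n_chunks))  # ceiling division, at least 1
    return [frame.iloc[start:start + size] for start in range(0, len(frame), size)]


def reshape_parallel(frame, n_workers=3):
    """Reshape the frame by mapping reshape_chunk over row slices in a process pool."""
    chunks = split_rows(frame, n_workers)
    if not chunks:
        # nothing to distribute; return an empty frame with the target columns
        return reshape_chunk(frame)
    # mp.Pool is the callable; mp.pool is a module, hence the TypeError
    with mp.Pool(n_workers) as pool:
        parts = pool.map(reshape_chunk, chunks)
    return pd.concat(parts, ignore_index=True)
```

With a hypothetical instance named `report`, the call would look like `reshape_parallel(report.data_frame[1:], n_workers=3)`. On platforms that start worker processes with spawn (Windows, macOS), that call should sit under an `if __name__ == '__main__':` guard. Whether this is faster than a single process depends on the data size, because every chunk has to be pickled and sent to a worker. The worker is a module-level function rather than a bound method so that the pool does not have to pickle the whole object.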

I would appreciate any hints. Perhaps there is a different approach that achieves the same result, one I am not aware of and should consider?
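
One different approach that may make multiprocessing unnecessary is to drop the row-by-row append and build the target frame in a single step from NumPy arrays. The sketch below assumes the same layout as the class (Ticker, Company, then one price/weight pair of columns per date, with an even number of data columns). The function name `reshape_wide_to_long` is hypothetical, and NumPy is an extra import that the original code does not use.

```python
import numpy as np
import pandas as pd


def reshape_wide_to_long(frame):
    """Turn the wide report layout into long format without a Python-level row loop."""
    # every second column from position 2 carries the date label of a price/weight pair
    dates = np.array(list(frame.columns[2::2]), dtype=object)
    n_rows, n_dates = len(frame), len(dates)

    ids = frame.iloc[:, :2].to_numpy()        # Ticker and Company
    prices = frame.iloc[:, 2::2].to_numpy()   # first column of each pair
    weights = frame.iloc[:, 3::2].to_numpy()  # second column of each pair

    # ravel() walks row by row, so each id is repeated once per date
    # and the date labels are tiled once per row
    return pd.DataFrame({
        'Date': np.tile(dates, n_rows),
        'Ticker': np.repeat(ids[:, 0], n_dates),
        'Company': np.repeat(ids[:, 1], n_dates),
        'Ending Price': prices.ravel(),
        'Port. Weight': weights.ravel(),
    })
```

The rows come out in the same order as in the nested loop: all dates for the first ticker, then all dates for the second, and so on. As in the original method, the first row of the raw frame would be skipped by passing `data_frame[1:]`.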
