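Pulling DataFrames from a global dictionary in a long script

How to solve: pulling DataFrames from a global dictionary in a long script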

Warning: long post. I've done my best to break it into digestible pieces so that, God willing, someone can help me.

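So, cool Stack cats, check it out: I'm building an efficient frontier tool for portfolio analysis. I got the tool working well by reading the data directly from a CSV file (df = pd.read_csv('foo.csv')). But now I have hundreds of these files, soon possibly thousands, and I want to load them once into a dictionary, which should save real time over hundreds of thousands of iterations.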

I wrote a quick function (below) to load the data into a global dictionary. The dictionary itself seems to work fine.

import os

import numpy as np
import pandas as pd

equityLibrary = {}

def listFiles(path):
    ## Recursively collect every file path under `path`
    if os.path.isfile(path):
        return [path]
    else:
        files = []
        for filename in os.listdir(path):
            files += listFiles(path + '/' + filename)
        return files

def loadData():
    ## List the files in my data dir
    library = listFiles('./data')

    for stock in library:
        startKeyName = stock[7:]       ## strip the leading './data/'
        keyName = startKeyName[:-4]    ## strip the 4-character extension, e.g. '.csv'
        variableName = keyName.lower()
        df = readData(keyName)         ## readData is defined elsewhere (not shown here)
        equityLibrary[variableName] = df
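
For comparison, here is a minimal sketch of the same loading step using pathlib instead of slicing the path by fixed offsets. It assumes readData is a thin wrapper around pd.read_csv (the post does not show it), and load_library is a hypothetical name. Unlike the slicing above, Path.stem drops any subdirectory from the key, so this only matches the original when the files sit directly in ./data.

```python
from pathlib import Path

import pandas as pd


def load_library(data_dir):
    """Read every CSV under data_dir into a dict keyed by lower-cased file stem."""
    library = {}
    # rglob descends into subdirectories, like the recursive listFiles above
    for csv_path in sorted(Path(data_dir).rglob("*.csv")):
        # Path.stem drops the directory and the ".csv" suffix,
        # so "data/CSCO.csv" becomes "CSCO"
        library[csv_path.stem.lower()] = pd.read_csv(csv_path)
    return library
```

Returning the dictionary rather than filling a module-level global makes it easier to test the loader on a small directory first.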

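OK, cool. The DataFrames are loaded into a global dictionary. Now let's pull the portfolio data we need. I'm using the following function to compute the volatility and return of a stock portfolio: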

def PortfolioVolAndReturn(holdings,weights):
    table = pd.DataFrame()
    weights = np.array(weights)

    for i in holdings:
        df = equityLibrary[i.lower()]
        table[i] = df.close

    df.index = pd.to_datetime(df.index)

    ##Convert to log return eventually
    dailyReturns = table.pct_change()
    annualReturn = dailyReturns.mean() * 252

    dailyCOVAR = dailyReturns.cov()
    annualCOVAR = dailyCOVAR * 252

    returnOut = np.dot(weights,annualReturn)
    volOut = np.sqrt(np.dot(weights.T,np.dot(annualCOVAR,weights)))
    
    return returnOut,volOut


## Return Sharpe ratio
def sharpeRatio(vol,returns):
    return returns/vol

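That works!! ...but only once. Here is the core problem: when I run it inside a Monte Carlo simulation, the DataFrame I get back has one correct row, and every other row comes back with NaN for return and vol. I'm using this function to run the simulation: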

## Takes the holdings and the number of simulations to run
def runSim(holdings,simulations):
    weightsArray = []
    returnsArray = []
    volArray = []

    df = pd.DataFrame()

    for i in range(simulations):
        weights = np.random.rand(len(holdings))
        weights /= np.sum(weights)
        returns,vol = PortfolioVolAndReturn(holdings,weights)

        weightsArray.append(weights)
        returnsArray.append(returns)
        volArray.append(vol)
    
    df['weights'] = weightsArray
    df['return'] = returnsArray
    df['vol'] = volArray

    return df

My returns are broken:

                                              weights    return      vol
0   [0.047066195990327415,0.27999255422401625,0....  0.269764  0.35462
1   [0.6172192455260814,0.08842975803644981,0.29...       NaN      NaN
2   [0.3610819481882519,0.5059521988331586,0.132...       NaN      NaN
3   [0.01626716037860542,0.16093554050483386,0.8...       NaN      NaN

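I know the problem is with the global dictionary. If I read the DataFrames straight from the CSV files it works fine, but it is painfully slow. For the life of me I can't figure out why my dictionary turns to gibberish from the second call onward. What am I doing wrong, and why am I not smart enough to figure it out??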

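Update: the problem is in the function PortfolioVolAndReturn. No matter how long or short the holdings argument is, the last column is added to the DataFrame as NaN. Examples: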

       CSCO   INTC  CAT
0     26.84  34.23  NaN
1     26.52  34.47  NaN
2     26.15  34.19  NaN

and

      CSCO   INTC     CAT  JPM
0     26.84  34.23   69.24  NaN
1     26.52  34.47   69.22  NaN
2     26.15  34.19   68.25  NaN

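This makes absolutely no sense to me. Stepping through the code, I can't see what causes it. Am I not iterating over the dictionary correctly? Everything looks logical to me, and the DataFrames shouldn't be getting pulled incorrectly. O_o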

Solution

OK, after taking a break and coming back, I got everything running smoothly by building the price table in a separate function:

def closingPriceTable(holdings):
    table = pd.DataFrame()
    for i in holdings:
        df = equityLibrary[i.lower()]
        table[i] = df.close
    
    return table
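
Since the price table does not change between simulations, the return and covariance statistics only need to be computed once. The following is a rough sketch of that idea, not the code I actually ran: run_sim_from_table and its seed parameter are hypothetical names, and it assumes the table holds one closing-price column per holding with no gaps.

```python
import numpy as np
import pandas as pd


def run_sim_from_table(table, simulations, seed=None):
    """Monte Carlo over random long-only weights for the price columns in table."""
    rng = np.random.default_rng(seed)

    # These statistics depend only on prices, so compute them once,
    # outside the simulation loop
    daily_returns = table.pct_change()
    annual_return = daily_returns.mean().to_numpy() * 252
    annual_cov = daily_returns.cov().to_numpy() * 252

    rows = []
    for _ in range(simulations):
        weights = rng.random(table.shape[1])
        weights /= weights.sum()  # normalize so the weights sum to 1
        ret = float(weights @ annual_return)
        vol = float(np.sqrt(weights @ annual_cov @ weights))
        rows.append({"weights": weights, "return": ret, "vol": vol})
    return pd.DataFrame(rows)
```

It would be called as run_sim_from_table(closingPriceTable(holdings), simulations), so the dictionary is only read once per set of holdings.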

I still don't know why my original call didn't work. I really wouldn't mind someone taking a look and telling me what dumb thing I did.
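
One plausible explanation, offered as a sketch rather than a confirmed diagnosis: in PortfolioVolAndReturn, the statement df.index = pd.to_datetime(df.index) runs after the loop, so df is the last holding's frame, and that frame is the very object stored in equityLibrary. The first call leaves that frame with a DatetimeIndex. On the next call its close column no longer lines up with the integer index of the other columns, and pandas fills the whole column with NaN when it aligns on index. The snippet below reproduces this with a two-entry stand-in dictionary (library and build_table are hypothetical names; the prices are copied from the tables above).

```python
import pandas as pd

# Stand-in for equityLibrary, with default integer indexes
library = {
    "csco": pd.DataFrame({"close": [26.84, 26.52, 26.15]}),
    "intc": pd.DataFrame({"close": [34.23, 34.47, 34.19]}),
}


def build_table(holdings):
    table = pd.DataFrame()
    for name in holdings:
        df = library[name.lower()]
        table[name] = df.close  # column assignment aligns on index
    # Same statement as in PortfolioVolAndReturn: df is still the last
    # frame from the loop, and it is the object stored in the dictionary,
    # so this changes the cached frame's index for every later call
    df.index = pd.to_datetime(df.index)
    return table


first = build_table(["CSCO", "INTC"])   # both columns are filled
second = build_table(["CSCO", "INTC"])  # INTC no longer aligns with CSCO
print(second)
```

If that is what happened, it would also explain why closingPriceTable behaves: it never touches df.index. Should a datetime index be needed, converting it once at load time, or on table.index instead of df.index, would avoid modifying the cached frames.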