Estimating the correlation time of a data series in Python


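I have been trying to estimate the correlation time of a data series. The autocorrelation function (ACF) I want to implement is this: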
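Written out, the formula presumably has the standard normalized form below. This is a reconstruction from the definitions that follow and from the code, so treat the exact expression as an assumption:

```latex
\mathrm{ACF}(\tau) = \frac{\big\langle \left(B(t) - m\right)\left(B(t+\tau) - m\right) \big\rangle_T}{\sigma^2}
```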

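where B(t) is the data, <...>_T denotes the sample average over a sample of size T, and m and σ^2 are the sample mean and sample variance, respectively. The correlation time can then be estimated with the e-folding technique, i.e. as the time it takes the ACF to drop to 1/e of its maximum value.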
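To make the e-folding idea concrete, here is a minimal sketch; it is not the code from this question. The helper names `autocorrelation` and `efolding_time` are hypothetical, and the AR(1) test series is an assumption chosen only because its theoretical ACF, exp(-lag/20), has a known e-folding time of 20 samples.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Normalized sample ACF of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()          # B(t) - m
    var = x.var()             # sigma^2
    n = len(x)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Average (B(t) - m)(B(t + lag) - m) over all pairs inside the sample
        acf[lag] = np.mean(d[:n - lag] * d[lag:]) / var
    return acf


def efolding_time(acf, dt=1.0):
    """First crossing of 1/e, linearly interpolated between lags.

    Returns NaN if the ACF never drops below 1/e within the given lags.
    """
    threshold = 1.0 / np.e
    below = np.nonzero(acf < threshold)[0]
    if below.size == 0:
        return np.nan
    k = below[0]              # k >= 1 because acf[0] == 1
    frac = (acf[k - 1] - threshold) / (acf[k - 1] - acf[k])
    return (k - 1 + frac) * dt


# Assumed test series: AR(1) process with theoretical ACF exp(-lag / 20)
rng = np.random.default_rng(0)
phi = np.exp(-1.0 / 20.0)
noise = rng.standard_normal(200_000)
series = np.empty_like(noise)
series[0] = noise[0]
for t in range(1, len(series)):
    series[t] = phi * series[t - 1] + noise[t]

acf_vals = autocorrelation(series, max_lag=200)
tau_est = efolding_time(acf_vals)
print("estimated e-folding time (samples):", tau_est)
```

Interpolating the crossing, rather than taking the lag closest to 1/e, avoids quantizing the result to the lag grid. For uncorrelated data such as the `np.random.randint` series used in the code below, the ACF should fall below 1/e within the first lag, so an e-folding time of less than one sample is to be expected there.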

The code I am using is this:

import numpy as np
import matplotlib.pyplot as plt

# Our data
Btotal=np.random.randint(100,size=777600)

# Our Sample size
sample=24*60*60
num_of_times=int(len(Btotal)/sample)

# I am not sure if I should have this
tau1=4*60*60

# The number of lags we evaluate
steps=100

## Define correlation time matrix,AC function matrix 
t_col=np.zeros(num_of_times)

acf1=np.zeros((num_of_times,int(steps)))
for l in range(0,num_of_times,1):

    # Divide the data into samples of size 24 h
    Btotal1=Btotal[l*sample:(l+1)*sample]
    
    
    # The maximum lag we choose
    max_lag=int(tau1/5)
     
    # Lags at which the ACF is estimated

    lag1=np.linspace(1,(max_lag),steps)
   
    for k in range(steps):
        lag=int(lag1[k])
        
        # Mean and variance of each chunk of length tau1
        mu=np.zeros(int(len(Btotal1)/tau1))
        sigma_squqared=np.zeros(int(len(Btotal1)/tau1))
        num=np.zeros((int(len(Btotal1)/tau1),tau1))
        acf=np.zeros((int(len(Btotal1)/tau1)))
        
        # Estimate the ACF
        for i in range (0,int(len(Btotal1)/tau1)-1,1): # -1 here: the last chunk is skipped so that j+i*tau1+lag stays in bounds
            mu[i]=np.mean(Btotal1[i*tau1:(i+1)*tau1])
            sigma_squqared[i]=np.var(Btotal1[i*tau1:(i+1)*tau1])
            for j in range(0,tau1,1):
                num[i,j]=(Btotal1[j+i*tau1]-mu[i])*(Btotal1[j+i*tau1+lag]-mu[i])
            acf[i]=np.mean(num[i,:])/sigma_squqared[i]

        ##SHOULD I BE DOING THIS STEP??#####
        ## It is the step that bothers me.
        # What it does essentially is find the mean value of the ACF
        # for a given time lag. Each line in the final plot shows how the ACF
        # evolves as a function of the time lag, and for each lag the y-axis
        # value is the mean estimated below.
        # Only the filled entries are averaged: the last entry of acf is never set
        acf1[l,k]=np.mean(acf[:-1])

    # Find the correlation time for each of the samples
    t_col[l]=lag1[np.argmin(np.abs(acf1[l,:]-1/np.exp(1)))]





# Just for plotting, not relevant
plt.plot(t_col[t_col>0])



import datetime
start_date = datetime.date(2020,9,21)

# One date per 24 h sample, counted from start_date
dates= [ start_date + datetime.timedelta(days=n) for n in range(num_of_times)]


#plt.figure(figsize=(3.54,3.54),dpi=600)
for m in range (0,1):
    plt.plot(lag1,acf1[m],'--')#,label=dates[m],linewidth=1)
    plt.plot([0,max(lag1)],[1/np.exp(1),1/np.exp(1)],linewidth=0.2)
plt.plot([np.mean(t_col),np.mean(t_col)],[0,0.001],color='white',label=" $<t_{{cor}}>$ = {} sec".format(int(np.mean(t_col))),linewidth=0.2)
plt.legend(loc=0,fontsize=14)
plt.xlim(0,max(lag1))
plt.ylim(top=1)
plt.xlabel('Time-lag (τ) [s]',fontsize=14)
plt.ylabel('Auto correlation function (ACF)',fontsize=14)
plt.savefig('acf.eps',format='eps')
plt.show()

This is the result I get:

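I have a feeling it is not correct, though. What bothers me most is the step I flagged in the code. Should I be doing it? Can anyone suggest any improvements to the code?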
