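Numpy/Pandas: correlating multiple arrays of different lengths

How to correlate multiple arrays of different lengths with Numpy/Pandas

I can correlate two arrays of different lengths using this method: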

import pandas as pd
import numpy as np
from scipy.stats import pearsonr

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]
df = pd.DataFrame(dict(x=a))

CORR_VALS = np.array(b)
def get_correlation(vals):
    # Pearson r between one window of x and the shorter array b
    return pearsonr(vals, CORR_VALS)[0]

# Each window is as long as b, so the first len(b) - 1 rows are NaN
df['correlation'] = df['x'].rolling(window=len(CORR_VALS)).apply(get_correlation)

This gives me the following result:

In [1]: df
Out[1]: 

    x  correlation
0  0.0          NaN
1  0.4          NaN
2  0.2          NaN
3  0.4          NaN
4  0.2          NaN
5  0.4     0.527932
6  0.2    -0.159167
7  0.5     0.189482
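To make clear where these three numbers come from, the rolling call can be unrolled by hand. The following is a minimal sketch, assuming `a` holds the eight values shown in the `x` column above; the names `window_correlations` and `best` are illustrative only and are not part of the original code.

```python
from scipy.stats import pearsonr

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]  # the x column shown above
b = [25, 40, 62, 58, 53, 54]

# Slide b along a: one Pearson coefficient for every window of a
# that has the same length as b (here 8 - 6 + 1 = 3 windows).
window_correlations = [
    pearsonr(a[start:start + len(b)], b)[0]
    for start in range(len(a) - len(b) + 1)
]

# The largest coefficient over all alignments of b against a.
best = max(window_correlations)
print(window_correlations, best)
```

Each list entry corresponds to one non-NaN row of the `correlation` column, so `best` is the 0.527932 in row 5.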

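First, the Pearson coefficient should presumably be the highest number in this dataset...

Second, how do I do this for multiple sets of data? I would like output like what I get from df.corr(), with the index and columns labelled appropriately.

For example, say I have the following datasets: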

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]
c = [ 0,0.45,0.52,0.21,0.51]
d = [ 0.4,0.5]

I want a correlation matrix containing 16 Pearson coefficients...

Solution

import pandas as pd
import numpy as np
from scipy.stats import pearsonr

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]
c = [0, 0.45, 0.52, 0.21, 0.51]
d = [0.4, 0.5]

# Store the series by name
dict_series = {'a': a, 'b': b, 'c': c, 'd': d}
list_series_names = list(dict_series.keys())

def get_max_correlation_from_lists(a, b):
    # Make sure the longer list is the one the rolling window slides over
    if len(b) > len(a):
        a, b = b, a
    # Body taken from the original code
    CORR_VALS = np.array(b)
    def get_correlation(vals):
        return pearsonr(vals, CORR_VALS)[0]
    # Keep the largest coefficient over all windows
    return pd.Series(a).rolling(window=len(CORR_VALS)).apply(get_correlation).max()

# Build the "correlations" matrix
correlations_matrix = pd.DataFrame(index=list_series_names, columns=list_series_names)
for i in list_series_names:
    for j in list_series_names:
        correlations_matrix.loc[i, j] = get_max_correlation_from_lists(dict_series[i], dict_series[j])

print(correlations_matrix)

This prints:

          a         b         c         d
a       1.0  0.527932  0.995791       1.0
b  0.527932       1.0   0.52229  0.427992
c  0.995791   0.52229       1.0  0.992336
d       1.0  0.427992  0.992336       1.0
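
The same idea can also be written without `rolling(...).apply(...)`. The following is only an alternative sketch, not the method used in the answer above: it assumes NumPy 1.20 or later for `sliding_window_view`, the helper name `max_window_correlation` is hypothetical, and only the `a` and `b` lists from the question are used as sample data.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def max_window_correlation(x, y):
    """Largest Pearson r between the shorter series and any
    equal-length window of the longer one (hypothetical helper)."""
    longer, shorter = (x, y) if len(x) >= len(y) else (y, x)
    longer = np.asarray(longer, dtype=float)
    shorter = np.asarray(shorter, dtype=float)
    # One row per window; shape is (n_windows, len(shorter)).
    windows = sliding_window_view(longer, len(shorter))
    # Centre every window and the shorter series, then apply the
    # Pearson formula to all windows at once.
    wc = windows - windows.mean(axis=1, keepdims=True)
    sc = shorter - shorter.mean()
    denom = np.sqrt((wc ** 2).sum(axis=1) * (sc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (wc @ sc) / denom  # constant windows give NaN
    return float(np.nanmax(r))

series = {
    "a": [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5],
    "b": [25, 40, 62, 58, 53, 54],
}
names = list(series)
matrix = pd.DataFrame(
    [[max_window_correlation(series[i], series[j]) for j in names] for i in names],
    index=names,
    columns=names,
)
print(matrix)
```

Building the frame from a nested list gives float columns rather than object columns, which may be more convenient if the matrix is passed on to plotting or further numeric work.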
