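Plot annotations are too close together to be readable

How to fix plot annotations that are too close together to read

I have the following code that creates a plot of the loadings after PCA: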

import numpy as np
import plotly.express as px
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Creating pipeline objects
## PCA
pca = PCA(n_components=2)
## Create ColumnTransformer to only scale a selected set of features
categorical_ix = X.select_dtypes(exclude=np.number).columns

features = X.columns

ct = ColumnTransformer([
        ('encoder', OneHotEncoder(), categorical_ix),
        ('scaler', StandardScaler(), ['tenure', 'MonthlyCharges', 'TotalCharges'])
    ], remainder='passthrough')

# Create pipeline
pca_pipe = make_pipeline(ct, pca)

# Fit data to pipeline
pca_result = pca_pipe.fit_transform(X)

loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

fig = px.scatter(pca_result, x=0, y=1, color=customer_data_raw['churn'])

for i, feature in enumerate(features):
    fig.add_shape(
        type='line', x0=0, y0=0, x1=loadings[i, 0], y1=loadings[i, 1]
    )
    fig.add_annotation(
        x=loadings[i, 0], y=loadings[i, 1], ax=0, ay=0,
        xanchor="center", yanchor="bottom", text=feature,
    )
fig.show()

This produces the following output:

How can I make the loading labels readable?

Edit: X has 19 features.

    gender  SeniorCitizen   Partner Dependents  tenure  Phoneservice    MultipleLines   InternetService Onlinesecurity  OnlineBackup    DeviceProtection    TechSupport StreamingTV StreamingMovies Contract    PaperlessBilling    PaymentMethod   MonthlyCharges  TotalCharges
customerID                                                                          
7590-VHVEG  Female  0   Yes No  1   No  No phone service    DSL No  Yes No  No  No  No  Month-to-month  Yes Electronic check    29.85   29.85
5575-GNVDE  Male    0   No  No  34  Yes No  DSL Yes No  Yes No  No  No  One year    No  Mailed check    56.95   1889.50
3668-QPYBK  Male    0   No  No  2   Yes No  DSL Yes Yes No  No  No  No  Month-to-month  Yes Mailed check    53.85   108.15
7795-CFOCW  Male    0   No  No  45  No  No phone service    DSL Yes No  Yes Yes No  No  One year    No  Bank transfer (automatic)   42.30   1840.75
9237-HQITU  Female  0   No  No  2   Yes No  Fiber optic No  No  No  No  No  No  Month-to-month  Yes Electronic check    70.70   151.65

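Solution

According to your DataFrame, you have 19 features, and every annotation is placed at the end of its loading line because ax and ay are both set to 0. The labels therefore pile up wherever the line endpoints cluster.

We can vary ax and ay as you loop over the features so that the label positions rotate around the origin, which should make the annotations easier to tell apart. This is based on the polar-to-Cartesian conversion x = r*cos(theta) and y = r*sin(theta), where theta steps through 0*360/19, 1*360/19, ..., 18*360/19 degrees. (The code below swaps sin and cos, which only changes the starting angle and the direction of rotation.) We want axref and ayref to refer to the x and y axes rather than the default pixel offsets, and then set r=2 or some value on the scale of your plot, so that the label ends of the annotation arrows lie on a circle of radius 2.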

from math import sin, cos, pi

r = 2  # this can be modified as needed, and is in units of the axis
theta = 2 * pi / len(features)  # angular step between consecutive labels

for i, feature in enumerate(features):
    fig.add_shape(
        type='line', x0=0, y0=0, x1=loadings[i, 0], y1=loadings[i, 1]
    )
    fig.add_annotation(
        # arrow head: the end of the loading line
        x=loadings[i, 0], y=loadings[i, 1],
        # arrow tail (where the label sits): a point on a circle of radius r
        ax=r * sin(i * theta), ay=r * cos(i * theta),
        axref="x", ayref="y",
        xanchor="center", yanchor="bottom", text=feature,
    )
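
To check the label placement before touching the figure, the angle arithmetic can be pulled out on its own. This is only a minimal sketch: `label_positions` is a hypothetical helper name, not part of Plotly, and it assumes the same sin/cos convention and r = 2 as the loop above.

```python
from math import sin, cos, pi, hypot

def label_positions(n_labels, r=2.0):
    """Return n_labels (ax, ay) points evenly spaced on a circle of radius r.

    Hypothetical helper: it mirrors ax = r*sin(i*theta), ay = r*cos(i*theta),
    so the first label sits at the top of the circle and later labels
    proceed clockwise.
    """
    step = 2 * pi / n_labels  # angular spacing between neighbouring labels
    return [(r * sin(i * step), r * cos(i * step)) for i in range(n_labels)]

# 19 features, as in the question
positions = label_positions(19, r=2.0)

# Inspect the first few label coordinates (in axis units)
for i, (ax, ay) in enumerate(positions[:3]):
    print(i, round(ax, 3), round(ay, 3), round(hypot(ax, ay), 3))
```

If the labels still collide, increasing r or giving every other label a slightly larger radius are options to try; how well either works depends on the scale of your loadings.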