How to fix plot annotations that are too close together to read
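# Imports (X and customer_data_raw are assumed to be defined earlier)
import numpy as np
import plotly.express as px
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Creating pipeline objects
## PCA
pca = PCA(n_components=2)
## Create a ColumnTransformer to scale only a selected set of features
categorical_ix = X.select_dtypes(exclude=np.number).columns
features = X.columns
ct = ColumnTransformer([
    ('encoder', OneHotEncoder(), categorical_ix),
    ('scaler', StandardScaler(), ['tenure', 'MonthlyCharges', 'TotalCharges'])
], remainder='passthrough')
# Create pipeline
pca_pipe = make_pipeline(ct, pca)
# Fit data to pipeline
pca_result = pca_pipe.fit_transform(X)
# Loadings: principal axes scaled by the square root of the explained variance
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
fig = px.scatter(pca_result, x=0, y=1, color=customer_data_raw['churn'])
for i, feature in enumerate(features):
    # Draw the loading vector from the origin
    fig.add_shape(
        type='line', x0=0, y0=0, x1=loadings[i, 0], y1=loadings[i, 1]
    )
    # Label the tip of the loading vector
    fig.add_annotation(
        x=loadings[i, 0], y=loadings[i, 1], ax=0, ay=0,
        xanchor="center", yanchor="bottom", text=feature,
    )
fig.show()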
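This produces the following output:
[Figure: PCA scatter plot with overlapping loading labels]
How can I make the loading labels readable?
Edit: X has 19 features.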
customerID | gender | SeniorCitizen | Partner | Dependents | tenure | Phoneservice | MultipleLines | InternetService | Onlinesecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges
7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85
5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50
3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15
7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75
9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65
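Solution
Based on your DataFrame, you have 19 features. Because ax and ay are both set to 0, every label sits exactly at the tip of its line, so labels whose line tips are close together end up on top of each other.
We can vary ax and ay as you loop through the features so that the labels rotate around the origin, which should make your annotations easier to tell apart. This is based on the polar-to-Cartesian conversion x = r*cos(theta) and y = r*sin(theta), where theta takes the values 0*360/19, 1*360/19, ..., 18*360/19 degrees. The code below works in radians and uses sin for ax and cos for ay, which only changes where the rotation starts and which way it turns. We want the ax and ay references to be x and y axis coordinates rather than paper coordinates, and then set r=2 or some value on the scale of your plot (r is the distance from the origin, in axis units, at which the labels are placed).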
from math import sin, cos, pi
r = 2  # this can be modified as needed, and is in units of the axis
theta = 2*pi/len(features)  # angular step between consecutive labels, in radians
for i, feature in enumerate(features):
    fig.add_shape(
        type='line', x0=0, y0=0, x1=loadings[i, 0], y1=loadings[i, 1]
    )
    fig.add_annotation(
        x=loadings[i, 0], y=loadings[i, 1],
        ax=r*sin(i*theta), ay=r*cos(i*theta), axref="x", ayref="y",
        xanchor="center", yanchor="bottom", text=feature,
    )
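To check where the labels will land before drawing anything, the ax and ay values can be computed on their own. This is a minimal sketch using only the standard library; the helper name label_positions is hypothetical (it is not part of Plotly) and simply mirrors the ax/ay expressions in the loop above.

```python
from math import sin, cos, pi, hypot


def label_positions(n_labels, r=2.0):
    """Return (ax, ay) pairs spaced evenly on a circle of radius r.

    Hypothetical helper for illustration only; it mirrors the
    r*sin(i*theta), r*cos(i*theta) expressions used in the loop.
    """
    theta = 2 * pi / n_labels  # angular step in radians
    return [(r * sin(i * theta), r * cos(i * theta)) for i in range(n_labels)]


positions = label_positions(19, r=2.0)

# Every label position lies at distance r from the origin
# (up to floating-point error).
assert all(abs(hypot(x, y) - 2.0) < 1e-9 for x, y in positions)
```

Each pair can then be passed as ax and ay to fig.add_annotation together with axref="x" and ayref="y". If labels still collide, increasing r spreads them further apart.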