Need to extract or drop columns in Python

I have a list that looks like this:

    categorical_features = \
    ['FireplaceQu','BsmtQual','BsmtCond','GarageQual','GarageCond','ExterQual','ExterCond','HeatingQC','PoolQC','KitchenQual','BsmtFinType1','BsmtFinType2','Functional','Fence','BsmtExposure','GarageFinish','LandSlope','LotShape','PavedDrive','Street','Alley','CentralAir','MSSubClass','OverallQual','OverallCond','YrSold','MoSold']

I need to remove these columns from the dataset, so I did the following:

all_data = all_data.loc[:,categorical_features]

Unfortunately, this only selects these columns. How do I reverse it, so that these columns are excluded instead?

Solution

I'd suggest computing the columns you want to keep instead; that's easier:

categorical_features = \
    ['FireplaceQu','BsmtQual','BsmtCond','GarageQual','GarageCond','ExterQual','ExterCond','HeatingQC','PoolQC','KitchenQual','BsmtFinType1','BsmtFinType2','Functional','Fence','BsmtExposure','GarageFinish','LandSlope','LotShape','PavedDrive','Street','Alley','CentralAir','MSSubClass','OverallQual','OverallCond','YrSold','MoSold']

# Keep every column that is not in categorical_features, in the original order
cols = all_data.columns.difference(categorical_features, sort=False)

all_data = all_data.loc[:, cols]
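As a self-contained sketch of the keep-list approach, here is a small made-up frame standing in for `all_data` (the column names and values are hypothetical). `Index.difference` works on the column `Index` directly instead of building a Python `set`, which newer pandas releases reject as a `.loc` indexer; `sort=False` keeps the remaining columns in their original order, whereas the default sorts them.

```python
import pandas as pd

# Hypothetical stand-in for all_data
all_data = pd.DataFrame({
    "SalePrice": [1, 2],
    "Street": ["Pave", "Grvl"],
    "LotArea": [3, 4],
    "YrSold": [2008, 2007],
})
categorical_features = ["Street", "YrSold"]

# Columns to keep: everything not listed, in the frame's original order
cols = all_data.columns.difference(categorical_features, sort=False)
kept = all_data.loc[:, cols]
print(list(kept.columns))  # ['SalePrice', 'LotArea']
```

With the default `sort=None` the result would come back alphabetically as `['LotArea', 'SalePrice']`.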

Alternatively, you can exclude these columns with DataFrame.drop:

all_data = all_data.drop(categorical_features, axis=1)
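By default `drop` raises a `KeyError` if any listed label is not a column of the frame, which can happen if the exclusion list was written for a slightly different dataset. A minimal sketch on a toy frame (the names `to_exclude` and `trimmed` are only for illustration) using the `columns=` keyword together with `errors="ignore"`:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
to_exclude = ["B", "Z"]  # "Z" is not a column of df

# The default errors="raise" would raise KeyError because of "Z";
# errors="ignore" silently skips labels that are not present
trimmed = df.drop(columns=to_exclude, errors="ignore")
print(list(trimmed.columns))  # ['A', 'C']
```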

Here is a small example to test it:

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)

# Random data, so your values will differ from the output below
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df)

# Drop columns B and D
features = ['B', 'D']
df = df.drop(features, axis=1)

print(df)

Output:

                   A         B         C         D
2013-01-01  1.365473 -0.445448  0.244377  0.416889
2013-01-02 -0.307532  0.095569  1.356229 -0.306618
2013-01-03  0.971216  1.100189  0.932189  0.808151
2013-01-04 -0.030160 -0.796742 -0.383336 -0.409233
2013-01-05  0.006601  0.093678 -1.013768  1.439921
2013-01-06  0.560771 -0.452491  1.050500 -1.545958
                   A         C
2013-01-01  1.365473  0.244377
2013-01-02 -0.307532  1.356229
2013-01-03  0.971216  0.932189
2013-01-04 -0.030160 -0.383336
2013-01-05  0.006601 -1.013768
2013-01-06  0.560771  1.050500
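Another way to express the same exclusion, shown here as a sketch on the example frame above rather than as part of the original answer, is a boolean column mask built with `Index.isin` and negated with `~`. It keeps the original column order and gives the same result as `drop`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
features = ["B", "D"]

# True for every column that is NOT in the exclusion list
mask = ~df.columns.isin(features)
result = df.loc[:, mask]
print(result.columns.tolist())  # ['A', 'C']
```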