keras - AttributeError: 'numpy.ndarray' object has no attribute 'lower'


I'm not sure what I'm doing wrong:

I have a dataset (all text), and fitting a Tokenizer on it fails.

from keras.preprocessing.text import Tokenizer as Tok

# Maximum number of words to keep (the most frequent ones).
MAX_NB_WORDS = 50000
# Maximum number of words in each complaint.
MAX_SEQUENCE_LENGTH = 250
# Embedding dimension (fixed).
EMBEDDING_DIM = 100

labels = training_data['group_name']
features = training_data.drop('group_name', axis='columns')

tokenizer = Tok(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~', lower=True)
tokenizer.fit_on_texts(features.values)  # raises the AttributeError below

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

I have cleaned the text as follows:

import re
from nltk.corpus import stopwords

REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')
STOPWORDS = set(stopwords.words('english'))


def clean_text(text):
    text = text.lower()  # lowercase the text
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # replace the matched symbols with a space
    text = BAD_SYMBOLS_RE.sub('', text)  # delete any remaining disallowed symbols
    text = text.replace('x', '')  # drop every 'x' character

    text = ' '.join(word for word in text.split() if word not in STOPWORDS)  # remove stopwords
    return text

print('clean text')
training_data = training_data.applymap(clean_text)

...so I can't see where the numpy.ndarray is coming from.
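
One likely source, sketched under the assumption that `features` has more than one column: `features.values` is then a 2-D array, so iterating over it yields one 1-D `numpy.ndarray` per row, and with `lower=True` the Tokenizer appears to call `.lower()` on each item it is given. The small array below is made-up data standing in for `features.values`.

```python
import numpy as np

# Made-up stand-in for features.values: 2 rows, 2 text columns.
values = np.array([["first complaint", "billing"],
                   ["second complaint", "delivery"]], dtype=object)

# Iterating a 2-D array yields rows, and each row is itself an ndarray.
row_types = [type(row).__name__ for row in values]
print(row_types)  # ['ndarray', 'ndarray']

# A row has no .lower(), which matches the reported AttributeError.
print(hasattr(values[0], "lower"))  # False

# Individual cells are plain strings and do have .lower().
print(hasattr(values[0, 0], "lower"))  # True
```

If that is what is happening, anything passed to `fit_on_texts` needs to be a flat sequence of strings rather than a 2-D array.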

Update:

I was able to understand the problem and work around it in an ugly way:

Merge all the columns into one:

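import pandas as pd

labels = df['group_name']
features = df.drop('group_name', axis='columns')
tmp = pd.DataFrame()
# Join the non-null values of every feature column except the first into one string per row.
tmp['txt'] = features[features.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(str)), axis=1
)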

Now it gets past the problematic step:

from keras.preprocessing.sequence import pad_sequences

tokenizer = Tok(num_words=MAX_NB_WORDS, lower=True)
tokenizer.fit_on_texts(tmp['txt'].values)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X = tokenizer.texts_to_sequences(tmp['txt'].values)
X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of data tensor:', X.shape)
Y = pd.get_dummies(labels).values
print('Shape of label tensor:', Y.shape)

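However, I would still like to keep the original columns rather than putting all the data into a single column.

How can I do that (i.e. get all the values from all the columns without iterating over the nested numpy arrays)?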
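
One possible approach, offered as a sketch rather than a definitive answer: build the vocabulary from a flattened 1-D view of all cells, then convert each column separately so the column layout survives. `ravel()` is a standard NumPy method; the sample array and the column names `summary` and `details` are hypothetical stand-ins for `features.values` and `features.columns`.

```python
import numpy as np

# Hypothetical stand-in for features.values and features.columns.
columns = ["summary", "details"]
values = np.array([["late delivery", "parcel arrived a week late"],
                   ["wrong charge", "billed twice for one order"]], dtype=object)

# Flatten every cell into one 1-D list of strings, with no explicit nested loop.
all_texts = values.ravel().tolist()

# Keep one list of strings per original column for later per-column conversion.
texts_by_column = {name: values[:, i].tolist() for i, name in enumerate(columns)}

print(len(all_texts))  # 4
print(texts_by_column["details"][0])  # parcel arrived a week late
```

Under these assumptions, `all_texts` is what would be passed to `tokenizer.fit_on_texts`, and each list in `texts_by_column` to `tokenizer.texts_to_sequences` followed by `pad_sequences`, giving one padded matrix per column. Missing values would probably need to be replaced with empty strings first, since they are not strings either.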