
How to fix the error: cannot use a string pattern on a bytes-like object (sentiment analysis)

I am trying to clean my data by removing URLs and several other things with regular expressions. For that I have the following function:

import re

def depure(data):
  '''
  input : data
  output: data without URLs, emails, newline characters and single quotes
  '''
  # Remove URLs with a regular expression (not sure if any exist)
  regex = r'https?://\S+|www\.\S+'
  url_pattern = re.compile(regex)

  data = url_pattern.sub(r'', data)

  # Remove emails
  data = re.sub(r'\S*@\S*\s?', '', data)

  # Collapse newlines and other whitespace runs into a single space
  data = re.sub(r'\s+', ' ', data)

  # Remove distracting single quotes
  data = re.sub("'", "", data)

  return data

But when I call it I get an error, and I don't know why. I have tried to fix it, with no success.

train_temp = []
test_temp = []
# transform data sequences to lists
train_to_list = train_data.tolist()
test_to_list = test_data.tolist()

# for train data
for i in range(len(train_data)):
  train_temp.append(depure(train_data[i]))
train_words = list(sent_to_words(train_temp))
new_train = []
for i in range(len(train_words)):
  new_train.append(detokenize(train_data[i]))

Error output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-59-1e848ac6a862> in <module>()
      7 #for train data
      8 for i in range(len(train_data)):
----> 9   train_temp.append(depure(train_data[i]))
     10 train_words = list(sent_to_words(train_temp))
     11 new_train = []

1 frames
/usr/lib/python3.7/re.py in sub(pattern, repl, string, count, flags)
    192     a callable, it's passed the Match object and must return
    193     a replacement string to be used."""
--> 194     return _compile(pattern, flags).sub(repl, string, count)
    195 
    196 def subn(pattern, repl, string, count=0, flags=0):

TypeError: cannot use a string pattern on a bytes-like object

Can anyone help me?
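
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the message means `re` was given a `str` pattern but a `bytes` value to search. That would happen if the elements of `train_data` are `bytes` rather than `str`, which the question does not confirm. Under that assumption, decoding each element before the substitutions should avoid the error. The helper `to_text`, the UTF-8 encoding, and the sample input below are hypothetical and not from the original post.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as str, decoding bytes-like input."""
    if isinstance(value, (bytes, bytearray)):
        # errors='replace' keeps going on malformed bytes instead of raising
        return bytes(value).decode(encoding, errors='replace')
    return str(value)


def depure(data):
    """Same cleaning steps as in the question, but decode to str first."""
    data = to_text(data)
    data = URL_PATTERN.sub('', data)          # remove URLs
    data = re.sub(r'\S*@\S*\s?', '', data)    # remove emails
    data = re.sub(r'\s+', ' ', data)          # collapse whitespace runs
    data = re.sub("'", '', data)              # remove single quotes
    return data


# Minimal reproduction of the error: str pattern, bytes input
try:
    re.sub(r'\s+', ' ', b'some  bytes')
except TypeError as exc:
    error_message = str(exc)

# Made-up sample input, assumed to be UTF-8 encoded bytes
sample = b"Check https://example.com now,\nit's great! mail me@example.com"
cleaned = depure(sample)
print(error_message)
print(repr(cleaned))
```

If the data really is bytes, the other option is to keep it as bytes and use bytes patterns throughout (for example `rb'\s+'` with a `b' '` replacement), but decoding once at the start is usually simpler when later steps such as tokenizing expect `str`.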
