Word segmentation and bounding boxes in Python

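I am working on OCR for a language and have finished the character recognition part, but I would like to extend it with word segmentation. I have code for character segmentation; it produces a bounding box for each character, as shown in the result below. (Image: character segmentation bounding boxes)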

import cv2
import numpy as np

# Load image, convert to grayscale, apply Otsu's threshold
image = cv2.imread('te.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# With THRESH_OTSU the threshold argument (0) is ignored; 255 is the max value
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours, obtain each bounding box, extract and save the ROI
ROI_number = 0
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    # Draw the box on the display copy (BGR color needs three components)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), 2)
    ROI = original[y:y + h, x:x + w]
    cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
    ROI_number += 1

cv2.imshow('image',image)
cv2.waitKey()

But I want a bounding box for each word rather than for each character. Can anyone help me solve this?
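
One possible direction, offered only as a minimal sketch: since the code above already yields one box per character, the boxes could be merged into words whenever the horizontal gap between neighbours is small. The function name `merge_boxes_into_words` and the `max_gap` parameter are hypothetical, not part of OpenCV, and the sketch assumes a single line of text in which the gaps between words are clearly wider than the gaps between characters.

```python
def merge_boxes_into_words(boxes, max_gap):
    """Merge character boxes (x, y, w, h) on ONE text line into word boxes.

    Assumption: characters of the same word are at most `max_gap` pixels
    apart horizontally, and words are further apart than that.
    """
    words = []
    for x, y, w, h in sorted(boxes):  # process boxes left to right
        if words and x - (words[-1][0] + words[-1][2]) <= max_gap:
            # Close enough to the previous box: grow it to cover this one
            px, py, pw, ph = words[-1]
            top = min(py, y)
            right = max(px + pw, x + w)
            bottom = max(py + ph, y + h)
            words[-1] = (px, top, right - px, bottom - top)
        else:
            # Gap is too wide: start a new word
            words.append((x, y, w, h))
    return words


# Hypothetical character boxes: three characters, a wide gap, two characters
chars = [(10, 5, 8, 12), (20, 4, 8, 13), (30, 5, 8, 12),
         (60, 5, 8, 12), (70, 6, 8, 11)]
words = merge_boxes_into_words(chars, max_gap=5)
```

With the code from the question, the input could be collected as `[cv2.boundingRect(c) for c in cnts]`. A suitable `max_gap` depends on the font size and script, so it would need tuning per image. Multi-line text would first have to be split into lines, and characters made of several disconnected strokes may need extra handling.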