Python and threads: why does moving a computation out of the locked region speed up the code?

I'm preparing a session for my team on threading and locking in Python, and I ran into a situation I don't fully understand.

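In the code below, I compute the hashes of many (1000) big strings (100k characters) using a thread pool (20 threads) in Python. The hashes are then stored in the dictionary digests, so I take a lock when writing to the dictionary (I think it may not actually be necessary, but let's assume it is required and we need the lock).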

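Version A) performs the expensive hash computation inside the lock statement. Version B) performs it before acquiring the lock and then only updates the dictionary with the result inside the critical section.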

import threading
import time
from multiprocessing.pool import ThreadPool
import hashlib

# A) computation is within the lock statement
lock = threading.Lock()

digests = {}

def compute_digests(x):
  s = '*'*(x + 100000)  # generate some big string
  
  with lock:
    digests[x] = hashlib.sha256(f'{s}'.encode()).hexdigest()

tic = time.time()
ThreadPool(20).map(compute_digests,range(1000))
toc = time.time()
print(f'computation in locked area: {toc - tic}s')



# B) computation is outside of the lock statement
lock = threading.Lock()

digests = {}

def compute_digests(x):
  s = '*'*(x + 100000)  # generate some big string
  digest = hashlib.sha256(f'{s}'.encode()).hexdigest()
  
  with lock:
    digests[x] = digest

tic = time.time()
ThreadPool(20).map(compute_digests,range(1000))
toc = time.time()
print(f'computation outside of locked area: {toc - tic}s')

The results are:

computation in locked area: 0.41937875747680664s
computation outside of locked area: 0.10702204704284668s

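In other words, option B) is faster. That seems intuitive given that we moved the expensive computation out of the locked block, but from what I've read, Python is still single-threaded, and ThreadPool only gives the appearance of doing work in parallel, while in reality only one computation runs at any given time. So I would expect the Global Interpreter Lock to be the bottleneck, yet somehow version B shows a substantial speedup!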

So the question is: where does the speedup come from? Is it related to the implementation of sha256 (maybe it sleeps somewhere)?

Answer

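Python is not single-threaded. It uses ordinary operating-system threads, just like any C++ or Java code. The difference is the global interpreter lock (GIL): running pure Python code forces a single thread to execute at a time, but internal C code (such as hashlib) can release the GIL while it works. hashlib does so for data larger than 2047 bytes, so the SHA-256 computations on these 100k-character strings can run in parallel.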
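A rough way to see this difference is to time the same thread pool on a hashlib call and on a pure-Python loop. The sketch below assumes a standard (GIL-enabled) CPython build; `sha256_hex`, `python_checksum` and `timed_map` are hypothetical helper names, and the checksum loop is only an illustrative stand-in for CPU-bound Python code, not a real hash.

```python
import hashlib
import time
from multiprocessing.pool import ThreadPool


def sha256_hex(x):
    # C implementation: hashlib releases the GIL while hashing large inputs.
    return hashlib.sha256(b'*' * (x + 100_000)).hexdigest()


def python_checksum(x):
    # Pure-Python loop (illustrative stand-in, not a real hash): only one
    # thread at a time can execute this bytecode under the GIL.
    total = 0
    for i in range(x + 100_000):
        total = (total * 31 + i) % 1_000_003
    return total


def timed_map(func, items, n_threads):
    """Run func over items in a thread pool; return (seconds, results)."""
    start = time.perf_counter()
    with ThreadPool(n_threads) as pool:
        results = pool.map(func, items)
    return time.perf_counter() - start, results


if __name__ == '__main__':
    items = range(50)
    for func in (sha256_hex, python_checksum):
        for n_threads in (1, 4):
            seconds, _ = timed_map(func, items, n_threads)
            print(f'{func.__name__} with {n_threads} thread(s): {seconds:.3f}s')
```

On a multi-core machine one would expect the hashlib timings to drop as threads are added, while the pure-Python timings stay roughly flat. The exact numbers depend on the hardware and the Python version.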

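In this case, the interpreter is free to run other threads while one thread is hashing, but in version A you prevent that with your own lock: the hash is computed while the lock is held, so only one thread can hash at a time. In version B the lock protects only the dictionary update, and the hashes themselves are computed concurrently.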
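If the lock exists only to guard the dictionary, one way to keep it out of the hot path entirely is to let each worker return its digest and build the dictionary in the calling thread. This is a minimal sketch under that assumption, not the code from the question; `sha256_hex` and this variant of `compute_digests` are hypothetical names.

```python
import hashlib
from multiprocessing.pool import ThreadPool


def sha256_hex(x):
    data = b'*' * (x + 100_000)  # same bytes as the big string in the question
    return hashlib.sha256(data).hexdigest()


def compute_digests(n_items, n_threads=20):
    """Hash inputs 0..n_items-1 in a thread pool and return {x: digest}."""
    keys = range(n_items)
    with ThreadPool(n_threads) as pool:
        # map() returns results in input order, so the workers never write
        # to shared state and no lock is needed.
        values = pool.map(sha256_hex, keys)
    return dict(zip(keys, values))


if __name__ == '__main__':
    digests = compute_digests(1000)
    print(len(digests))
```

The dictionary is built by a single thread after the pool has finished, so nothing serializes the hashing itself.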