Can the NULL check be optimized away in a DCL singleton without volatile?

A simple DCL (double-checked locking) singleton:

#include <mutex>

class Singleton {
 public:
  static Singleton* GetInstance();

 private:
  Singleton() = default;

  static Singleton* s_instance;
  static std::mutex s_mutex;
};

Singleton* Singleton::GetInstance() {
  if (s_instance == nullptr) { // 1st check
    std::lock_guard<std::mutex> lock(s_mutex);
    if (s_instance == nullptr) { // 2nd (double) check
      s_instance = new Singleton();
    }
  }
  return s_instance;
}

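Here is my question: papers such as this one discuss at length how compiler optimizations can reorder memory accesses and instructions, and how that can break DCL for any thread.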

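However, I have not found any article that discusses whether the NULL check itself can be optimized away when s_instance is not volatile. Since no article talks about it, I need someone to verify whether this is possible.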

What I mean is that the compiler would transform the code into something like this:

class Singleton {
 public:
  static Singleton* GetInstance();

 private:
  Singleton() = default;

  static Singleton* s_instance;
  static std::mutex s_mutex;
};

Singleton* Singleton::GetInstance() {
  if (s_instance == nullptr) { // 1st check
    std::lock_guard<std::mutex> lock(s_mutex);
    // 2nd (double) check optimized out
    s_instance = new Singleton();
  }
  return s_instance;
}

Solution

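The optimizations a compiler may perform vary from compiler to compiler and depend on the optimization flags. There is no "one true answer", so you must always verify this independently for your own code. Different compilers implement different optimizations, and certain flags or intrinsics change how the optimizer treats the code.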

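As long as the code under inspection has no undefined behavior, the best way to determine whether something gets optimized away is to inspect the generated assembly. If the code sample is small enough, a simple tool for this is Compiler Explorer.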

Testing the example with gcc-10.2 and -O3 shows that the check is not optimized away.

For the double-checked code, we see that the assembly contains:

        mov     rax,QWORD PTR Singleton::s_instance[rip]
        test    rax,rax                                  ; This is for the first test
        je      .L25                                      ; branch on the results
        ret
.L25:
        ...
        call    __gthrw_pthread_mutex_lock(pthread_mutex_t*) ; acquire lock
        ...
        test    rax,rax                                  ; The second test
        je      .L6                                       ; branch on the results

So during initialization both checks are performed, while the first check is performed on every call to GetInstance().

I believe the second check cannot be optimized away here, for two reasons:

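1. The C++ language must assume that __gthrw_pthread_mutex_lock can access and alias s_instance, which means the compiler must assume s_instance may be modified by the call. This forces a fresh load from memory, which in turn requires a fresh check.
2. The compiler may know that __gthrw_pthread_mutex_lock introduces a synchronization point, which changes this thread's view of the data. A synchronization point also forces the data to be reloaded from memory rather than relying on a previously cached value. Again, this requires a new check that makes no assumptions about the earlier result.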
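
To make the first reason concrete, here is a minimal single-threaded sketch. The names g_ptr, g_storage, OpaqueCall and SecondCheckSeesNull are hypothetical stand-ins (g_ptr for s_instance, OpaqueCall for the lock call) and are not part of the original code. It only illustrates why the result of the first check cannot be reused after a call that may write to the same global; it makes no claim about what a particular compiler emits.

```cpp
// Hypothetical stand-in for s_instance.
int* g_ptr = nullptr;
int g_storage = 0;

// Hypothetical stand-in for the lock call. When the callee's body is not
// visible to the optimizer (for example, a library function), the caller
// must assume it could write to any global, including g_ptr.
void OpaqueCall() { g_ptr = &g_storage; }

// Returns true if the second check still observes nullptr.
bool SecondCheckSeesNull() {
  if (g_ptr == nullptr) {     // 1st check
    OpaqueCall();             // may have modified g_ptr
    return g_ptr == nullptr;  // 2nd check: g_ptr has to be read again
  }
  return false;
}
```

Here the first check sees nullptr and the second does not, so folding the second check into the first would change the program's behavior. A compiler may only drop the re-check if it can prove the call never writes to the variable.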

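As mpoeter pointed out in the comments, it is worth noting that the non-atomic comparison of s_instance is actually undefined behavior when GetInstance() is called from multiple threads concurrently. Analyzing the assembly generated for a program with undefined behavior is pointless, because the compiler is free to generate whatever it wants (if it generates assembly for the UB at all).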
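
One common way to remove that data race while keeping the double-checked structure is to make the pointer a std::atomic and pair an acquire load with a release store. The following is a sketch under that assumption, not the code from the question; the memory orderings chosen here are one reasonable option, and the default sequentially consistent ordering would also work.

```cpp
#include <atomic>
#include <mutex>

class Singleton {
 public:
  static Singleton* GetInstance();

 private:
  Singleton() = default;

  static std::atomic<Singleton*> s_instance;
  static std::mutex s_mutex;
};

std::atomic<Singleton*> Singleton::s_instance{nullptr};
std::mutex Singleton::s_mutex;

Singleton* Singleton::GetInstance() {
  // 1st check: the acquire load pairs with the release store below, so a
  // thread that sees a non-null pointer also sees the constructed object.
  Singleton* instance = s_instance.load(std::memory_order_acquire);
  if (instance == nullptr) {
    std::lock_guard<std::mutex> lock(s_mutex);
    // 2nd check: the mutex already orders this load against the store
    // made by whichever thread held the lock before, so relaxed suffices.
    instance = s_instance.load(std::memory_order_relaxed);
    if (instance == nullptr) {
      instance = new Singleton();
      s_instance.store(instance, std::memory_order_release);
    }
  }
  return instance;
}
```

With atomic loads, both checks are well-defined reads that the compiler has to perform, so the question of whether the second one can be dropped no longer depends on undefined behavior.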

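Since you are only trying to get thread-safe initialization of a singleton, you can do this safely with a C++11 function-scope static variable, whose initialization is guaranteed to be thread-safe.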

Your code can simply be rewritten as:

class Singleton {
public:
  static Singleton* GetInstance();

private:
  Singleton() = default;

};

Singleton* Singleton::GetInstance() {
  // Initialized exactly once, in a thread-safe way
  static auto s_instance = new Singleton();

  return s_instance;
}
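
A closely related variant, sketched here as an alternative rather than as the code above, stores the object itself as the function-scope static and returns a reference, which avoids the heap allocation. The ConstructionCount() accessor and its counter are hypothetical additions used only to show that the constructor runs once.

```cpp
class Singleton {
 public:
  static Singleton& GetInstance() {
    // C++11 guarantees this initialization runs exactly once, even if
    // several threads reach this line at the same time.
    static Singleton instance;
    return instance;
  }

  // A singleton should not be copyable.
  Singleton(const Singleton&) = delete;
  Singleton& operator=(const Singleton&) = delete;

  // Hypothetical helper, for illustration only.
  static int ConstructionCount() { return s_construction_count; }

 private:
  Singleton() { ++s_construction_count; }

  static int s_construction_count;
};

int Singleton::s_construction_count = 0;
```

Calling Singleton::GetInstance() repeatedly yields the same object each time, and ConstructionCount() stays at 1 after the first call.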

See the assembly comparison here

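It is worth noting that function-scope static variables are themselves initialized with a double-checked pattern, but the generated assembly uses intrinsic functions rather than explicit syscalls. If you look at the link above, you will see that this initialization becomes: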

Singleton::GetInstance():
        movzx   eax,BYTE PTR guard variable for Singleton::GetInstance()::s_instance[rip]
        test    al,al        ; first test for initialization
        je      .L16
        ...
.L16:
        push    rbp
        mov     edi,OFFSET FLAT:guard variable for Singleton::GetInstance()::s_instance
        call    __cxa_guard_acquire  ; acquire exclusive lock
        test    eax,eax             ; second test, after locking segment
        jne     .L17
        ...

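In either case, the existence of the second check in the assembly does not mean that it is executed on every call. Since initialization happens only on first entry, this branch is rarely taken.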