C++ tree traversal: why is the loop slower than recursion in this case?

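I need to traverse a tree many times in my C++ code, and the depth of the tree may vary from one iteration to the next. I may also conditionally break out of the traversal early. While profiling my code (with the Visual Studio compiler), I noticed that the tree traversal is the biggest bottleneck, so I need to speed that part up as much as possible.

Below is a description and a simplified, runnable version of my code that shows the problem I am currently facing.

While using recursion, it occurred to me that I could speed up my code by conditionally breaking out of the recursion early. However, my implementation of the early break did not improve the speed at all (see the code). I assumed an early break would be easier to implement with a loop than with recursion, so I converted my tree traversal to a loop. Surprisingly, the loop version is an order of magnitude slower than the recursive version! In addition, the early break improves the speed by at most 10%, which is surprising because this is a depth-first traversal and a large part of the tree is never visited once the break happens. I therefore expected a speedup of at least 50-100%.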

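My questions:

Why is the loop version an order of magnitude slower in the specific case below?
Why does the early break not improve the speed significantly (for both the loop and the recursion)?
Any other performance tips for the case below are much appreciated.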

#include <iostream>
#include <vector>
#include <stack>
#include <chrono>
#include <cstdlib> // rand()

using namespace std;
using namespace std::chrono;

class Node {
public:
    int id;
    int left = -1;
    int right = -1;
    int count = 0;

    Node(int _id) { id = _id; }
};
std::vector<Node> nodes;

//1) recursive tree traversal
void recursive(int node) {
    if (nodes[node].left == -1) {
        nodes[node].count++;
    }
    else {
        recursive(nodes[node].right);
        recursive(nodes[node].left);
    }
}

//2) recursive tree traversal with conditional break
void recursive2(int node,bool* stop) {
    if (*stop == false) {
        if (nodes[node].left == -1) {
            nodes[node].count++;
            if (rand() % 2 == 0) { *stop = true; } //conditional break
        }
        else {
            recursive2(nodes[node].right,stop);
            if (*stop == false) {
                recursive2(nodes[node].left,stop);
            }
        }
    }
}

// loop traversal
void loop(int node) {
    stack<int> stack;
    stack.push(node);
    while (stack.size() > 0) {
        node = stack.top();
        stack.pop();
        if (nodes[node].left == -1) {
            nodes[node].count++;
            //if (rand() % 2 == 0) { break; } // conditional break
        }
        else {
            stack.push(nodes[node].right);
            stack.push(nodes[node].left);
        }
    }
}


int main()
{
    for (int i = 0; i < 7; i++) {
        nodes.push_back(Node(i));
    }
    // make a simple tree /node 6 is the root
    nodes[4].left = nodes[0].id;
    nodes[4].right = nodes[1].id;
    nodes[5].left = nodes[2].id;
    nodes[5].right = nodes[3].id;
    nodes[6].left = nodes[4].id;
    nodes[6].right = nodes[5].id;


    /// speed comparison 
    int n = 10000000;
    int root_node = 6;

    auto start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) { recursive(root_node); }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stop - start);
    cout << "recursion:" << duration.count() << endl;

    start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) {
        bool stop = false;
        recursive2(root_node,&stop);
    }
    stop = high_resolution_clock::now();
    duration = duration_cast<milliseconds>(stop - start);
    cout << "recursion with early-break:" << duration.count() << endl;

    start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) { loop(root_node); }
    stop = high_resolution_clock::now();
    duration = duration_cast<milliseconds>(stop - start);
    cout << "loop:" << duration.count() << endl;
}

Solution

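The tree you are traversing is so small compared to the number of iterations you are running that the cost of managing the stack object's dynamic memory dwarfs any gain you think you are making in the loop (we will come back to that assumption later).

Let's try to eliminate the memory allocation overhead and see what happens to the speed.

For reference, this is what I get from the code you posted, using a local Visual Studio x64 Release build.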

recursion:60
recursion with early-break:70
loop:2088

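First, std::stack uses std::deque by default, which is not well suited to small stacks. Switching to a vector-backed stack should improve things. Since it is a one-line change, there is no reason not to at least try it: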

std::stack<int,std::vector<int>> stack;

recursion:58
recursion with early-break:68
loop:1853

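It does! The improvement is not earth-shattering, but it is a good sign that we are on the right track. If we were worried about the time it takes to do one big traversal, that would probably be good enough. But what we care about is doing 10000000 tiny traversals, so we need to go one step further and get rid of the memory allocation entirely: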

#include <array>

// I "could" allocate the data as a local variable,
// but then it would be on the stack, completely defeating the purpose.
// This would normally be a recyclable heap-based chunk of memory passed to
// the function.
std::array<int, 50> g_stack; // just for the example, don't actually do this.
void loop(int node) {
  auto top = g_stack.begin();
  auto push = [&](int v) {*(top++) = v;};
  auto pop = [&]() {--top;};

  push(node);
  while (top != g_stack.begin()) {
    node = *(top-1);
    pop();
    if (nodes[node].left == -1) {
      nodes[node].count++;
    }
    else {
      push(nodes[node].right);
      push(nodes[node].left);
    }
  }
}

recursion:61
recursion with early-break:68
loop:65

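Now we're talking! But it still does not beat the recursion. What is going on?

The premise that a loop is always faster than recursion is not universally true.

The main advantage of a loop-based approach over recursion is not speed. It is that it solves the biggest problem with recursion: the possibility of running out of stack space. With a loop, you can "recurse" much deeper than with recursive function calls, because the work stack lives on the heap, which usually has far more space available.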
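To illustrate the depth argument, here is a minimal sketch that is not part of the question's code. `ChainNode`, `make_chain` and `count_leaves` are hypothetical names, and the tree is a deliberately degenerate chain, built only so that the work stack has to grow to roughly the depth of the tree.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout mirroring the question's index-based tree.
struct ChainNode {
    int left = -1;   // -1 marks a leaf, as in the question
    int right = -1;
    int count = 0;
};

// Builds a degenerate tree with `depth` internal nodes: internal node i sits
// at index 2*i, its right child is a leaf at 2*i + 1, and its left child is
// the next internal node (or the final leaf) at 2*(i + 1).
std::vector<ChainNode> make_chain(int depth) {
    std::vector<ChainNode> tree(2 * static_cast<std::size_t>(depth) + 1);
    for (int i = 0; i < depth; ++i) {
        tree[2 * i].right = 2 * i + 1;
        tree[2 * i].left = 2 * (i + 1);
    }
    return tree;
}

// Depth-first traversal with an explicit, heap-backed work stack.
// Returns the number of leaves visited.
long long count_leaves(std::vector<ChainNode>& tree, int root) {
    std::vector<int> pending;  // grows on the heap, not on the call stack
    pending.push_back(root);
    long long leaves = 0;
    while (!pending.empty()) {
        const int node = pending.back();
        pending.pop_back();
        if (tree[node].left == -1) {
            tree[node].count++;
            ++leaves;
        } else {
            pending.push_back(tree[node].right);
            pending.push_back(tree[node].left);
        }
    }
    return leaves;
}
```

Calling `count_leaves` on `make_chain(1000000)` makes the work stack grow to about a million entries, which a vector handles without trouble. A recursive version would need about a million nested calls for the same tree, which may exceed the default thread stack size on many platforms.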

Sometimes the compiler does a better job with loop-based code than with recursion, which results in faster run times, but that is never a given.
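The comment in the fixed-array version mentions a recyclable heap-based chunk of memory passed to the function. A minimal sketch of that idea follows, assuming the caller owns a `std::vector<int>` and hands it to every call. `TreeNode`, `loop_with_scratch` and `make_question_tree` are hypothetical names, and this variant has not been timed against the numbers above.

```cpp
#include <vector>

// Hypothetical node type; same fields as the question's Node, minus the id.
struct TreeNode {
    int left = -1;   // -1 marks a leaf
    int right = -1;
    int count = 0;
};

// Iterative traversal that borrows its work stack from the caller.
// clear() leaves the vector's capacity untouched, so once the buffer has
// grown large enough, repeated calls should not need to allocate again.
void loop_with_scratch(std::vector<TreeNode>& nodes, int root,
                       std::vector<int>& scratch) {
    scratch.clear();
    scratch.push_back(root);
    while (!scratch.empty()) {
        const int node = scratch.back();
        scratch.pop_back();
        if (nodes[node].left == -1) {
            nodes[node].count++;
        } else {
            scratch.push_back(nodes[node].right);
            scratch.push_back(nodes[node].left);
        }
    }
}

// Rebuilds the 7-node tree from the question (node 6 is the root).
std::vector<TreeNode> make_question_tree() {
    std::vector<TreeNode> nodes(7);
    nodes[4].left = 0; nodes[4].right = 1;
    nodes[5].left = 2; nodes[5].right = 3;
    nodes[6].left = 4; nodes[6].right = 5;
    return nodes;
}
```

In the benchmark loop, the buffer would be created once outside the loop (optionally with `reserve`) and passed to each `loop_with_scratch` call, so the allocation cost is paid once instead of once per traversal.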
