Quicksort in linear time?

I am analyzing the quicksort algorithm (qsort from the C++ standard library, <cstdlib>). The code is:

#include <iostream>
#include <fstream>
#include <ctime>
#include <new>       // std::bad_alloc
#include <cstdlib>
#include <iomanip>

#define MIN_ARRAY 256000
#define MAX_ARRAY 1000000000
#define MAX_RUNS 100

using namespace std;

int* random_array(int size) {
    int* array = new int[size];

    for (int c = 0; c < size; c++) {
        // Multiply in 64 bits: rand()*rand() can overflow int when RAND_MAX is large.
        array[c] = (long long)rand() * rand() % 1000000;
    }

    return array;
}

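int compare(const void* a,const void* b) {
    // Three-way compare without subtraction, which could overflow for large values.
    int x = *(const int*)a;
    int y = *(const int*)b;
    return (x > y) - (x < y);
}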

int main()
{
    ofstream fout;
    fout.open("data.csv");
    fout << "array size,";
    srand(time(NULL));
    int size;
    int counter = 1;

    std::clock_t start;
    double duration;

    for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
        fout << size << ",";
    }
    fout << "\n";

    for (counter = 1; counter <= MAX_RUNS; counter++) {
        fout << "run " << counter << ",";
        for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
            try {
                int* arr = random_array(size);

                start = std::clock();
                qsort(arr,size,sizeof(int),compare);
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;

                //cout << "size: " << size << " duration: " << duration << '\n';
                fout << setprecision(15) << duration << ",";

                delete[] arr;
            }
            catch (const bad_alloc&) {
                cout << "bad alloc caught,size: " << size << "\n";
                fout << "bad alloc,";
            }

        }
        fout << "\n";
        cout << counter << "% done\n";
    }
    
    fout.close();
    return 0;
}

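When I run this, the timings come back almost perfectly linear:

[Figure: data]

What is going on here? Isn't quicksort O(n log n)?

Here are the array sizes used and the average time (in seconds) for each size over all 100 runs: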

arraysize,256000,512000,1024000,2048000,4096000,8192000,16384000,32768000,65536000,131072000,262144000,524288000
average,0.034,0.066,0.132,0.266,0.534,1.048,2.047,4.023,7.951,15.833,31.442

Answer

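On average, it is indeed O(N log N).

It is just that the graph of f(N) = N log(N) looks very nearly linear.

Plot it and see for yourself, or refer to the plot below. This average-case running time is what makes the algorithm so clever:

[Figure: plot of f(N) = N log(N)]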
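To put a number on how close to linear N log N is, here is a minimal sketch that computes the factor by which N log2(N) grows each time N doubles. Algebraically that factor is 2 * (1 + 1 / log2(N)), which for the array sizes in the question works out to roughly 2.07 to 2.11, hard to tell apart from exactly 2 on a plot. The function names (n_log_n, doubling_ratio, print_doubling_ratios) are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Model cost of an O(N log N) sort, up to a constant factor.
double n_log_n(double n) {
    assert(n > 1.0);  // log2(n) must be positive for the ratio to make sense
    return n * std::log2(n);
}

// Factor by which the model cost grows when n doubles.
// Algebraically: 2 * (1 + 1 / log2(n)).
double doubling_ratio(double n) {
    return n_log_n(2.0 * n) / n_log_n(n);
}

// Print the growth factor for each array size used in the question.
void print_doubling_ratios() {
    for (double n = 256000.0; n < 1e9; n *= 2.0) {
        std::printf("%12.0f -> %12.0f : x%.4f\n", n, 2.0 * n, doubling_ratio(n));
    }
}
```

Calling print_doubling_ratios() from main lists one factor per array size. A strictly linear cost would give exactly 2 in every row, so the gap from 2 is the whole visible effect of the log factor.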

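Part of the reason the slope looks linear is that log(N) changes slowly, but the main reason is that the random numbers filling the array are limited to [0, 1,000,000). As a result, the large arrays consist mostly of duplicates, and the sort gets faster as the qsort algorithm narrows down to smaller groups. Once the array is much larger than the 1,000,000 possible values (say, growing from 10,000,000 to 20,000,000 elements), doubling its size doubles the average number of copies of each value, so the sort-time trajectory is almost perfectly linear.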
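The duplicate effect can be checked directly. The sketch below draws n values from [0, range) and counts how many distinct values occur. It uses std::mt19937 from <random> instead of rand(), which is an assumption of this sketch and not what the question's code does. The names count_distinct and average_copies are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw n values uniformly from [0, range) and return how many distinct
// values occurred. Uses <random> rather than rand(); that swap is an
// assumption of this sketch.
std::size_t count_distinct(std::size_t n, int range, unsigned seed) {
    assert(range > 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, range - 1);
    std::vector<bool> seen(static_cast<std::size_t>(range), false);
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t v = static_cast<std::size_t>(dist(gen));
        if (!seen[v]) {
            seen[v] = true;
            ++distinct;
        }
    }
    return distinct;
}

// Average number of copies of each value that actually occurred.
double average_copies(std::size_t n, int range, unsigned seed) {
    assert(n > 0);
    return static_cast<double>(n) /
           static_cast<double>(count_distinct(n, range, seed));
}
```

Because at most 1,000,000 distinct values can occur, average_copies(n, 1000000, seed) is at least n / 1,000,000 for any seed. Every array in the question from 1,024,000 elements upward is therefore guaranteed to contain duplicates, and the share of duplicates grows with the array size.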

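This can be seen in the chart below:

The orange and gray lines are the execution times for the unconstrained and constrained arrays. The yellow and blue lines are straight lines drawn from 0 to the end point of each of the two runs. One run limits the integers to [0, 1000000), as in the original code. The other is unconstrained and draws from all 2^31 non-negative ints. Note how much longer the unconstrained sort takes, because sorting the growing groups of duplicates is very fast.

The modified code below shows that the unconstrained execution time has a visible curve, as one would expect for N log N.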

#include <iostream>
#include <fstream>
#include <ctime>
#include <cstdlib>
#include <iomanip>
#include <new>       // std::bad_alloc

#define MIN_ARRAY 256000
#define MAX_ARRAY 1000000000
#define MAX_RUNS 100

using namespace std;

int* random_array(int size) {
    int* array = new int[size];

    for (int c = 0; c < size; c++) {
        // array[c] = rand() * rand() % 1000000;
        // Note: once the array size grows beyond 1000000, the line above
        // produces more and more duplicates, which shortens the sort time
        // as the subsets get small.

        // Build a random non-negative 31-bit int from three rand() calls.
        // Note: on this system RAND_MAX == 0x7fff (15 random bits per call);
        // the masks keep the shifts in range where RAND_MAX is larger.
        array[c] = ((rand() & 0x7fff) << 16) | ((rand() & 0x7fff) << 1) | (rand() & 1);
    }

    return array;
}

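int compare(const void* a,const void* b) {
    // Three-way compare without subtraction, which could overflow for large values.
    int x = *(const int*)a;
    int y = *(const int*)b;
    return (x > y) - (x < y);
}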

int main()
{
    auto x = RAND_MAX;  // unused; handy for inspecting RAND_MAX in a debugger
    (void)x;
    ofstream fout;
    fout.open("data.csv");
    fout << "array size,";
    srand(time(NULL));
    int size;
    int counter = 1;

    std::clock_t start;
    double duration;

    for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
        fout << size << ",";
    }
    fout << "\n";

    for (counter = 1; counter <= MAX_RUNS; counter++) {
        fout << "run " << counter << ",";
        for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
            try {
                int* arr = random_array(size);

                start = std::clock();
                qsort(arr,size,sizeof(int),compare);
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;

                cout << "size: " << size << " duration: " << duration << '\n';
                fout << setprecision(15) << duration << ",";

                delete[] arr;
            }
            catch (const bad_alloc&) {
                cout << "bad alloc caught,size: " << size << "\n";
                fout << "bad alloc,";
            }

        }
        fout << "\n";
        cout << counter << "% done\n";
    }

    fout.close();
    return 0;
}
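
The program above relies on RAND_MAX being 0x7fff. As a hedged alternative, here is a sketch of the same constrained versus unconstrained experiment using <random>, which does not depend on RAND_MAX. This is not the answerer's code: the names random_values, compare_ints, timed_qsort, sorted_ascending and kUnconstrainedMax are hypothetical, and std::chrono::steady_clock (wall-clock time) is swapped in for std::clock (CPU time).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <random>
#include <vector>

// Largest value for the "unconstrained" run: 2^31 - 1 on typical platforms.
const int kUnconstrainedMax = std::numeric_limits<int>::max();

// Fill a vector with n values drawn uniformly from [0, max_value].
std::vector<int> random_values(std::size_t n, int max_value, std::mt19937& gen) {
    assert(max_value >= 0);
    std::uniform_int_distribution<int> dist(0, max_value);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

// Overflow-safe three-way comparison for qsort.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Sort v in place with qsort and return the elapsed wall-clock seconds.
double timed_qsort(std::vector<int>& v) {
    const auto start = std::chrono::steady_clock::now();
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Convenience check that the sort actually happened.
bool sorted_ascending(const std::vector<int>& v) {
    return std::is_sorted(v.begin(), v.end());
}
```

To reproduce the comparison, time random_values(n, 999999, gen) against random_values(n, kUnconstrainedMax, gen) for the same n. The first corresponds to the question's [0, 1000000) range and the second to the unconstrained run. Using std::vector also removes the manual new[] and delete[] pair.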
