Why does quicksort appear to run in linear time?
I am analyzing the quicksort algorithm (qsort from the C++ standard library). The code is:
#include <iostream>
#include <fstream>
#include <ctime>
#include <bits/stdc++.h>
#include <cstdlib>
#include <iomanip>
#define MIN_ARRAY 256000
#define MAX_ARRAY 1000000000
#define MAX_RUNS 100
using namespace std;

int* random_array(int size) {
    int* array = new int[size];
    for (int c = 0; c < size; c++) {
        array[c] = rand()*rand() % 1000000;
    }
    return array;
}

int compare(const void* a, const void* b) {
    return (*(int*)a - *(int*)b);
}

int main()
{
    ofstream fout;
    fout.open("data.csv");
    fout << "array size,";
    srand(time(NULL));
    int size;
    int counter = 1;
    std::clock_t start;
    double duration;
    for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
        fout << size << ",";
    }
    fout << "\n";
    for (counter = 1; counter <= MAX_RUNS; counter++) {
        fout << "run " << counter << ",";
        for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
            try {
                int* arr = random_array(size);
                start = std::clock();
                qsort(arr, size, sizeof(int), compare);
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
                //cout << "size: " << size << " duration: " << duration << '\n';
                fout << setprecision(15) << duration << ",";
                delete[] arr;
            }
            catch (bad_alloc) {
                cout << "bad alloc caught,size: " << size << "\n";
                fout << "bad alloc,";
            }
        }
        fout << "\n";
        cout << counter << "% done\n";
    }
    fout.close();
    return 0;
}
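When I run this, the timings come back perfectly linear.
What is going on here? Isn't quicksort O(n log n)?
Here are the array sizes used and the average time, in seconds, for each size across all 100 runs: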
arraysize,256000,512000,1024000,2048000,4096000,8192000,16384000,32768000,65536000,131072000,262144000,524288000
average,0.034,0.066,0.132,0.266,0.534,1.048,2.047,4.023,7.951,15.833,31.442
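Solution
On average, it is indeed O(N log N).
It is just that the graph of f(N) = N log(N) looks very linear.
Plot it and see for yourself. It is this average-case running time that makes the algorithm so clever.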
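To put a number on how linear N log N looks, here is a minimal sketch that computes how much an idealized cost of N * log2(N) grows when N doubles. The function name `doubling_ratio` is made up for illustration, and the function only evaluates the cost model; it does not time a sort. A purely linear cost would give exactly 2.

```cpp
#include <cassert>
#include <cmath>

// Growth factor of an idealized cost model T(n) = n * log2(n) when n doubles.
// Algebraically this equals 2 * (1 + 1 / log2(n)), so it approaches 2 as n grows.
// Hypothetical helper for illustration only; it models the cost, it measures nothing.
double doubling_ratio(double n) {
    assert(n > 1.0);  // log2(n) must be positive for the ratio to make sense
    return (2.0 * n * std::log2(2.0 * n)) / (n * std::log2(n));
}
```

For the sizes in the question, `doubling_ratio(256000.0)` is roughly 2.11 and `doubling_ratio(262144000.0)` is roughly 2.07, so even a perfect N log N sort would differ from a straight line by only a few percent per doubling over this range.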
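Part of the reason the slope looks linear is that log(N) changes slowly, but the main reason is that the random numbers filling the array are limited to [0, 1,000,000). Large arrays are therefore filled mostly with duplicates, and the sort gets faster as the qsort algorithm narrows down to smaller groups. Once the array is larger than that range, doubling its size also doubles the average number of duplicates of each value, so the timing curve is almost exactly linear.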
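The effect of the restricted range can be seen in the following graph:
The orange and gray lines are the execution times for the unconstrained and constrained arrays. The yellow and blue lines are straight lines from 0 to the end point of each of the two runs. One run limits the integers to [0, 1,000,000), as in the original code. The other is unrestricted over the 2^31 non-negative integers. Note how much longer the unconstrained sort takes; the constrained sort is quicker because the growing groups of duplicates sort very quickly.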
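To see how quickly a range of 1,000,000 values saturates, the sketch below draws n values from [0, k) and counts how many distinct values appear. The name `count_distinct` is hypothetical, and the sketch assumes a uniform distribution from `std::mt19937` rather than the `rand()*rand() % 1000000` expression in the question, so it illustrates the effect without reproducing the original data exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw n values uniformly from [0, k) and count how many distinct values occur.
// Hypothetical helper for illustration; the uniform generator is an assumption
// and not the rand()*rand() expression used in the question.
std::size_t count_distinct(std::size_t n, int k, unsigned seed) {
    assert(k > 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, k - 1);
    std::vector<bool> seen(static_cast<std::size_t>(k), false);
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t v = static_cast<std::size_t>(dist(gen));
        if (!seen[v]) {      // first time this value shows up
            seen[v] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

With k = 1000000 the result can never exceed 1,000,000, however large n is. Calling `count_distinct(4096000, 1000000, 42)` therefore leaves well over three million of the 4,096,000 elements as repeats of a value already present, and the share of repeats keeps growing for the larger sizes in the question.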
The modified code below shows that the unconstrained execution time has a noticeable curve, as one would expect for N log N.
#include <iostream>
#include <fstream>
#include <ctime>
#include <cstdlib>
#include <iomanip>
#include <new>
#define MIN_ARRAY 256000
#define MAX_ARRAY 1000000000
#define MAX_RUNS 100
using namespace std;

int* random_array(int size) {
    int* array = new int[size];
    for (int c = 0; c < size; c++) {
        // Original: array[c] = rand() * rand() % 1000000;
        // Note that as the array size grows beyond 1000000,
        // that produces increasing numbers of duplicates,
        // which shortens the time when the subsets get small.
        //
        // Instead, get a random positive int distributed over the set of
        // positive 32-bit ints. Note that in this example/system
        // RAND_MAX == 0x7fff, so each rand() call supplies 15 bits.
        array[c] = (rand() << 16) | (rand() << 1) | (rand() & 1);
    }
    return array;
}

// The subtraction cannot overflow here because every value is non-negative.
int compare(const void* a, const void* b) {
    return (*(int*)a - *(int*)b);
}

int main()
{
    ofstream fout;
    fout.open("data.csv");
    fout << "array size,";
    srand(time(NULL));
    int size;
    int counter = 1;
    std::clock_t start;
    double duration;
    // Header row: one column per array size.
    for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
        fout << size << ",";
    }
    fout << "\n";
    for (counter = 1; counter <= MAX_RUNS; counter++) {
        fout << "run " << counter << ",";
        for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
            try {
                int* arr = random_array(size);
                // Time only the sort, not the array generation.
                start = std::clock();
                qsort(arr, size, sizeof(int), compare);
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
                cout << "size: " << size << " duration: " << duration << '\n';
                fout << setprecision(15) << duration << ",";
                delete[] arr;
            }
            catch (const bad_alloc&) {
                cout << "bad alloc caught,size: " << size << "\n";
                fout << "bad alloc,";
            }
        }
        fout << "\n";
        cout << counter << "% done\n";
    }
    fout.close();
    return 0;
}
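The generator above depends on RAND_MAX being 0x7fff, as its comment notes; on a system with a larger RAND_MAX, `rand() << 16` can overflow `int`. The following is a minimal sketch of one way to produce the same kind of unconstrained data without that assumption, using `std::mt19937` from `<random>`. The names `compare_ints`, `random_vector` and `sort_with_qsort` are hypothetical, and this is a swapped-in alternative, not the code that produced the measurements discussed here.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Comparator for qsort that avoids subtraction, so it cannot overflow
// even if negative values are present. Returns -1, 0 or 1.
int compare_ints(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Non-negative ints spread uniformly over [0, INT_MAX], independent of RAND_MAX.
// Hypothetical replacement for random_array(); returns a vector so that
// no manual delete[] is needed.
std::vector<int> random_vector(std::size_t size, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, INT_MAX);
    std::vector<int> values(size);
    for (int& v : values) {
        v = dist(gen);
    }
    return values;
}

// Sort the vector in place with the C library qsort, as in the benchmark above.
void sort_with_qsort(std::vector<int>& values) {
    std::qsort(values.data(), values.size(), sizeof(int), compare_ints);
}
```

In the benchmark loop, `random_vector(size, seed)` would stand in for `random_array(size)` and `sort_with_qsort(values)` for the `qsort` call, with the timing code left unchanged.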