How do I write the dictionary file correctly? (Huffman encoder)
I'm working on a C++ Huffman encoder. It has two main functions: compress and decompress.
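The "compress" function reads the input file, counts the character frequencies, builds the Huffman tree, writes the dictionary to a separate file, encodes the input file and saves the result to the output file.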
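For reference, here is a compact sketch of the counting and tree-building steps, working on an in-memory string rather than a file. It is not the code from this post: it swaps the list that is re-sorted on every iteration for a `std::priority_queue` (a min-heap on frequency) and keys the table by `unsigned char`. `HNode`, `collect` and `buildCodes` are hypothetical names, since the real `Node` and `compare` live in functions.h, which isn't shown. Requires C++17.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical tree node (the post's Node is declared in functions.h).
struct HNode {
    std::size_t freq;
    unsigned char ch;                    // meaningful only for leaves
    std::shared_ptr<HNode> left, right;  // both null for a leaf
};

// Records the path to each leaf: false = left, true = right.
void collect(const std::shared_ptr<HNode>& n, std::vector<bool>& path,
             std::map<unsigned char, std::vector<bool>>& codes) {
    if (!n->left && !n->right) {
        // A one-leaf tree would give an empty code, so use "0" instead.
        codes[n->ch] = path.empty() ? std::vector<bool>{false} : path;
        return;
    }
    path.push_back(false);
    collect(n->left, path, codes);
    path.back() = true;
    collect(n->right, path, codes);
    path.pop_back();
}

std::map<unsigned char, std::vector<bool>> buildCodes(const std::string& data) {
    // 1. Count how often each byte occurs.
    std::map<unsigned char, std::size_t> freq;
    for (char c : data) ++freq[static_cast<unsigned char>(c)];

    // 2. Min-heap: the node with the smallest frequency is on top.
    auto cmp = [](const std::shared_ptr<HNode>& a, const std::shared_ptr<HNode>& b) {
        return a->freq > b->freq;
    };
    std::priority_queue<std::shared_ptr<HNode>, std::vector<std::shared_ptr<HNode>>,
                        decltype(cmp)> heap(cmp);
    for (const auto& [ch, f] : freq)
        heap.push(std::make_shared<HNode>(HNode{f, ch, nullptr, nullptr}));

    std::map<unsigned char, std::vector<bool>> codes;
    if (heap.empty()) return codes;      // empty input: no codes at all

    // 3. Merge the two least frequent nodes until only the root is left.
    while (heap.size() > 1) {
        auto a = heap.top(); heap.pop();
        auto b = heap.top(); heap.pop();
        heap.push(std::make_shared<HNode>(HNode{a->freq + b->freq, 0, a, b}));
    }

    // 4. Walk the tree from the root to assign a code to every leaf.
    std::vector<bool> path;
    collect(heap.top(), path, codes);
    return codes;
}
```

For "abracadabra" the total encoded length comes out to 23 bits however ties between equal frequencies are broken, because every Huffman tree for a given set of frequencies has the same weighted length.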
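The "decompress" function reads the encoded file, reads the dictionary file, and replaces the codes with the corresponding characters.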
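Replacing codes with characters can work bit by bit because Huffman codes are prefix-free: as soon as the accumulated bits equal some code, that code is the symbol. A minimal sketch, assuming the dictionary is already loaded as a `map<char, vector<bool>>`; it inverts the map once so each step is a single lookup instead of a scan over all entries. `decodeBits` is a hypothetical helper, and its `bitCount` parameter (the number of meaningful bits) is an assumption, since the current file format doesn't store it.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes the first bitCount bits of `bits` using a char -> code dictionary.
std::string decodeBits(const std::vector<bool>& bits, std::size_t bitCount,
                       const std::map<char, std::vector<bool>>& dict) {
    // Invert the dictionary once: code -> char (vector<bool> supports operator<).
    std::map<std::vector<bool>, char> byCode;
    for (const auto& [ch, code] : dict) byCode[code] = ch;

    std::string out;
    std::vector<bool> current;           // bits read since the last decoded char
    for (std::size_t i = 0; i < bitCount && i < bits.size(); ++i) {
        current.push_back(bits[i]);
        auto it = byCode.find(current);
        if (it != byCode.end()) {        // prefix-free, so the first match is the symbol
            out += it->second;
            current.clear();
        }
    }
    if (!current.empty())
        throw std::runtime_error("trailing bits do not form a complete code");
    return out;
}
```

For example, with a = 1, b = 00 and '\n' = 01, the bits 1 00 01 1 followed by two zero padding bits decode to "ab\na" when `bitCount` is 6, but to "ab\nab" when all 8 bits are used.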
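One problem: the input sometimes contains characters such as \n, \t, \r and so on, and of course they don't show up in the dictionary as visible characters. When they are written, they simply take effect in the dictionary file itself (a line break, a space), so the file looks like this: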
я011000010
110011 // here it should be a newline character, not a newline itself!
110100
111 // here it should be a space character, not a space itself!
'00010100
,100110
.11001000
:011000011
;011000100
?011000101
A11001001
B00010101
I11001010
S00010110
T11001011
W00010111
Y011000110
a10110
b000100
c110101
d10010
e010
For example, line 6 is read not as "[space]:111" but as "1:11", which leads to decoding errors.
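The "1:11" symptom is what `Dictionary >> str` does by design: `operator>>` into a `std::string` skips leading whitespace and stops at the next whitespace, so the space entry " 111" comes back as the token "111" (character '1', code "11"), and the newline and carriage-return entries are likewise read as codes for '1'. One way out, assuming you are free to change the dictionary format, is to never write the character itself: store its byte value as a decimal number followed by the code, e.g. `32 111` for the space. Both fields are then free of whitespace and can be read back with `>>`. `readOldEntry`, `writeDict` and `readDict` are hypothetical names for this sketch; `Dict` mirrors the post's `map<char, vector<bool>>`.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Dict = std::map<char, std::vector<bool>>;

// Reproduces the problem: reading a raw " 111" entry with >> loses the space.
std::string readOldEntry(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    in >> token;                          // leading whitespace is skipped
    return token;
}

// One entry per line: "<byte value 0-255> <code as 0/1 digits>", e.g. "32 111".
void writeDict(std::ostream& out, const Dict& dict) {
    for (const auto& [ch, code] : dict) {
        out << static_cast<int>(static_cast<unsigned char>(ch)) << ' ';
        for (bool bit : code) out << (bit ? '1' : '0');
        out << '\n';
    }
}

Dict readDict(std::istream& in) {
    Dict dict;
    int value;
    std::string bits;
    // Neither field contains whitespace, so >> reads both reliably.
    while (in >> value >> bits) {
        std::vector<bool> code;
        for (char b : bits) code.push_back(b == '1');
        dict[static_cast<char>(static_cast<unsigned char>(value))] = code;
    }
    return dict;
}
```

Fixed-size binary records (one byte for the character, then the code length and the bits) would work too; the text form keeps the dictionary human-readable.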
UPD: Code
// standard libraries
#include <iostream> // input / output operations
#include <fstream> // operations with file streams
#include <string> // char strings
#include <vector> // operations with vectors
#include <map> // operations with maps
#include <list> // operations with lists
// my own header files
#include "functions.h" // declarations of functions (and of Node / compare, used below)
using namespace std; // the code below uses string, vector, map... without the std:: prefix
/* Converts a string of '0'/'1' characters to a vector<bool>
(I need it in function DECOMPRESS) */
auto stringToVector(string&& s) {
    vector<bool> v; // vector we want to get after converting
    for (auto i : s) // for each char in the string
        v.push_back(i == '1'); // add it to the vector as a separate element
    return v; // our final vector!
}
vector<bool> code; // binary code of the node currently being visited
map<char, vector<bool> > dict; // associative array of character and its binary code
Node* root; // pointer at the top of the tree
list<Node*> nodePointersList; // list of pointers at nodes (which have freq and char taken from the map of frequencies)
// Creates "character-code" pairs by walking the tree recursively
void DictionaryGenerator(Node* root)
{
    if (root->left != NULL) // if there is a left child
    {
        code.push_back(0); // then we write '0'
        DictionaryGenerator(root->left); // recurse into the left child
    }
    if (root->right != NULL) // if there is a right child
    {
        code.push_back(1); // then we write '1'
        DictionaryGenerator(root->right); // recurse into the right child
    }
    /* If we reached a leaf (a character), we assign the current binary code to it: */
    if (root->left == NULL && root->right == NULL) dict[root->character] = code;
    if (!code.empty()) {
        code.pop_back(); // remove the bit that led to this node
    }
    // After this function runs, I write the dict table to a file
}
void compress(string In, string Out, string Dict)
{
    /*=======================================================================================
    * Here we count the frequency of each char in the file using the associative array "map"
    *=====================================================================================*/
    ifstream inFile(In, ios::in | ios::binary); // open file for binary reading
    map<char, int> freqTable; // associative array where 1st element is a character, 2nd its frequency
    char characterToRead;
    // Looping on eof() would process one extra bogus value (EOF cast to char),
    // so test the result of get() instead
    while (inFile.get(characterToRead)) // read every byte, spaces and newlines included
    {
        freqTable[characterToRead]++; // add 1 to this char's counter in the frequency table
    }
    /*=======================================================================================
    * Here we write start nodes to our list
    *=====================================================================================*/
    /* From the beginning of the map to its end */
    for (map<char, int>::iterator itr = freqTable.begin(); itr != freqTable.end(); ++itr)
    {
        /* Put elements FROM the map INTO our list as nodes */
        Node* p = new Node; // create a new node
        p->character = itr->first; // the "character" field gets the map key
        p->freq = itr->second; // the "freq" field gets the map value
        nodePointersList.push_back(p); // add this pointer (which includes char & frequency) to our list
    }
    /*=======================================================================================
    * Here we create the Huffman tree
    *=====================================================================================*/
    while (nodePointersList.size() > 1) // while the list has more than 1 element
    {
        /* The last iteration of the loop leaves one single element:
        the root of the tree */
        nodePointersList.sort(compare()); // sort the list using the compare structure
        /* We need 2 nodes to sum them: */
        Node* leftChild = nodePointersList.front(); // get the first element of nodePointersList
        nodePointersList.pop_front(); // remove it from nodePointersList
        Node* rightChild = nodePointersList.front(); // get the next element of nodePointersList
        nodePointersList.pop_front(); // remove it from nodePointersList
        /* Here we sum those 2 elements: */
        Node* Parent = new Node(leftChild, rightChild); // create a parent (children's frequencies are added)
        nodePointersList.push_back(Parent); // and put it on the list as a new node
        // Then sort again, until we have just one root element
    }
    /* The remaining node is the root of the tree (Node* root) */
    root = nodePointersList.front();
    /*==================== End of creating the Huffman tree ============================*/
    /*=======================================================================================
    * Here we write our dictionary to the dictionary file
    *=====================================================================================*/
    // First create the dictionary
    DictionaryGenerator(root); // create "character-code" pairs, in other words the Huffman dictionary
    // Then write it to a file
    ofstream DictFile(Dict, ios::out | ios::binary); // open file in binary mode (ios::out truncates it)
    //cout << "Dictionary size: " << dict.size() << endl;
    for (map<char, vector<bool> >::iterator ii = dict.begin(); ii != dict.end(); ++ii) {
        DictFile << ii->first; // prints the ii-th char from the map as a raw byte
        vector<bool> inVect = ii->second; // copies the ii-th code from the map to a local vector
        for (unsigned j = 0; j < inVect.size(); j++) { // prints the code of the current char
            DictFile << inVect[j];
        }
        DictFile << endl;
    }
    DictFile.close(); // close the DICTIONARY file
    /*=======================================================================================
    * Here we create & write our binary (compressed) code to the output file
    *=====================================================================================*/
    inFile.clear(); // reset the EOF/fail flags so the file can be read again
    inFile.seekg(0); // move the cursor to the beginning of the INPUT file
    ofstream outFile(Out, ios::out | ios::binary); // open OUTPUT file for writing in binary mode
    int count = 0; // number of bits already placed in buf (0-7)
    char buf = 0; // collects 8 bits before they are written out as one byte
    char c;
    while (inFile.get(c)) // while there are bytes left in the INPUT file
    {
        vector<bool> x = dict[c]; // look up its binary code in the dictionary
        /* This is how I "pack" my stream of zeros and ones into bytes */
        for (size_t n = 0; n < x.size(); n++)
        {
            /* Copy the current bit x[n] into the buffer. buf starts as 00000000; each bit is
            shifted left by (7 - count) so bits fill the byte from left to right, and is
            merged in with '|' (bitwise OR) */
            buf = buf | x[n] << (7 - count);
            count++;
            /* After 8 bits we have a full byte: write it to the output file
            and start a new one */
            if (count == 8) { count = 0; outFile.put(buf); buf = 0; }
        }
    }
    /* Write the last, partially filled byte too (its unused low bits stay 0) */
    if (count > 0) outFile.put(buf);
    /* The result is usually a smaller file full of unreadable characters.
    By the way, its extension doesn't matter: BIN or TXT, either works. */
    /* Close all of the files (the DICTIONARY file is already closed) */
    inFile.close(); // close the INPUT file
    outFile.close(); // close the OUTPUT file
}
// In: encoded file, Out: decoded output file, Dict: dictionary file
void decompress(string In, string Out, string Dict)
{
    /*=======================================================================================
    * Translate the dictionary file to a map <char : vector<bool> >
    *=====================================================================================*/
    ifstream Dictionary(Dict, ios::in | ios::binary);
    map<char, vector<bool>> dict; // associative array of character and its binary code (dictionary)
    /* Here we read each whitespace-separated token and translate it to a vector<bool> */
    for (string str; Dictionary >> str;)
    {
        if (str.length() > 1)
        {
            dict[str[0]] = stringToVector(str.substr(1)); // 1st char is the symbol, the rest its code
        }
    }
    /* This is to print the dictionary in another way (for debugging) */
    for (map<char, vector<bool> >::iterator ii = dict.begin(); ii != dict.end(); ++ii) {
        cout << ii->first; // prints the ii-th char from the map
        vector<bool> inVect = ii->second; // copies the ii-th code from the map to a local vector
        for (unsigned j = 0; j < inVect.size(); j++) { // prints the code of the current char
            cout << inVect[j];
        }
        cout << endl;
    }
    Dictionary.close();
    /*=======================================================================================
    * Translate the input file to a vector<bool>
    *=====================================================================================*/
    ifstream inFile(In, ios::in | ios::binary);
    char c;
    bool currentBit;
    vector<bool> sourceCode;
    while (inFile.get(c))
    {
        for (int i = 7; i >= 0; i--) // most significant bit first, as in compress
        {
            currentBit = ((static_cast<unsigned char>(c) >> i) & 1); // unsigned avoids shifting a negative char
            sourceCode.push_back(currentBit);
            cout << currentBit;
        }
    }
    inFile.close();
    /*=======================================================================================
    * Todo: Decode & output to a file
    *=====================================================================================*/
    ofstream outFile(Out, ios::out | ios::binary); // open OUTPUT file for writing in binary mode
    // Vector which holds the part of the encoded text we want to check at this iteration
    vector<bool> currentCheck;
    cout << endl << endl;
    /* Here we iterate through the vector of encoded bits */
    for (std::vector<bool>::iterator it = sourceCode.begin(); it != sourceCode.end(); ++it) {
        currentCheck.push_back(*it); // append only the current bit
        /* Here we iterate through the map:
        we want to find the same code as our currentCheck in the map */
        for (map<char, vector<bool> >::iterator ii = dict.begin(); ii != dict.end(); ++ii) {
            // is it legal to compare vectors of bools in such a way?? UPD: Yes
            if (currentCheck == ii->second) // if we found the same code in the dictionary map
            {
                outFile << ii->first; // then print this character
                // and continue checking from the new position
                currentCheck.clear();
                break; // codes are prefix-free, so no other entry can match
            }
        }
    }
    outFile.close(); // close the OUTPUT file
}
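Two details of packing bits into bytes are easy to miss: the last byte is usually only partly filled and still has to be written out, and on the way back its zero padding must not be decoded as extra characters. A common remedy, sketched here as one option among several, is to flush the partial byte and record the total number of valid bits somewhere the decoder can read it (in the dictionary file or a small header, for instance). `packBits` and `unpackBits` are hypothetical helpers that use the same most-significant-bit-first order as the loops above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Packs bits MSB-first into bytes; the last byte is padded with zero bits.
std::string packBits(const std::vector<bool>& bits) {
    std::string bytes;
    unsigned char buf = 0;
    int count = 0;                                  // bits already placed in buf (0-7)
    for (bool bit : bits) {
        buf |= static_cast<unsigned char>(bit) << (7 - count);
        if (++count == 8) {                         // full byte: emit it, start a new one
            bytes.push_back(static_cast<char>(buf));
            buf = 0;
            count = 0;
        }
    }
    if (count > 0) bytes.push_back(static_cast<char>(buf));  // flush the partial byte
    return bytes;
}

// Unpacks MSB-first and keeps only the first bitCount bits, dropping the padding.
std::vector<bool> unpackBits(const std::string& bytes, std::size_t bitCount) {
    std::vector<bool> bits;
    for (char c : bytes) {
        unsigned char u = static_cast<unsigned char>(c);  // avoid shifting a negative char
        for (int i = 7; i >= 0 && bits.size() < bitCount; --i)
            bits.push_back((u >> i) & 1);
    }
    return bits;
}
```

For instance, the 11 bits 10110001 110 pack into the two bytes 0xB1 and 0xC0, and unpacking with a bit count of 11 restores exactly those bits without the five padding zeros.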