微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

lzma totalread比标头未压缩的大小大1

如何解决lzma totalread比标头未压缩的大小大1

我正在尝试使用easylzma库用lzma解压缩文件,某些文件可以正常工作,但是随机文件不能解压缩。
经过一些调试后,我发现totalread number比标头uncompressedSize大1,而且标头流为0。
代码说没有页脚,但是当我从总读取中减去1以跳过错误时,文件将正确解压缩,但在文件末尾添加一行,该行有多个字段为0,单个字段为value。 br /> 这些文件是来自dukascopy的.bi5。
我想确定错误是由于我使用的库中的逻辑错误引起的,还是文件损坏,在这种情况下应执行的操作。
使用的库是来自github的easylzma-master和dukascopy-master,文件是从dukascopy服务器下载的。
确切地说,2020年9月30日“ 9月8日”的文件13h_ticks.bi5和21_ticks.bi5显示了此问题。

更新:
我没有放代码,因为我现在正在询问准则,代码存在并且显示了问题。但是它是库代码。所以我想知道是否有人对dukascopy bi5类型的特定文件和此lzma有相同的问题我现在正在寻找“当在lzma解压缩中时,我们得到的行为是totalread重复大于标头未压缩大小1的行为吗?”,这意味着存在页脚,但未在标头字节中提及? “

更新:
这就是我打开文件的方式

int HTTPRequest::read_bi5_main(boost::filesystem::path p,ptime epoch)
{
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock(mBOOST_LOGMutex,boost::defer_lock);
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock2(m_read_bi5_to_binMutex,boost::defer_lock);

    unsigned char *buffer;
    size_t buffer_size;

    int counter;

    size_t raw_size = 0;

    std::string filename_string = p.generic_string();
    path p2 = p;
    p2.replace_extension(".bin");
    std::string filename_string_to_bin =p2.generic_string() ;

    path p3 = p;
    p3.replace_extension(".csv");
    std::string filename_string_to_csv = p3.generic_string();

    const char *filename = filename_string.c_str();
    const char *filename_to_bin = filename_string_to_bin.c_str();
    const char *filename_to_csv = filename_string_to_csv.c_str();

    //22-9-2020 here I open the downloaded file if possible
    if (fs::exists(p) && fs::is_regular(p))
    {
        buffer_size = fs::file_size(p);
        buffer = new unsigned char[buffer_size];
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: Couldn't access the data file. |"
            << filename << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    //22-9-2020 here I read the downloaded file into filestream
    std::ifstream fin(filename,std::ifstream::binary);
    fin.read(reinterpret_cast<char*>(buffer),buffer_size);
    fin.close();

    //22-9-2020 here I check if file is related to japanese yen so that I determine how to write its value
    /*
    if symbols_xxx has mHTTPRequest_Symbol_str then PV=0.001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.0001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.00001
    */
    //28-9-2020 I will make 3 vectors in utils.h for 3,4,5 point value,then I find symbol in vector,//std::size_t pos = mHTTPRequest_Symbol_str.find("JPY");

    double PV;

    std::vector<std::string>::iterator it3 = std::find(point_value_xxx.begin(),point_value_xxx.end(),mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it4 = std::find(point_value_xxxx.begin(),point_value_xxxx.end(),mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it5 = std::find(point_value_xxxxx.begin(),point_value_xxxxx.end(),mHTTPRequest_Symbol_str);
    if (it3 != point_value_xxx.end())
    {
        PV = 0.001;
    }
    else if (it4 != point_value_xxxx.end())
    {
        PV = 0.0001;
    }
    else if (it5 != point_value_xxxxx.end())
    {
        PV = 0.00001;
    }
    else
    {
        //10-1-2020throw;
        PV = 0.001;

    }
    read_bi5_to_bin_lock2.lock();
    unsigned char *data_bin_buffer = 0 ;
    n47::tick_data *data = n47::read_bi5_to_bin(
            buffer,buffer_size,epoch,PV,&raw_size,&data_bin_buffer);

    //5-11-2020 here i will save binary file
    std::string file_name_path_string=output_compressed_file_2(&data_bin_buffer,raw_size,filename_to_bin);
    read_bi5_to_bin_lock2.unlock();

    path file_name_path_2{ file_name_path_string };
    buffer_size = 0;
    if (fs::exists(file_name_path_2) && fs::is_regular(file_name_path_2))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << boost::this_thread::get_id() <<"\t we can access the data .bin file. |"
            << filename_to_bin << "| with size ="<< fs::file_size(file_name_path_2) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: Couldn't access the data .bin file. |"
            << filename_to_bin << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    n47::tick_data_iterator iter;

    //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks
    if (data == 0)
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Failed to load the data!" << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //5-15-2020 take care that without else,error happens with empty files because data is pointer to vector of pointers to ticks .so when data is made inside read_bi5,it is made as null pointer and later it is assigned to vector if file has ticks.if file does not have ticks,then it is just returned as null pointer .so when dereferencing null pointer we got error
    else if (data->size() != (raw_size / n47::ROW_SIZE))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Loaded " << data->size()
            << " ticks but file size indicates we should have loaded "
            << (raw_size / n47::ROW_SIZE) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //22-9-2020 in last if and if else I checked if file is either empty or has error of data size So Now I have good clean file to work with
    //read_bi5_to_bin_lock.lock();
    //BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "time,bid,bid_vol,ask,ask_vol" << std::endl;
    //read_bi5_to_bin_lock.unlock();

    counter = 0;

    std::ofstream out_csv(filename_string_to_csv);
    if (data == 0)
    {

    }
    else if (data != 0)
    {
        for (iter = data->begin(); iter != data->end(); iter++) {
            //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks>>>>>>>here i should open file stream for output and save data to it
            out_csv
            //<< std::setfill('0')<<std::setw(sizeof((*iter)->epoch + (*iter)->td))<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            //<< std::setfill('0')<<std::setw(27)<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<<std::fixed << (*iter)->bid << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<<std::fixed << (*iter)->bidv << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<<std::fixed << (*iter)->ask << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<<std::fixed << (*iter)->askv << std::endl;
            //??5-17-2020 isolate multithreaded error
            /*
            read_bi5_to_bin_lock.lock();
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                boost::this_thread::get_id() << "\t"<<((*iter)->epoch + (*iter)->td) << ","
                << (*iter)->bid << "," << (*iter)->bidv << ","
                << (*iter)->ask << "," << (*iter)->askv << std::endl;
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                            boost::this_thread::get_id() << "\t"<< std::setfill('0')<< std::setw(sizeof((*iter)->epoch + (*iter)->td))<<((*iter)->epoch + (*iter)->td) << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<< (*iter)->bid << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<< (*iter)->bidv << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<< (*iter)->ask << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<< (*iter)->askv << std::endl;
            read_bi5_to_bin_lock.unlock();
            */
            counter++;
        }
        ////read_bi5_to_bin_lock.unlock();

    }
    out_csv.close();
    //5-13-2020

    //??5-17-2020 isolate multithreaded error
    read_bi5_to_bin_lock.lock();

    BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << ".end." << std::endl << std::endl
        << "From " << raw_size << " bytes we read " << counter
        << " records." << std::endl
        << raw_size << " / " << n47::ROW_SIZE << " = "
        << (raw_size / n47::ROW_SIZE) << std::endl;
    read_bi5_to_bin_lock.unlock();


    delete data;
    delete[] buffer;
    delete [] data_bin_buffer;
    return 0;
}

这是我的杜高斯贝修改文件

//#include "stdafx.h"

/*
copyright 2013 Michael O'Keeffe (a.k.a. ninety47).

This file is part of ninety47 Dukascopy toolBox.

The "ninety47 Dukascopy toolBox" is free software: you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation,either version 3 of the License,or any later version.

"ninety47 Dukascopy toolBox" is distributed in the hope that it will be
useful,but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or fitness FOR A PARTIculaR PURPOSE.  See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along with
"ninety47 Dukascopy toolBox".  If not,see <http://www.gnu.org/licenses/>.
*/

#include "ninety47/dukascopy.h"
#include <boost/date_time/posix_time/posix_time.hpp>
#include <algorithm>
#include <vector>
#include "ninety47/dukascopy/defs.h"
#include "ninety47/dukascopy/io.hpp"
#include "ninety47/dukascopy/lzma.h"



namespace n47 {

namespace pt = boost::posix_time;


tick *tickFromBuffer(
        unsigned char *buffer,pt::ptime epoch,float digits,size_t offset) {
    bytesTo<unsigned int,n47::BigEndian> bytesTo_unsigned;
    bytesTo<float,n47::BigEndian> bytesTo_float;

    unsigned int ts = bytesTo_unsigned(buffer + offset);
    pt::time_duration ms = pt::millisec(ts);
    unsigned int ofs = offset + sizeof(ts);
    float ask = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    float bid = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    //28-9-2020 convert volume to million
    float askv = bytesTo_float(buffer + ofs) *1000000;
    ofs += sizeof(ts);
    float bidv = bytesTo_float(buffer + ofs) *1000000;

    return new tick(epoch,ms,askv,bidv);
}


tick_data* read_bin(
        unsigned char *buffer,size_t buffer_size,float point_value) {
    std::vector<tick*> *data = new std::vector<tick*>();
    std::vector<tick*>::iterator iter;

    std::size_t offset = 0;

    while ( offset < buffer_size ) {
        data->push_back(tickFromBuffer(buffer,point_value,offset));
        offset += ROW_SIZE;
    }

    return data;
}


tick_data* read_bi5(
        unsigned char *lzma_buffer,size_t lzma_buffer_size,float point_value,size_t *bytes_read) {
    tick_data *result = 0;

    // decompress
    int status;
    unsigned char *buffer = n47::lzma::decompress(lzma_buffer,lzma_buffer_size,&status,bytes_read);

    //5-11-2020 here i will save binary file


    if (status != N47_E_OK) {
        bytes_read = 0;
    } else {
        // convert to tick data (with read_bin).
        result = read_bin(buffer,*bytes_read,point_value);
        delete [] buffer;
    }

    return result;
}

//5-11-2020
tick_data* read_bi5_to_bin(
    unsigned char *lzma_buffer,size_t *bytes_read,unsigned char** buffer_decompressed) {
    tick_data *result = 0;

    // decompress
    int status;
    *buffer_decompressed = n47::lzma::decompress(lzma_buffer,bytes_read);

    if (status != N47_E_OK) 
    {
        bytes_read = 0;
    }
    else {
        // convert to tick data (with read_bin).
        result = read_bin(*buffer_decompressed,point_value);
        //delete[] buffer;
    }

    return result;
}


tick_data* read(
        const char *filename,size_t *bytes_read) {
    tick_data *result = 0;
    size_t buffer_size = 0;
    unsigned char *buffer = n47::io::loadToBuffer<unsigned char>(filename,&buffer_size);

    if ( buffer != 0 ) {
        if ( n47::lzma::bufferIsLZMA(buffer,buffer_size) ) {
            result = read_bi5(buffer,bytes_read);
            // Reading in as bi5 Failed lets double check its not binary
            // data in the buffer.
            if (result == 0) {
                result = read_bin(buffer,point_value);
            }
        } else {
            result = read_bin(buffer,point_value);
            *bytes_read = buffer_size;
        }
        delete [] buffer;

        if (result != 0 && result->size() != (*bytes_read / n47::ROW_SIZE)) {
            delete result;
            result = 0;
        }
    }
    return result;
}

}  // namespace n47

解决方法

我一直深入到lzma工作的细节,这对我来说很重, 因此我更改了库,并使用了7z cpp lzma规格文件。
有效。
我认为问题与在cpp程序中使用c代码有关。
图书馆还说它已经过bsd测试 谢谢你的帮助。 任何具有相同情况的人都下载7zip并使用cpp文件

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。