Boost serialization: adding a new register_type without breaking compatibility

I use Boost to serialize a NeuralNetwork object with code like this:

template <class Archive>
void NeuralNetwork::serialize(Archive& ar,unsigned version)
{
    boost::serialization::void_cast_register<NeuralNetwork,StatisticAnalysis>();
    ar & boost::serialization::base_object<StatisticAnalysis>(*this);
    ar.template register_type<FullyConnected>(); // derived from Layer object
    ar.template register_type<Recurrence>();
    ar.template register_type<Convolution>();
    ar.template register_type<MaxPooling>();
    ar & layers; // vector<unique_ptr<Layer>>
}

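My problem is that I already have serialized objects, and when I add a new class that inherits from Layer I get the following error: unknown file: error: C++ exception with description "unregistered class" thrown in the test body.

How can I add a new register_type<T> without breaking compatibility with objects that have already been serialized and saved?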

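Answer

"When I add a new class that inherits from Layer I get the following error: unknown file: error: C++ exception with description "unregistered class" thrown in the test body."

I think this is caused by something else.

Reference point: "automatic" type registration

The typical pattern is not to use register_type at all. Instead, you would use the automatic registration mechanism: https://www.boost.org/doc/libs/1_32_0/libs/serialization/doc/special.html#registration

Version 1: Live Demo
Version 2: Live Demo (-DVERSION2)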

#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/unique_ptr.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>
#include <sstream>
#include <iostream>
#include <iomanip>
#include <boost/core/demangle.hpp>
using boost::serialization::base_object;
using boost::core::demangle;

struct StatisticAnalysis {
    virtual ~StatisticAnalysis() = default;
    virtual void report(std::ostream&) const = 0;
    std::vector<int> base_data {1,2,3};
    void serialize(auto& ar,unsigned) { ar & base_data; }

    friend std::ostream& operator<<(std::ostream& os,StatisticAnalysis const& sa) {
        sa.report(os);
        return os;
    }
};

BOOST_SERIALIZATION_ASSUME_ABSTRACT(StatisticAnalysis)
BOOST_CLASS_EXPORT(StatisticAnalysis)

struct Layer {
    virtual ~Layer() = default;
    void serialize(auto&,unsigned) { }
};

BOOST_SERIALIZATION_ASSUME_ABSTRACT(Layer)
BOOST_CLASS_EXPORT(Layer)

struct FullyConnected : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };
struct Recurrence     : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };
struct Convolution    : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };
struct MaxPooling     : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };

BOOST_CLASS_EXPORT(FullyConnected)
BOOST_CLASS_EXPORT(Recurrence)
BOOST_CLASS_EXPORT(Convolution)
BOOST_CLASS_EXPORT(MaxPooling)

#if defined(VERSION2)
struct NewLayer : Layer {
    void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); }
};
BOOST_CLASS_EXPORT(NewLayer)
#endif

struct NeuralNetwork : StatisticAnalysis {
    virtual void report(std::ostream& os) const override {
        os << layers.size() << " layers: {";
        for (auto& layer : layers) {
            os << " " << demangle(typeid(*layer).name());
        }
        os << " }\n";
    }

    std::vector<std::unique_ptr<Layer> > layers;

    void serialize(auto& ar,unsigned) {
        ar &base_object<StatisticAnalysis>(*this);
        ar &layers;
    }
};

BOOST_CLASS_EXPORT(NeuralNetwork)

int main()
{
    std::unique_ptr<StatisticAnalysis> analysis;
    std::stringstream ss;
    {
        boost::archive::text_oarchive oa(ss);
        analysis = [] {
            auto nn = std::make_unique<NeuralNetwork>();
            nn->layers.emplace_back(std::make_unique<FullyConnected>());
            nn->layers.emplace_back( std::make_unique<Recurrence>());
            nn->layers.emplace_back(std::make_unique<Convolution>());
            nn->layers.emplace_back(std::make_unique<FullyConnected>());
            nn->layers.emplace_back(std::make_unique<FullyConnected>());
            nn->layers.emplace_back(std::make_unique<MaxPooling>());
            return nn;
        }();
        oa << analysis;
    }

    std::cout << "Data: " << std::quoted(ss.str()) << "\n";

    {
        boost::archive::text_iarchive ia(ss);

        analysis.reset();
        ia >> analysis;
        
        std::cerr << *analysis << "\n";
    }

}

Both versions produce the same archive:

Data: "22 serialization::archive 17 0 0 1 13 NeuralNetwork 1 0
0 0 0 3 0 1 2 3 0 0 6 0 0 0 7 14 FullyConnected 1 0
1 1 0
2 8 10 Recurrence 1 0
3
4 9 11 Convolution 1 0
5
6 7
7
8 7
9
10 10 10 MaxPooling 1 0
11
12
"
6 layers: { FullyConnected Recurrence Convolution FullyConnected FullyConnected MaxPooling }
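
The archive printed above stores the exported classes under their names (NeuralNetwork, FullyConnected, Recurrence and so on). As a rough illustration of why a name-keyed scheme tolerates new subclasses, here is a toy model in plain C++. It is not how Boost implements export. LayerRegistry, make_registry and the name() method are hypothetical names invented for this sketch, and the layer types are cut-down stand-ins.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Toy stand-ins for the Layer hierarchy (assumption: not the real classes).
struct Layer {
    virtual ~Layer() = default;
    virtual std::string name() const = 0;
};
struct FullyConnected : Layer { std::string name() const override { return "FullyConnected"; } };
struct MaxPooling     : Layer { std::string name() const override { return "MaxPooling"; } };
struct NewLayer       : Layer { std::string name() const override { return "NewLayer"; } };

// Hypothetical registry: maps a class name found in an archive to a factory.
class LayerRegistry {
public:
    template <class T>
    void add(const std::string& key) {
        assert(!key.empty());
        factories_[key] = []() -> std::unique_ptr<Layer> { return std::make_unique<T>(); };
    }

    // Throws when the name was never registered, loosely mirroring "unregistered class".
    std::unique_ptr<Layer> create(const std::string& key) const {
        auto it = factories_.find(key);
        if (it == factories_.end())
            throw std::runtime_error("unregistered class: " + key);
        return it->second();
    }

private:
    std::map<std::string, std::function<std::unique_ptr<Layer>()>> factories_;
};

// Builds the registry as it might look before (false) and after (true) adding NewLayer.
inline LayerRegistry make_registry(bool with_new_layer) {
    LayerRegistry r;
    r.add<FullyConnected>("FullyConnected");
    r.add<MaxPooling>("MaxPooling");
    if (with_new_layer) r.add<NewLayer>("NewLayer");
    return r;
}
```

In this model, a name written by the older program still resolves to the same type in the newer one, because adding an entry does not touch the existing keys. Only the reverse direction fails: the older registry throws when asked for "NewLayer".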

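Comparison with register_type

Just to make sure that register_type does not in fact cause the compatibility problem, as the documentation might indeed be implying:

"Note that if the serialization function is split between save and load, both functions must include the registration. This is required to keep the save and corresponding load in synchronization."

Note: after seeing that the output was the same as expected for text archives, I also switched to writing binary archives, in case there were some implementation differences.

Live Demo (contains both the v1 and v2 versions):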

//#define VERSION2
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/unique_ptr.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <boost/core/demangle.hpp>
using boost::serialization::base_object;
using boost::core::demangle;

struct StatisticAnalysis {
    virtual ~StatisticAnalysis() = default;
    virtual void report(std::ostream&) const = 0;
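    std::vector<int> base_data {1,2,3};
    void serialize(auto& ar,unsigned) { ar & base_data; }

    friend std::ostream& operator<<(std::ostream& os,StatisticAnalysis const& sa) {
        sa.report(os);
        return os;
    }
};

BOOST_SERIALIZATION_ASSUME_ABSTRACT(StatisticAnalysis)
BOOST_CLASS_EXPORT(StatisticAnalysis)

struct Layer {
    virtual ~Layer() = default;
    void serialize(auto&,unsigned) { }
};

BOOST_SERIALIZATION_ASSUME_ABSTRACT(Layer)
BOOST_CLASS_EXPORT(Layer)

struct FullyConnected : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };
struct Recurrence     : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };
struct Convolution    : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };
struct MaxPooling     : Layer { void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); } };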

//BOOST_CLASS_EXPORT(FullyConnected)
//BOOST_CLASS_EXPORT(Recurrence)
//BOOST_CLASS_EXPORT(Convolution)
//BOOST_CLASS_EXPORT(MaxPooling)

#if defined(VERSION2)
struct NewLayer : Layer {
    void serialize(auto &ar,unsigned) { ar &base_object<Layer>(*this); }
};
//BOOST_CLASS_EXPORT(NewLayer)
#endif

struct NeuralNetwork : StatisticAnalysis {
    virtual void report(std::ostream& os) const override {
        os << layers.size() << " layers: {";
        for (auto& layer : layers) {
            os << " " << demangle(typeid(*layer).name());
        }
        os << " }\n";
    }

    std::vector<std::unique_ptr<Layer> > layers;

    void serialize(auto& ar,unsigned) {
        ar &base_object<StatisticAnalysis>(*this);
        ar.template register_type<FullyConnected>(); // derived from Layer object
        ar.template register_type<Recurrence>();
        ar.template register_type<Convolution>();
        ar.template register_type<MaxPooling>();
#if defined(VERSION2)
        ar.template register_type<NewLayer>();
#endif

        ar &layers;
    }
};

BOOST_CLASS_EXPORT(NeuralNetwork)

int main(int,char **argv) {
    std::string program_name(*argv);

    std::unique_ptr<StatisticAnalysis> analysis;
    {
        std::ofstream ofs(program_name + ".bin",std::ios::binary);
        boost::archive::binary_oarchive oa(ofs);
        analysis = [] {
            auto nn = std::make_unique<NeuralNetwork>();
            nn->layers.emplace_back(std::make_unique<FullyConnected>());
            nn->layers.emplace_back( std::make_unique<Recurrence>());
            nn->layers.emplace_back(std::make_unique<Convolution>());
            nn->layers.emplace_back(std::make_unique<FullyConnected>());
            nn->layers.emplace_back(std::make_unique<FullyConnected>());
            nn->layers.emplace_back(std::make_unique<MaxPooling>());
            return nn;
        }();
        oa << analysis;
    }

    {
        std::ifstream ifs(program_name + ".bin",std::ios::binary);
        boost::archive::binary_iarchive ia(ifs);

        analysis.reset();
        ia >> analysis;
        
        std::cerr << *analysis << "\n";
    }
}

Test commands

g++ -std=c++20 -Os -DVERSION1 main.cpp -lboost_serialization -o v1
g++ -std=c++20 -Os -DVERSION2 main.cpp -lboost_serialization -o v2
./v1 && ./v2 && md5sum v1.bin v2.bin

This completes successfully and writes identical archives v1.bin and v2.bin, as their md5sum output shows:

5bba3ef7d8a25bd50d0768fed5dfed64  v1.bin
5bba3ef7d8a25bd50d0768fed5dfed64  v2.bin
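
If you would rather run the same comparison from a unit test than from the shell, a byte-for-byte check is enough. The following is only a minimal sketch using the standard library. files_identical is a hypothetical helper name, and it reports false when either file cannot be opened.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Returns true only when both files can be opened and hold exactly the same bytes.
inline bool files_identical(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;

    std::istreambuf_iterator<char> a_it(a), b_it(b), end;
    // The four-iterator overload of std::equal (C++14) also requires equal lengths.
    return std::equal(a_it, end, b_it, end);
}
```

Calling files_identical("v1.bin", "v2.bin") after running both programs would then stand in for the md5sum step.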

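Summary: where to go from here

I think that, in principle, adding a subclass should not break archive compatibility. If it does appear to:

- look for differences between your implementation and the two options above
- look for other changes that happened at the same time
- use something like git-bisect to find the last "working" version that can still deserialize, and isolate the breaking change
- check, for example, Boost library version compatibility
- make sure you are not using non-portable archives (in particular, binary_[io]archive is not portable; see also "boost text deserialization crashing on 32bit windows machine")

If you want to know more, I'm here. If the problem turns out to be sufficiently different, consider opening a new question.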