I have a series of data files, one per thing. Each thing has one gene listed in it. This is a many-to-one relationship, because each gene can be part of many things, but each thing has only one gene.
Imagine the models look roughly like this:
```python
class Gene(db.Model):
    __tablename__ = "gene"
    id = db.Column(db.Integer, primary_key=True)
    name1 = db.Column(db.Integer, index=True, unique=True, nullable=False)  # nullable might not be right
    name2 = db.Column(db.String(120), unique=True)
    things = db.relationship("Thing", back_populates="gene")

    def __init__(self, name1, name2=None):
        self.name1 = name1
        self.name2 = name2

    @classmethod
    def find_or_create(cls, name1, name2=None):
        record = cls.query.filter_by(name1=name1).first()
        if record is not None:
            if record.name2 is None and name2 is not None:
                record.name2 = name2
        else:
            record = cls(name1, name2)
            db.session.add(record)
        return record


class Thing(db.Model):
    __tablename__ = "thing"
    id = db.Column(db.Integer, primary_key=True)
    gene_id = db.Column(db.Integer, db.ForeignKey("gene.id"), nullable=False, index=True)
    gene = db.relationship("Gene", back_populates="things")
    data = db.Column(db.Integer)
```
I want to bulk insert a lot of Things, but I am afraid that by using

```python
db.engine.execute(Thing.__table__.insert(), things)
```

I won't end up with the relationship in the database. Is there some way to preserve the relationship during a bulk add, or to add the rows first and establish the relationships later? All the documentation about bulk adding seems to assume you want to insert very simple models, and I'm a bit lost when the model is more complex (the example above is a dumbed-down version of mine).
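One common pattern for this situation is to insert (or look up) the parent rows first, `flush()` so the database assigns their primary keys, and then pass the foreign key explicitly in the mappings handed to `bulk_insert_mappings()`. The sketch below is an assumption, not the asker's actual code: it uses plain SQLAlchemy (not Flask-SQLAlchemy) with an in-memory SQLite database and trimmed-down `Gene`/`Thing` models.

```python
# Minimal sketch: keep the gene->thing relationship during a bulk insert by
# flushing the parents first, then supplying gene_id explicitly in the rows.
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Gene(Base):
    __tablename__ = "gene"
    id = Column(Integer, primary_key=True)
    name1 = Column(Integer, index=True, unique=True, nullable=False)

class Thing(Base):
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True)
    gene_id = Column(Integer, ForeignKey("gene.id"), nullable=False, index=True)
    data = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # 1) add the parent genes and flush: the db assigns primary keys,
    #    but nothing is committed yet
    genes = {name: Gene(name1=name) for name in (101, 102)}
    session.add_all(genes.values())
    session.flush()

    # 2) resolve each row's gene to its id and bulk-insert the things
    rows = [
        {"gene_id": genes[101].id, "data": 10},
        {"gene_id": genes[101].id, "data": 11},
        {"gene_id": genes[102].id, "data": 20},
    ]
    session.bulk_insert_mappings(Thing, rows)
    session.commit()

    n_things = session.query(Thing).filter_by(gene_id=genes[101].id).count()
```

The trade-off is that `bulk_insert_mappings()` bypasses the ORM's relationship machinery, so the link exists only because the foreign-key column is filled in by hand.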
– Update 1 –
This answer seems to suggest that there is no real solution.
This answer seems to confirm that.
Solution
I have actually changed my code and I think it is better now, so I have changed my answer as well.
I defined the following two tables, Sets and Data; for each set in Sets there are many rows in Data.
```python
class Sets(sa_dec_base):
    __tablename__ = 'Sets'
    id = sa.Column(sa.Integer, primary_key=True)
    FileName = sa.Column(sa.String(250), nullable=False)
    Channel = sa.Column(sa.Integer, nullable=False)
    Loop = sa.Column(sa.Integer, nullable=False)
    Frequencies = sa.Column(sa.Integer, nullable=False)
    Date = sa.Column(sa.String(250), nullable=False)
    Time = sa.Column(sa.String(250), nullable=False)
    Instrument = sa.Column(sa.String(250), nullable=False)
    Set_Data = sa_orm.relationship('Data')
    Set_RTD_spectra = sa_orm.relationship('RTD_spectra')
    Set_RTD_info = sa_orm.relationship('RTD_info')
    __table_args__ = (sa.UniqueConstraint('FileName', 'Channel', 'Loop'),)


class Data(sa_dec_base):
    __tablename__ = 'Data'
    id = sa.Column(sa.Integer, primary_key=True)
    Frequency = sa.Column(sa.Float, nullable=False)
    Magnitude = sa.Column(sa.Float, nullable=False)
    Phase = sa.Column(sa.Float, nullable=False)
    Set_ID = sa.Column(sa.Integer, sa.ForeignKey('Sets.id'))
    Data_Set = sa_orm.relationship('Sets', foreign_keys=[Set_ID])
```
Then I wrote this function to bulk-insert the data while keeping the relationship:
```python
def insert_set_data(session, set2insert, data2insert, Data):
    """Insert a set and its related data, with a unique-constraint check on the set.

    set2insert is the prepared Sets object without an id; a correct, unique id
    is assigned by the db itself. data2insert is a big pandas DataFrame, which
    is why bulk_insert is used.
    """
    session.add(set2insert)
    try:
        session.flush()
    except sa.exc.IntegrityError:
        # catch the unique-constraint error if the set is already in the db
        session.rollback()
        print('already inserted ', set2insert.FileName,
              'loop ', set2insert.Loop, 'channel ', set2insert.Channel)
    else:
        # no error: flush has assigned the id to the set (Sets.id)
        # pass Sets.id to data2insert as the foreign key to keep the relationship
        data2insert['Set_ID'] = set2insert.id
        # convert the df to records for bulk_insert
        data2insert = data2insert.to_dict(orient='records')
        session.bulk_insert_mappings(Data, data2insert)
        # commit only once, so it happens only if set and data were both inserted
        session.commit()
        print('inserting ', set2insert.Channel)
```
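To illustrate the flush-then-bulk-insert flow end to end, here is a hypothetical usage sketch. It re-states a trimmed version of the function (print statements and the unused columns dropped) over minimal `Sets`/`Data` models on an in-memory SQLite database; the file names and values are made up. Calling it twice with the same set shows the unique-constraint check skipping the duplicate.

```python
import pandas as pd
import sqlalchemy as sa
import sqlalchemy.orm as sa_orm

sa_dec_base = sa_orm.declarative_base()

class Sets(sa_dec_base):
    __tablename__ = 'Sets'
    id = sa.Column(sa.Integer, primary_key=True)
    FileName = sa.Column(sa.String(250), nullable=False)
    Channel = sa.Column(sa.Integer, nullable=False)
    Loop = sa.Column(sa.Integer, nullable=False)
    __table_args__ = (sa.UniqueConstraint('FileName', 'Channel', 'Loop'),)

class Data(sa_dec_base):
    __tablename__ = 'Data'
    id = sa.Column(sa.Integer, primary_key=True)
    Frequency = sa.Column(sa.Float, nullable=False)
    Magnitude = sa.Column(sa.Float, nullable=False)
    Set_ID = sa.Column(sa.Integer, sa.ForeignKey('Sets.id'))

def insert_set_data(session, set2insert, data2insert, Data):
    session.add(set2insert)
    try:
        session.flush()          # assigns set2insert.id, or raises on a duplicate
    except sa.exc.IntegrityError:
        session.rollback()       # set already in the db: skip it
    else:
        data2insert['Set_ID'] = set2insert.id
        session.bulk_insert_mappings(Data, data2insert.to_dict(orient='records'))
        session.commit()

engine = sa.create_engine('sqlite://')
sa_dec_base.metadata.create_all(engine)
session = sa_orm.Session(engine)

df = pd.DataFrame({'Frequency': [1.0, 2.0], 'Magnitude': [0.5, 0.6]})
insert_set_data(session, Sets(FileName='a.dat', Channel=1, Loop=1), df.copy(), Data)
# same (FileName, Channel, Loop): IntegrityError is caught, nothing inserted
insert_set_data(session, Sets(FileName='a.dat', Channel=1, Loop=1), df.copy(), Data)

n_sets = session.query(Sets).count()
n_data = session.query(Data).count()
```

After both calls there is one row in Sets and two in Data: the second set was rejected by the unique constraint before any of its data was written, which is exactly the point of committing only once per set.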
In any case, there may well be other, better solutions.