What Happens During PostgreSQL Startup, Part 7: Initializing Shared Memory and Semaphores (5): Initializing Multixact in shmem

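After PostgreSQL initializes shmem and adds the hash-table index "ShmemIndex" to it, it initializes xlog in shmem. It then initializes clog, subtrans, twophase and multixact, in that order. I am covering them in a different order: clog, subtrans, multixact, twophase. Twophase comes after multixact because the first three use the same algorithm and data structures, so writing about them back to back reinforces the material and makes them easier to remember as a group. I originally wanted to cover clog, subtrans and multixact in a single article, but it grew too long, so I split it up. These articles are best read together.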

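The PostgreSQL multixact manager is a pg_clog-like manager that stores an array of transaction IDs for each MultiXactId. It is a fundamental part of the shared-row-lock implementation. A share-locked tuple stores a MultiXactId in its Xmax field. A transaction that needs to wait for the tuple to be unlocked can then sleep on the potentially several transaction IDs that make up that MultiXactId.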

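PostgreSQL uses two SLRU areas. One stores offsets: each offset gives the position in the other SLRU area where a MultiXactId's data begins. This design lets PostgreSQL store variable-length arrays of transaction IDs.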
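A minimal sketch of the two-area idea follows. It uses hypothetical names (create_multi, get_members) and small fixed-size in-memory arrays instead of SLRU pages. The offsets array records where each MultiXactId's members start. The member count is the distance to the next MultiXactId's offset, or to the next free offset for the newest one. This is an illustration of the layout, not PostgreSQL's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

#define MAX_MULTIS  16
#define MAX_MEMBERS 64

/* "offsets" area: start position of each MultiXactId's members */
static MultiXactOffset offsets[MAX_MULTIS];
/* "members" area: all member XIDs stored back to back */
static TransactionId members[MAX_MEMBERS];

static MultiXactId next_mxact = 1;      /* id 0 is treated as invalid here */
static MultiXactOffset next_offset = 0;

/* Record a new multixact with nxids members and return its id. */
static MultiXactId create_multi(const TransactionId *xids, int nxids)
{
    assert(next_mxact < MAX_MULTIS);
    assert(nxids >= 0 && next_offset + (MultiXactOffset) nxids <= MAX_MEMBERS);

    MultiXactId multi = next_mxact++;
    offsets[multi] = next_offset;
    for (int i = 0; i < nxids; i++)
        members[next_offset + i] = xids[i];
    next_offset += (MultiXactOffset) nxids;
    return multi;
}

/* Point *out at the member array; the length is the gap to the next
 * multixact's offset (or to next_offset for the newest multixact). */
static int get_members(MultiXactId multi, const TransactionId **out)
{
    assert(multi >= 1 && multi < next_mxact);
    MultiXactOffset start = offsets[multi];
    MultiXactOffset end = (multi + 1 < next_mxact) ? offsets[multi + 1]
                                                   : next_offset;
    *out = &members[start];
    return (int) (end - start);
}
```

Storing only a start offset per id keeps the offsets area fixed-width. That fixed width is what lets it be addressed by page arithmetic, while the members area absorbs the variable lengths.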

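Interaction with XLOG: the multixact module generates an XLOG record whenever a new offsets or members page is initialized to zeroes. It also generates one whenever a new MultiXactId is defined. This lets PostgreSQL completely rebuild the data entered since the last checkpoint during XLOG replay. Because of this, PostgreSQL need not follow the usual "write WAL before data" rule. The only correctness guarantee needed is that all dirty offsets and members pages (the pages of the two SLRU areas mentioned above) are flushed and synced to disk before a checkpoint is considered complete. If a page does reach disk ahead of its corresponding WAL records, it will be forcibly zeroed before it is used anyway. Therefore PostgreSQL does not need to mark these in-memory pages with LSN information; it already has enough synchronization.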
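The following hypothetical toy model illustrates why no per-page LSN interlock is needed. It is not PostgreSQL's real data structures or record format. Every page's life begins with a "zero page" record, so replaying the log in order overwrites whatever bytes reached disk early. In the toy, replay starts from the beginning of the log. In the real system it starts at the last checkpoint, which is why dirty pages must be flushed before a checkpoint completes.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 8
#define NPAGES    4
#define MAX_WAL   32

/* Hypothetical record types: zero a page, or store one byte into it. */
typedef enum { REC_ZERO_PAGE, REC_SET_BYTE } RecType;

typedef struct {
    RecType type;
    int pageno;
    int off;            /* used by REC_SET_BYTE only */
    unsigned char val;  /* used by REC_SET_BYTE only */
} WalRecord;

static unsigned char disk[NPAGES][PAGE_SIZE];
static WalRecord wal[MAX_WAL];
static int wal_len = 0;

static void log_record(WalRecord rec)
{
    assert(wal_len < MAX_WAL);
    wal[wal_len++] = rec;
}

/* Apply records in order. A REC_ZERO_PAGE record wipes the page before
 * any REC_SET_BYTE for it, so stale or early-flushed bytes never survive. */
static void replay(void)
{
    for (int i = 0; i < wal_len; i++) {
        const WalRecord *r = &wal[i];
        if (r->type == REC_ZERO_PAGE)
            memset(disk[r->pageno], 0, PAGE_SIZE);
        else
            disk[r->pageno][r->off] = r->val;
    }
}
```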

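Like the commit log (clog), but unlike subtransactions (subtrans), multixact state must survive a crash and recovery. MultiXactId and offset numbering must also keep increasing monotonically across a crash. PostgreSQL ensures this the same way it does for transaction IDs: the WAL is guaranteed to contain evidence of every MXID that matters. So PostgreSQL only needs to make sure that, at the end of WAL replay during recovery, the next-MXID and next-offset counters are at least as large as the largest values seen in the replayed log.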
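Here is a minimal sketch of the "at least as large as anything replayed" rule, with hypothetical names (advance_counters, mxid_precedes). For each replayed record that creates a multixact, the counters are pushed past that multixact and past its last member, but never moved backwards. Comparisons use a signed 32-bit difference so they stay correct across wraparound. The sketch ignores the special case where the next id wraps around to the invalid value 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Wraparound-aware "a is older than b": look at the signed difference. */
static bool mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical replay-time counters. */
static MultiXactId next_mxact = 1;
static MultiXactOffset next_offset = 0;

/* Called for each replayed "create multixact" record. */
static void advance_counters(MultiXactId multi, MultiXactOffset offset,
                             int nmembers)
{
    assert(nmembers >= 0);
    MultiXactId min_multi = multi + 1;
    MultiXactOffset min_offset = offset + (MultiXactOffset) nmembers;

    if (mxid_precedes(next_mxact, min_multi))
        next_mxact = min_multi;
    if ((int32_t) (next_offset - min_offset) < 0)
        next_offset = min_offset;
}
```

Because the update only ever moves forward, replaying records in any order leaves the counters past the largest values seen.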

That concludes the overview of multixact. Next, let's look at the function call flow.

1. First, here is a diagram outlining the function calls, with some details omitted.


Call flow of multixact initialization

2. Initializing the multixact-related structures

The call chain is main()->…->PostmasterMain()->…->reset_shared()->CreateSharedMemoryAndSemaphores()->…->MultiXactShmemInit(). This path initializes the multixact-related data structures MultiXactOffsetCtl, MultiXactMemberCtl, MultiXactState and so on. They are used to manage and cache, in memory, the multixact log files: the files in the "data/pg_multixact/offsets" and "data/pg_multixact/members" directories.
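The sketch below shows how an SLRU-style cache maps a MultiXactId onto those directories: id, then page and entry within the page, then segment file. It assumes 8 kB pages, 4-byte offsets and 32 pages per segment file, with segment files named by a four-digit hex segment number. These values match PostgreSQL's usual defaults as far as I know. The function names are hypothetical and the arithmetic is modeled on, not copied from, PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Assumed defaults: 8 kB pages, 32 pages per segment file. */
#define BLCKSZ                  8192
#define SLRU_PAGES_PER_SEGMENT  32
#define OFFSETS_PER_PAGE        ((MultiXactId) (BLCKSZ / sizeof(MultiXactOffset)))

/* Page of the offsets area holding this multixact's entry ... */
static int offset_page(MultiXactId multi)
{
    return (int) (multi / OFFSETS_PER_PAGE);
}

/* ... and the entry's index within that page. */
static int offset_entry(MultiXactId multi)
{
    return (int) (multi % OFFSETS_PER_PAGE);
}

/* Segment file for a page: "<dir>/<segno as 4 hex digits>". */
static void segment_path(char *buf, size_t len, const char *dir, int pageno)
{
    assert(pageno >= 0);
    int segno = pageno / SLRU_PAGES_PER_SEGMENT;
    snprintf(buf, len, "%s/%04X", dir, segno);
}
```

For example, under these assumptions MultiXactId 4095 lives in entry 2047 of offsets page 1, and page 33 belongs to the file "pg_multixact/offsets/0001".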

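The call chain is MultiXactShmemInit()->SimpleLruInit()->ShmemInitStruct(). ShmemInitStruct() calls hash_search() to look up "MultiXactOffset Ctl" in the hash-table index "ShmemIndex". If the name is not there, a HashElement and a ShmemIndexEnt (the entry) are allocated for "MultiXactOffset Ctl" in ShmemIndex, and "MultiXactOffset Ctl" is written into the entry. Back in ShmemInitStruct(), ShmemAlloc() allocates space in shared memory for the "MultiXactOffset Ctl" structures (see the "Multixact-related structures" figure below). The entry (here, the ShmemIndexEnt variable) then has its location member set to point to that space and its size member set to the space's size. Control then returns to SimpleLruInit(). MultiXactOffsetCtl (defined as &MultiXactOffsetCtlData, a static SlruCtlData variable) gets its shared member pointed at the start of the memory allocated in shmem for "MultiXactOffset Ctl", and its other SlruCtlData members are set. Finally SimpleLruInit() returns to MultiXactShmemInit().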
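To make the lookup-or-allocate pattern concrete, here is a minimal sketch of what ShmemInitStruct() does conceptually. It is only a toy: a hypothetical toy_shmem_init_struct() uses a linear table of (name, location, size) entries and a bump allocator over a static arena, in place of PostgreSQL's hash table, locking and real shared memory. The found flag tells the caller whether the structure already existed or must now be initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE  4096
#define MAX_ENTRIES 16
#define NAME_LEN    48

/* Hypothetical stand-in for a ShmemIndex entry. */
typedef struct {
    char   key[NAME_LEN];
    void  *location;
    size_t size;
} ToyIndexEnt;

/* Union forces the arena to be suitably aligned for common types. */
static union { long long ll; double d; void *p; unsigned char bytes[ARENA_SIZE]; } arena;
static size_t arena_used = 0;
static ToyIndexEnt index_ents[MAX_ENTRIES];
static int n_ents = 0;

/* Bump allocator standing in for ShmemAlloc(); sizes rounded to 8. */
static void *toy_alloc(size_t size)
{
    size_t aligned = (size + 7) & ~(size_t) 7;
    if (aligned > ARENA_SIZE - arena_used)
        return NULL;
    void *p = arena.bytes + arena_used;
    arena_used += aligned;
    return p;
}

/* Look up name; if absent, add an entry, allocate space, and record
 * its location and size. *found reports whether it already existed. */
static void *toy_shmem_init_struct(const char *name, size_t size, bool *found)
{
    assert(name != NULL && found != NULL);
    for (int i = 0; i < n_ents; i++) {
        if (strcmp(index_ents[i].key, name) == 0) {
            *found = true;
            return index_ents[i].location;
        }
    }

    *found = false;
    if (n_ents == MAX_ENTRIES)
        return NULL;
    void *p = toy_alloc(size);
    if (p == NULL)
        return NULL;
    ToyIndexEnt *e = &index_ents[n_ents++];
    strncpy(e->key, name, NAME_LEN - 1);
    e->key[NAME_LEN - 1] = '\0';
    e->location = p;
    e->size = size;
    return p;
}
```

A second call with the same name returns the same address with found set. That is how a process attaching to existing shared memory finds structures another process created.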

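Next, the same chain MultiXactShmemInit()->SimpleLruInit()->ShmemInitStruct() runs for "MultiXactMember Ctl". ShmemInitStruct() calls hash_search() to look up "MultiXactMember Ctl" in ShmemIndex. If the name is not there, a HashElement and a ShmemIndexEnt (the entry) are allocated for "MultiXactMember Ctl" in ShmemIndex, and "MultiXactMember Ctl" is written into the entry. Back in ShmemInitStruct(), ShmemAlloc() allocates space in shared memory for the "MultiXactMember Ctl" structures (see the "Multixact-related structures" figure below). The entry (here, the ShmemIndexEnt variable) then has its location member set to point to that space and its size member set to the space's size. Control then returns to SimpleLruInit(). MultiXactMemberCtl (defined as &MultiXactMemberCtlData, a static SlruCtlData variable) gets its shared member pointed at the start of the memory allocated in shmem for "MultiXactMember Ctl", and its other SlruCtlData members are set. Finally SimpleLruInit() returns to MultiXactShmemInit().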

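Then MultiXactShmemInit() calls ShmemInitStruct() directly. ShmemInitStruct() calls hash_search() to look up "Shared MultiXact State" in ShmemIndex. If the name is not there, a HashElement and a ShmemIndexEnt (the entry) are allocated for "Shared MultiXact State" in ShmemIndex, and "Shared MultiXact State" is written into the entry. Back in ShmemInitStruct(), ShmemAlloc() allocates space in shared memory for the "Shared MultiXact State" structure (see the "Multixact-related structures" figure below). The entry (here, the ShmemIndexEnt variable) then has its location member set to point to that space and its size member set to the space's size. Control finally returns to MultiXactShmemInit(). It points the static MultiXactStateData * variable MultiXactState at that space, so the MultiXactStateData instance begins exactly at the memory allocated in shmem for "Shared MultiXact State". It then sets the MultiXactStateData members.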
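The "Shared MultiXact State" area is a fixed header followed by two per-backend arrays (see the struct comment further down). The sketch below shows how such an area can be sized and carved up. Its names are hypothetical: StateData, create_state, oldest_member and oldest_visible. It uses a C99 flexible array member where the article's struct uses the older [1] idiom, and calloc stands in for the zero-filled shared-memory allocation. Valid indexes are 1..max_backends, so element 0 is never used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Simplified copy of the header fields, plus the trailing arrays. */
typedef struct {
    MultiXactId     nextMXact;
    MultiXactOffset nextOffset;
    MultiXactId     lastTruncationPoint;
    MultiXactId     perBackendXactIds[];   /* flexible array member */
} StateData;

/* Hypothetical pointers into the trailing area. */
static MultiXactId *oldest_member;
static MultiXactId *oldest_visible;

/* Header + unused element 0 + two arrays of max_backends entries. */
static size_t state_size(int max_backends)
{
    return sizeof(StateData)
         + sizeof(MultiXactId) * (2 * (size_t) max_backends + 1);
}

/* Stand-in for ShmemInitStruct(): calloc gives a zero-filled area. */
static StateData *create_state(int max_backends)
{
    assert(max_backends > 0);
    StateData *st = calloc(1, state_size(max_backends));
    if (st == NULL)
        return NULL;
    oldest_member = st->perBackendXactIds;
    oldest_visible = oldest_member + max_backends;
    return st;
}
```

With this split, oldest_member[k] and oldest_visible[k] for k = 1..max_backends occupy elements 1..max_backends and max_backends+1..2*max_backends of the trailing area.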

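The relevant variable and structure definitions are listed below, followed by a diagram of the data structures after initialization.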

static MT_LOCAL SlruCtlData MultiXactOffsetCtlData;
static MT_LOCAL SlruCtlData MultiXactMemberCtlData;

#define MultiXactOffsetCtl (&MultiXactOffsetCtlData)
#define MultiXactMemberCtl (&MultiXactMemberCtlData)

typedef struct SlruCtlData
{
    SlruShared  shared;

    /*
     * This flag tells whether to fsync writes (true for pg_clog, false for
     * pg_subtrans).
     */
    bool        do_fsync;

    /*
     * Decide which of two page numbers is "older" for truncation purposes. We
     * need to use comparison of TransactionIds here in order to do the right
     * thing with wraparound XID arithmetic.
     */
    bool        (*PagePrecedes) (int, int);

    /*
     * Dir is set during SimpleLruInit and does not change thereafter. Since
     * it's always the same, it doesn't need to be in shared memory.
     */
    char        Dir[64];
} SlruCtlData;

typedef SlruCtlData *SlruCtl;
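The PagePrecedes callback is where wraparound-aware comparison comes in. Below is a sketch of an offsets-area version, modeled on the idea described in the comment above rather than copied from PostgreSQL. It converts each page number to the first MultiXactId stored on that page, then compares the two ids modulo 2^32. The 8 kB page size, 4-byte offsets and first valid id of 1 are assumptions. The names offset_page_precedes and mxid_precedes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Assumptions: 8 kB pages, 4-byte offsets, first valid id is 1. */
#define BLCKSZ            8192
#define OFFSETS_PER_PAGE  ((MultiXactId) (BLCKSZ / sizeof(MultiXactOffset)))
#define FIRST_MULTI_ID    ((MultiXactId) 1)

/* a is older than b, modulo 2^32 */
static bool mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/* Compare pages via the first multixact id stored on each. */
static bool offset_page_precedes(int page1, int page2)
{
    assert(page1 >= 0 && page2 >= 0);
    MultiXactId m1 = (MultiXactId) page1 * OFFSETS_PER_PAGE + FIRST_MULTI_ID;
    MultiXactId m2 = (MultiXactId) page2 * OFFSETS_PER_PAGE + FIRST_MULTI_ID;
    return mxid_precedes(m1, m2);
}
```

A plain `page1 < page2` would break once the id space wraps. Under these assumptions, the last page before wraparound (2097151) must count as older than page 0, and the signed-difference comparison gives exactly that.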

/*
 * Shared-memory state
 */
typedef struct SlruSharedData
{
    LWLockId    ControlLock;

    /* Number of buffers managed by this SLRU structure */
    int         num_slots;

    /*
     * Arrays holding info for each buffer slot. Page number is undefined
     * when status is EMPTY, as is page_lru_count.
     */
    char      **page_buffer;
    SlruPageStatus *page_status;
    bool       *page_dirty;
    int        *page_number;
    int        *page_lru_count;
    LWLockId   *buffer_locks;

    /*----------
     * We mark a page "most recently used" by setting
     *      page_lru_count[slotno] = ++cur_lru_count;
     * The oldest page is therefore the one with the highest value of
     *      cur_lru_count - page_lru_count[slotno]
     * The counts will eventually wrap around, but this calculation still
     * works as long as no page's age exceeds INT_MAX counts.
     *----------
     */
    int         cur_lru_count;

    /*
     * latest_page_number is the page number of the current end of the log;
     * this is not critical data, since we use it only to avoid swapping out
     * the latest page.
     */
    int         latest_page_number;
} SlruSharedData;

typedef SlruSharedData *SlruShared;
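The LRU comment above can be turned into a small victim-selection sketch. ToyShared, mark_used and select_victim are hypothetical names, and this is not PostgreSQL's page-replacement routine. One deliberate change from the struct: the sketch uses unsigned counters, so the wraparound the comment relies on is well-defined C. The age of a slot is cur_lru_count - page_lru_count[slotno]. The slot holding latest_page_number is never chosen.

```c
#include <assert.h>

#define NUM_SLOTS 4

/* Cut-down LRU bookkeeping; unsigned so wraparound is well-defined. */
typedef struct {
    int      page_number[NUM_SLOTS];
    unsigned page_lru_count[NUM_SLOTS];
    unsigned cur_lru_count;
    int      latest_page_number;
} ToyShared;

/* Mark a slot most recently used. */
static void mark_used(ToyShared *s, int slotno)
{
    assert(slotno >= 0 && slotno < NUM_SLOTS);
    s->page_lru_count[slotno] = ++s->cur_lru_count;
}

/* Slot with the largest age, skipping the latest page; -1 if none. */
static int select_victim(const ToyShared *s)
{
    int best = -1;
    unsigned best_age = 0;
    for (int slotno = 0; slotno < NUM_SLOTS; slotno++) {
        if (s->page_number[slotno] == s->latest_page_number)
            continue;
        unsigned age = s->cur_lru_count - s->page_lru_count[slotno];
        if (best < 0 || age > best_age) {
            best = slotno;
            best_age = age;
        }
    }
    return best;
}
```

Even after cur_lru_count wraps past zero, the unsigned subtraction still yields each slot's true age, as long as no age exceeds the counter's range.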

typedef struct MultiXactStateData
{
    /* next-to-be-assigned MultiXactId */
    MultiXactId nextMXact;

    /* next-to-be-assigned offset */
    MultiXactOffset nextOffset;

    /* the Offset SLRU area was last truncated at this MultiXactId */
    MultiXactId lastTruncationPoint;

    /*
     * Per-backend data starts here. We have two arrays stored in the area
     * immediately following the MultiXactStateData struct. Each is indexed by
     * BackendId. (Note: valid BackendIds run from 1 to MaxBackends; element
     * zero of each array is never used.)
     *
     * OldestMemberMXactId[k] is the oldest MultiXactId each backend's current
     * transaction(s) could possibly be a member of, or InvalidMultiXactId
     * when the backend has no live transaction that could possibly be a
     * member of a MultiXact. Each backend sets its entry to the current
     * nextMXact counter just before first acquiring a shared lock in a given
     * transaction, and clears it at transaction end. (This works because only
     * during or after acquiring a shared lock could an XID possibly become a
     * member of a MultiXact, and that MultiXact would have to be created
     * during or after the lock acquisition.)
     *
     * OldestVisibleMXactId[k] is the oldest MultiXactId each backend's
     * current transaction(s) think is potentially live, or InvalidMultiXactId
     * when not in a transaction or not in a transaction that's paid any
     * attention to MultiXacts yet. This is computed when first needed in a
     * given transaction, and cleared at transaction end. We can compute it
     * as the minimum of the valid OldestMemberMXactId[] entries at the time
     * we compute it (using nextMXact if none are valid). Each backend is
     * required not to attempt to access any SLRU data for MultiXactIds older
     * than its own OldestVisibleMXactId[] setting; this is necessary because
     * the checkpointer could truncate away such data at any instant.
     *
     * The checkpointer can compute the safe truncation point as the oldest
     * valid value among all the OldestMemberMXactId[] and
     * OldestVisibleMXactId[] entries, or nextMXact if none are valid.
     * Clearly, it is not possible for any later-computed OldestVisibleMXactId
     * value to be older than this, and so there is no risk of truncating data
     * that is still needed.
     */
    MultiXactId perBackendXactIds[1];   /* VARIABLE LENGTH ARRAY */
} MultiXactStateData;

static MultiXactStateData *MultiXactState;
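The checkpointer rule at the end of that comment can be sketched directly. The function below, with the hypothetical name safe_truncation_point, scans both per-backend arrays over indexes 1..max_backends, ignores invalid entries (0), and keeps the oldest valid id. Comparisons are wraparound-aware. It falls back to nextMXact when no entry is valid. This illustrates the stated rule and is not PostgreSQL's actual truncation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
#define InvalidMultiXactId ((MultiXactId) 0)

/* a is older than b, modulo 2^32 */
static bool mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/* Oldest valid entry in either array (indexes 1..max_backends),
 * or next_mxact if none is valid. */
static MultiXactId safe_truncation_point(const MultiXactId *oldest_member,
                                         const MultiXactId *oldest_visible,
                                         int max_backends,
                                         MultiXactId next_mxact)
{
    assert(max_backends >= 0);
    MultiXactId oldest = next_mxact;
    for (int i = 1; i <= max_backends; i++) {
        if (oldest_member[i] != InvalidMultiXactId &&
            mxid_precedes(oldest_member[i], oldest))
            oldest = oldest_member[i];
        if (oldest_visible[i] != InvalidMultiXactId &&
            mxid_precedes(oldest_visible[i], oldest))
            oldest = oldest_visible[i];
    }
    return oldest;
}
```

Starting from next_mxact means an idle system (no valid entries) may truncate everything older than the next id to be assigned.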

Below is the in-memory layout after the "MultiXactOffset Ctl", "MultiXactMember Ctl" and "Shared MultiXact State" structures have been initialized.


Memory layout after the multixact structures are initialized

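To keep the figure simple, I removed the HCTL structure created along with the shmem hash-table index "ShmemIndex"; that structure records information used to create the extensible hash table. I added the gray-shaded part on the left, which gives an overview of the physical layout of the variables in shared memory (shmem). It reads from bottom to top, that is, from low addresses to high addresses. The "MultiXactCtl" structures, namely those of MultiXactOffsetCtl and MultiXactMemberCtl, are diagrammed separately below; otherwise the figure above would be too large and complex.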


Multixact-related structures
