微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

sql-server – 将多个连接拆分为较小的SQL查询有帮助吗?

我们需要每晚在sql Server 2008 R2上进行一些报告.计算报告需要几个小时.为了缩短我们预先计算表格的时间.此表是基于JOINining 12相当大(数百万行)表创建的.

这个聚合表的计算花了几天前cca 4小时.我们的DBA将这个大连接分成3个较小的连接(每个连接4个表).临时结果每次都保存到临时表中,用于下一次连接.

DBA增强的结果是,聚合表在15分钟内计算.我想知道这怎么可能. DBA告诉我,这是因为服务器必须处理的数据量较小.换句话说,在大型原始连接中,服务器必须处理的数据多于总和较小的连接数.但是,我认为优化器会使用原始的大连接有效地完成它,自己拆分连接并仅发送下一个连接所需的列数.

他做的另一件事是他在一个临时表上创建了一个索引.但是,我再次认为优化器将根据需要创建适当的哈希表,并且更好地优化计算.

我和我们的DBA讨论了这个问题,但他自己也不确定是什么能够改善处理时间.他刚才提到,他不会责怪服务器,因为计算这么大的数据可能会让人不堪重负,并且优化器很可能很难预测最佳的执行计划…….我理解这一点,但我希望有更明确的答案.

所以,问题是:

>什么可能导致重大改善?
>将大型连接分成较小的是一个标准程序吗?
>在多个较小的连接的情况下,服务器必须处理的数据量是否真的更小?

这是原始查询

Insert Into FinalResult_Base
SELECT       
    TC.TestCampaignContainerId,TC.CategoryId As TestCampaignCategoryId,TC.Grade,TC.TestCampaignId,T.TestSetId,TL.TestId,TSK.CategoryId,TT.[TestletId],TL.SectionNo,TL.Difficulty,TestletName = Char(65+TL.SectionNo) + CONVERT(varchar(4),6 - TL.Difficulty),TQ.[QuestionId],TS.StudentId,TS.ClassId,RA.SubjectId,TQ.[QuestionPoints],GoodAnswer  = Case When TQ.[QuestionPoints] Is null Then 0
                      When TQ.[QuestionPoints] > 0 Then 1 
                      Else 0 End,WrongAnswer = Case When TQ.[QuestionPoints] = 0 Then 1 
                      When TQ.[QuestionPoints] Is null Then 1
                     Else 0 End,NoAnswer    = Case When TQ.[QuestionPoints] Is null Then 1 Else 0 End,TS.Redizo,TT.ViewCount,TT.SpentTime,TQ.[Position],RA.SpecialNeeds,[Version] = 1,TestAdaptationId = TA.Id,TaskId = TSK.TaskId,TaskPosition = TT.Position,QuestionRate = Q.Rate,TestQuestionId = TQ.Guid,AnswerType = TT.TestletAnswerTypeId
FROM 
    [TestQuestion] TQ WITH (NOLOCK)
    Join [TestTask] TT WITH (NOLOCK)            On TT.Guid = TQ.TestTaskId
    Join [Question] Q WITH (NOLOCK)         On TQ.QuestionId =  Q.QuestionId
    Join [Testlet] TL WITH (NOLOCK)         On TT.TestletId  = TL.Guid 
    Join [Test]     T WITH (NOLOCK)         On TL.TestId     =  T.Guid
    Join [TestSet] TS WITH (NOLOCK)         On T.TestSetId   = TS.Guid 
    Join [RoleAssignment] RA WITH (NOLOCK)  On TS.StudentId  = RA.PersonId And RA.RoleId = 1
    Join [Task] TSK WITH (NOLOCK)       On TSK.TaskId = TT.TaskId
    Join [Category] C WITH (NOLOCK)     On C.CategoryId = TSK.CategoryId
    Join [TimeWindow] TW WITH (NOLOCK)      On TW.Id = TS.TimeWindowId 
    Join [TestAdaptation] TA WITH (NOLOCK)  On TA.Id = TW.TestAdaptationId
    Join [TestCampaign] TC WITH (NOLOCK)        On TC.TestCampaignId = TA.TestCampaignId 
WHERE
    T.TestTypeId = 1    -- eliminuji ankety 
    And t.ProcessedOn is not null -- ne vsechny,jen dokoncene
    And TL.ShownOn is not null
    And TS.Redizo not in (999999999,111111119)
END;

DBA伟大的工作后新的分裂连接:

SELECT       
    TC.TestCampaignContainerId,6 - TL.Difficulty) -- prevod na A5,B4,B5 ...,[Version] = 1 -- ?,TL.Guid AS TLGuid,TS.TimeWindowId
INTO
    [#FinalResult_Base_1]
FROM 
    [TestSet] [TS] WITH (NOLOCK)
    JOIN [Test] [T] WITH (NOLOCK) 
        ON [T].[TestSetId] = [TS].[Guid] AND [TS].[Redizo] NOT IN (999999999,111111119) AND [T].[TestTypeId] = 1 AND [T].[ProcessedOn] IS NOT NULL
    JOIN [Testlet] [TL] WITH (NOLOCK)
        ON [TL].[TestId] = [T].[Guid] AND [TL].[ShownOn] IS NOT NULL
    JOIN [TimeWindow] [TW] WITH (NOLOCK)
        ON [TW].[Id] = [TS].[TimeWindowId] AND [TW].[IsActive] = 1
    JOIN [TestAdaptation] [TA] WITH (NOLOCK)
        ON [TA].[Id] = [TW].[TestAdaptationId] AND [TA].[IsActive] = 1
    JOIN [TestCampaign] [TC] WITH (NOLOCK)
        ON [TC].[TestCampaignId] = [TA].[TestCampaignId] AND [TC].[IsActive] = 1
    JOIN [TestCampaignContainer] [TCC] WITH (NOLOCK)
        ON [TCC].[TestCampaignContainerId] = [TC].[TestCampaignContainerId] AND [TCC].[IsActive] = 1
    ;

 SELECT       
    FR1.TestCampaignContainerId,FR1.TestCampaignCategoryId,FR1.Grade,FR1.TestCampaignId,FR1.TestSetId,FR1.TestId,TSK.CategoryId AS [TaskCategoryId],FR1.SectionNo,FR1.Difficulty,TestletName = Char(65+FR1.SectionNo) + CONVERT(varchar(4),6 - FR1.Difficulty) -- prevod na A5,FR1.StudentId,FR1.ClassId,FR1.Redizo,FR1.TestAdaptationId,AnswerType = TT.TestletAnswerTypeId,TT.Guid AS TTGuid

INTO
    [#FinalResult_Base_2]
FROM 
    #FinalResult_Base_1 FR1
    JOIN [TestTask] [TT] WITH (NOLOCK)
        ON [TT].[TestletId] = [FR1].[TLGuid] 
    JOIN [Task] [TSK] WITH (NOLOCK)
        ON [TSK].[TaskId] = [TT].[TaskId] AND [TSK].[IsActive] = 1
    JOIN [Category] [C] WITH (NOLOCK)
        ON [C].[CategoryId] = [TSK].[CategoryId]AND [C].[IsActive] = 1
    ;    

DROP TABLE [#FinalResult_Base_1]

CREATE NONCLUSTERED INDEX [#IX_FR_Student_Class]
ON [dbo].[#FinalResult_Base_2] ([StudentId],[ClassId])
INCLUDE ([TTGuid])

SELECT       
    FR2.TestCampaignContainerId,FR2.TestCampaignCategoryId,FR2.Grade,FR2.TestCampaignId,FR2.TestSetId,FR2.TestId,FR2.[TaskCategoryId],FR2.[TestletId],FR2.SectionNo,FR2.Difficulty,FR2.TestletName,FR2.StudentId,FR2.ClassId,TQ.[QuestionPoints] -- 1+ good,0 wrong,null no answer,GoodAnswer  = Case When TQ.[QuestionPoints] Is null Then 0
                      When TQ.[QuestionPoints] > 0 Then 1 -- cookie
                      Else 0 End,FR2.Redizo,FR2.ViewCount,FR2.SpentTime,TQ.[Position] AS [QuestionPosition],RA.SpecialNeeds -- identifikace SVP,FR2.TestAdaptationId,FR2.TaskId,FR2.TaskPosition,FR2.AnswerType
INTO
    [#FinalResult_Base]
FROM 
    [#FinalResult_Base_2] FR2
    JOIN [TestQuestion] [TQ] WITH (NOLOCK)
        ON [TQ].[TestTaskId] = [FR2].[TTGuid]
    JOIN [Question] [Q] WITH (NOLOCK)
        ON [Q].[QuestionId] = [TQ].[QuestionId] AND [Q].[IsActive] = 1

    JOIN [RoleAssignment] [RA] WITH (NOLOCK)
        ON [RA].[PersonId] = [FR2].[StudentId]
        AND [RA].[ClassId] = [FR2].[ClassId] AND [RA].[IsActive] = 1 AND [RA].[RoleId] = 1

    drop table #FinalResult_Base_2;

    truncate table [dbo].[FinalResult_Base];
    insert into [dbo].[FinalResult_Base] select * from #FinalResult_Base;

    drop table #FinalResult_Base;

解决方法

1减少“搜索空间”,加上更好的中间/后期联接统计数据.

我不得不处理90个表连接(米老鼠设计),其中查询处理器甚至拒绝创建计划.将这种连接分解为每个9个表的10个子连接,大大降低了每个连接的复杂性,每个连接都会随着每个附加表呈指数级增长.此外,查询优化工具现在将它们视为10个计划,总体上花费(可能)更多时间(Paul White甚至可能有指标!).

中间结果表现在将具有他们自己的新统计数据,因此与深度树的统计数据相比更好地加入,这些数据库早期变得倾斜并且很快就会成为科幻小说.

此外,您可以先强制选择性最强的连接,减少向上移动树的数据量.如果您可以比Optimiser更好地估计谓词的选择性,为什么不强制使用连接顺序.可能值得寻找“浓密的计划”.

2我认为应该考虑效率和绩效是否重要

3不一定,但如果最早选择性连接是在早期执行的话

原文地址:https://www.jb51.cc/mssql/79357.html

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐