12
QueryProcessing
PracticeExercises
12.1
Assume(forsimplicityinthisexercise)thatonlyonetuplefitsinablockandmemoryholdsatmost3blocks.Showtherunscreatedoneachpassofthesort-mergealgorithm,whenappliedtosortthefollowingtuplesonthefirstattribute:(kangaroo,17),(wallaby,21),(emu,1),(wombat,13),(platypus,3),(lion,8),(warthog,4),(zebra,11),(meerkat,6),(hyena,9),(hornbill,2),(baboon,12).
Answer:Wewillrefertothetuples(kangaroo,17)through(baboon,12)usingtuplenumberst1throught12.Werefertothejthrunusedbytheithpass,asrij.Theinitialsortedrunshavethreeblockseach.Theyare:
r11r12r13r14
====
{t3,t1,t2}{t6,t5,t4}{t9,t7,t8}{t12,t11,t10}
Eachpassmergesthreeruns.Thereforetherunsaftertheendofthefirstpassare:
r21={t3,t1,t6,t9,t5,t2,t7,t4,t8}r22={t12,t11,t10}
Attheendofthesecondpass,thetuplesarecompletelysortedintoonerun:
r31={t12,t3,t11,t10,t1,t6,t9,t5,t2,t7,t4,t8}
12.2
ConsiderthebankdatabaseofFigure12.13,wheretheprimarykeysareunderlined,andthefollowingSQLquery:
1
2Chapter12QueryProcessing
selectT.branchnamefrombranchT,branchS
whereT.assets>S.assetsandS.branchcity=“Brooklyn”
Writeanefficientrelational-algebraexpressionthatisequivalenttothisquery.Justifyyourchoice.Answer:Query:
T.branchname((branchname,assets(T(branch)))1(T.assets>S.assets
assets((branchcity=’Brooklyn’)(S(branch)))))
Thisexpressionperformsthethetajoinonthesmallestamountofdatapossible.ItdoesthisbyrestrictingtherighthandsideoperandofthejointoonlythosebranchesinBrooklyn,andalsoeliminatingtheunneededattributesfromboththeoperands.12.3
Letrelationsr1(A,B,C)andr2(C,D,E)havethefollowingproperties:rhas20,000tuples,rofr12has45,000tuples,25tuples1fitononeblock,and30tuplesofrseeksrequired,2fitononeblock.Estimatethenumberofblocktransfersandusingeachofthefollowingjoinstrategiesforr11r2:a.Nested-loopjoin.b.Blocknested-loopjoin.c.Mergejoin.d.
Hashjoin.
Answer:
r1needs800blocks,andrmemory.IfM>800,the2needs1500blocks.LetusassumeMpagesofjoincaneasilybedonein1500+800diskaccesses,usingevenplainnested-loopjoin.SoweconsideronlythecasewhereM≤800pages.a.
Nested-loopjoin:
Usingr1astheouterrelationweneed20000∗1500+800=30,000,800diskaccesses,ifr=36,001,500disk2istheouterrelationweneed45000∗800+1500accesses.
b.
Blocknested-loopjoin:
Ifr1istheouterrelation,weneed⌈M800−1⌉∗1500+800diskaccesses,ifr2istheouterrelationweneed⌈1500M−1⌉∗800+1500diskaccesses.c.
Merge-join:
Assumingthatrtalsortingcostinclusive1andr2arenotinitiallysortedonthejoinkey,theto-oftheoutputisBs=1500(2⌈logM−1(1500/M)⌉+
Exercises3
2)+800(2⌈logM−1(800/M)⌉+2)diskaccesses.Assumingalltupleswiththesamevalueforthejoinattributesfitinmemory,thetotalcostisBs+1500+800diskaccesses.d.
Hash-join:
Weassumenooverflowoccurs.Sincerr1issmaller,weuseitasthebuildrelationand2astheproberelation.IfM>800/M,i.e.noneedforrecursivepartitioning,thenthecostis3(1500+800)=6900diskaccesses,elsethecostis2(1500+800)⌈logM−1(800)−1⌉+1500+800diskaccesses.
12.4
Theindexednested-loopjoinalgorithmdescribedinSection12.5.3canbeinefficientiftheindexisasecondaryindex,andtherearemultipletupleswiththesamevalueforthejoinattributes.Whyisitinefficient?Describeaway,usingsorting,toreducethecostofretrievingtuplesoftheinnerrelation.Underwhatconditionswouldthisalgorithmbemoreefficientthanhybridmergejoin?Answer:
Iftherearemultipletuplesintheinnerrelationwiththesamevalueforthejoinattributes,wemayhavetoaccessthatmanyblocksoftheinnerrelationforeachtupleoftheouterrelation.Thatiswhyitisinefficient.Toreducethiscostwecanperformajoinoftheouterrelationtupleswithjustthesecondaryindexleafentries,postponingtheinnerrelationtupleretrieval.Theresultfileobtainedisthensortedontheinnerrelationaddresses,allowinganefficientphysicalorderscantocompletethejoin.Hybridmerge–joinrequirestheouterrelationtobesorted.Theabovealgorithmdoesnothavethisrequirement,butforeachtupleintheouterrelationitneedstoperformanindexlookupontheinnerrelation.Iftheouterrelationismuchlargerthantheinnerrelation,thisindexlookupcostwillbelessthanthesortingcost,thusthisalgorithmwillbemoreefficient.
12.5
Letrandsberelationswithnoindices,andassumethattherelationsarenotsorted.Assuminginfinitememory,whatisthelowest-costway(intermsofI/Ooperations)tocomputer1s?Whatistheamountofmemoryrequiredforthisalgorithm?Answer:
Wecanstoretheentiresmallerrelationinmemory,readthelargerrelationblockbyblockandperformnestedloopjoinusingthelargeroneastheouterrelation.ThenumberofI/Ooperationsisequaltobrequirementismin(bpages.
r+bs,andmemoryr,bs)+212.6
ConsiderthebankdatabaseofFigure12.13,wheretheprimarykeysareunderlined.SupposethataB+-treeindexonbranchcityisavailableonrelationbranch,andthatnootherindexisavailable.Listdifferentwaystohandlethefollowingselectionsthatinvolvenegation:a.
¬(branchcity<“Brooklyn”)(branch)
4Chapter12QueryProcessing
b.¬(branchcity=“Brooklyn”)(branch)
c.¬(branchcity<“Brooklyn”∨assets<5000)(branch)
Answer:
a.
Usetheindextolocatethefirsttuplewhosebranchcityfieldhasvalue“Brooklyn”.Fromthistuple,followthepointerchainstilltheend,retrievingallthetuples.
b.
Forthisquery,theindexservesnopurpose.Wecanscanthefilesequentiallyandselectalltupleswhosebranchcityfieldisanythingotherthan“Brooklyn”.
c.
Thisqueryisequivalenttothequery
(branchcity≥′Brooklyn′∧assets<5000)(branch)
Usingthebranch-cityindex,wecanretrievealltupleswithbranch-cityvaluegreaterthanorequalto“Brooklyn”byfollowingthepointerchainsfromthefirst“Brooklyn”tuple.Wealsoapplytheadditionalcriteriaofassets<5000oneverytuple.
12.7
Writepseudocodeforaniteratorthatimplementsindexednested-loopjoin,wheretheouterrelationispipelined.Yourpseudocodemustdefinethestandarditeratorfunctionsopen(),next(),andclose().Showwhatstateinformationtheiteratormustmaintainbetweencalls.
Answer:Letouterbetheiteratorwhichreturnssuccessivetuplesfromthepipelinedouterrelation.Letinnerbetheiteratorwhichreturnssuc-cessivetuplesoftheinnerrelationhavingagivenvalueatthejoinat-tributes.Theinneriteratorreturnsthesetuplesbyperforminganindexlookup.ThefunctionsIndexedNLJoin::open,IndexedNLJoin::closeandIndexedNLJoin::nexttoimplementtheindexednested-loopjoiniteratoraregivenbelow.Thetwoiteratorsouterandinner,thevalueofthelastreadouterrelationtupletrandaflagdonerindicatingwhethertheendoftheouterrelationscanhasbeenreachedarethestateinformationwhichneedtoberememberedbyIndexedNLJoinbetweencalls.
IndexedNLJoin::open()begin
outer.open();inner.open();doner:=false;
if(outer.next()=false)
movetuplefromouter’soutputbuffertotr;else
doner:=true;
end
Exercises5
IndexedNLJoin::close()begin
outer.close();inner.close();end
booleanIndexedNLJoin::next()begin
while(¬donebegin
r)if(inner.next(tr[JoinAttrs])=false)begin
movetuplefrominner’soutputbuffertotcomputetandplaceitinoutputbuffer;s;returntruer;1tsendelse
if(outer.next()=false)begin
movetuplefromouter’soutputbuffertotinnertofirsttupleofs;r;rewindendelse
doner:=true;
end
returnfalse;end
12.8
Designsort-basedandhash-basedalgorithmsforcomputingtherelationaldivisionoperation(seePractiseExercisesofChapter6foradefinitionofthedivisionoperation).
Answer:Supposer(T∪S)ands(S)betworelationsandr÷shastobecomputed.
Forsortingbasedalgorithm,sortrelationsonS.Sortrelationron(T,S).Now,startscanningrandlookattheTattributevaluesofthefirsttuple.ScanrtilltupleshavesamevalueofT.AlsoscanssimultaneouslyandcheckwhethereverytupleofsalsooccursastheSattributeofr,inafashionsimilartomergejoin.Ifthisisthecase,outputthatvalueofTandproceedwiththenextvalueofT.Relationsmayhavetobescannedmultipletimesbutrwillonlybescannedonce.Totaldiskaccesses,after
6Chapter12QueryProcessing
sortingboththerelations,willbe|r|+N∗|s|,whereNisthenumberofdistinctvaluesofTinr.
WeassumethatforanyvalueofT,alltuplesinrwiththatTvaluefitinmemory,andconsiderthegeneralcaseattheend.PartitiontherelationronattributesinTsuchthateachpartitionfitsinmemory(alwayspossiblebecauseofourassumption).Considerpartitionsoneatatime.Buildahashtableonthetuples,atthesametimecollectingalldistinctTvaluesinaseparatehashtable.ForeachvalueofT,Now,foreachvalueVeachvaluesofS,probethehashtableon(VanyofthevaluesTofT,T,s).Ifisabsent,discardthevalueVInthecasethatnotallrtuplesT,elseoutputthevalueVwithonevalueforTfitT.
inmemory,partitionrandsontheSattributessuchthattheconditionissatisfied,runthealgorithmoneachcorrespondingpairofpartitionsrvaluesgeneratedineachpartition.iandsi.OutputtheintersectionoftheT12.9
Whatistheeffectonthecostofmergingrunsifthenumberofbufferblocksperrunisincreased,whilekeepingoverallmemoryavailableforbufferingrunsfixed?
Answer:Seekoverheadisreduced,butthethenumberofrunsthatcanbemergedinapassdecreasespotentiallyleadingtomorepasses.Avalueofbbthatminimizesoverallcostshouldbechosen.
因篇幅问题不能全部显示,请点此查看更多更全内容
Copyright © 2019- igat.cn 版权所有 赣ICP备2024042791号-1
违法及侵权请联系:TEL:199 1889 7713 E-MAIL:2724546146@qq.com
本站由北京市万商天勤律师事务所王兴未律师提供法律服务