您好,欢迎来到爱go旅游网。
搜索
您的当前位置:首页Efficient Algorithms and Architectures for High-speed Text Processing and Data Correlation

Efficient Algorithms and Architectures for High-speed Text Processing and Data Correlation

来源:爱go旅游网
EfficientAlgorithmsandArchitecturesforHigh-speedTextProcessingandDataCorrelation

APhDDissertationProposal

submittedby

DepartmentofEE-Systems

UniversityofSouthernCalifornia,LosAngeles,CAUSA

zbaker@halcyon.usc.edu

ZacharyBaker

ViktorK.Prasanna(Chairman)

PeterA.BeerelKaiHwang

ShrikanthNarayanan

MichaelS.Waterman(OutsideMember)

GuidanceCommittee

Monday,Dec6th,2004

1-3pmEEB539

Abstract

Weproposetodevelopalgorithmsandarchitecturesforenergy,time,andareaefficienthardwareimplementationsofvariouscomputationanddataintensivekernels,namely,stringmatchingandcorrelationmining.Wewilldemonstrateourarchitecturesinthecontextofnetworksecurity,butthekernelshavewideapplicationsinnotonlynetworksecurity,butcomputationalbiology,wiretaptextmining,frauddetection,etc.Throughtheuseofreconfigurablehardwarewecanachievesignificantperformanceimprovementsoversoftwareapproaches,and,throughintelligentalgorithmdesignwedemonstrateimprovementsintime,area,andenergy-efficiencyoverna¨ıvehardwareapproaches.Inthisproposalwewilldescribethegeneralapplicationareas,giveanoverviewofourcompletedresearch,andintroducethreepotentialresearchtracks.

Contents

1Introduction

2Background

2.1FPGA............................................2.2TextMatching.......

................................2.2.1Unalignedmatching................................2.2.2AlignedMatching.................................2.3DataMining........................................2.4NetworkSecurity......................................3CompletedResearch

3.1StringMatching.....

.................................3.1.1KMP........................................3.1.2Shift-and-Compare.................................3.2DataMining.......

.................................3.2.1IntroductiontotheAprioriAlgorithm......................3.2.2OurContributions

...........

......................

4ProposedResearch

4.1Low-risk,MediumReward:AcceleratingDataMiningKernels.............4.2Medium-riskEngineering:AlignedTextMining.....................4.3High-risk,HighReward:AutomaticAttackDetection.................5ResearchPlan6OverallThesisGoals

1

233334456667101011131314151617

1Introduction

Computersystemsecurityisprogressivelybecomingalargerandmorepressingconcerntotheworld.AsmoresectorsofoureconomybecomedependentontheInternet,theybecomeincreasinglyexposedtoattacksfromoutside.Thevarietyofattacksisonlymatchedbytheirpotentialtargetswithinthesystem.OneofthemoresimpleattacksisaDenialofServiceattack,whichfloodsanetworkwithsomanyrequeststhatitisimpossibletoprovideservicetothelegitimaterequests.Morecomplicatedattacksactuallyaddresstheweaknessesofparticularapplicationsrunningonaserver.Theseattacks,mostcommonlyimplementedthroughbufferoverflowandotherwiseunacceptableinputs,exploitpoorlydesignedsoftwaretotakecontrolofacomputer.Aftercontrolisgained,thecompromisedcomputercanbeusedinadistributedattackagainstothercomputers,searchacomputerforpasswords,personalinformation,creditcardnumbers,etc.Often,theseapplication-levelattacksaredistinguishedbyaparticularfingerprintstringthatiscommontoallvariationsoftheattack.

Detectingtheseattacksreducestoastring-matchingproblem,ahighlycomparison-intensiveproblemtoattemptinreal-time.Whilethekernelstringmatchingoperationisasimpleandwell-understoodproblem,theparticularrequirementsofnetworksecuritymakeitaninterestingresearchproblem.Thekeycharacteristicisconcurrentmatchingagainstthousandsofpatternsatlineratesofupto10Gb/s.Unfortunately,thisisahighlycomputationallyintensiveproblem,andreal-timeperformanceisnecessarytobothpreventbufferoverflowsandprovidetimelyattackinformation.Itisimpossibletomeettheperformancerequirementswithoutspecialhardware.Forinstance,aDual1GHzPentiumIIIsystem,using845patterns,runsatonly50Mbps.Thehighlevelofparallelismandthroughputrequiredtransformasimplesequentialproblemintoaninterestingstudyofarchitecturaldetail,graph-basedoptimizationtechniques,andhardwareutilization.

Forthisreason,itisnecessarytomovetohardwareimplementationstoincreasethelevelofparallelismpossible.FPGAdevicesprovideasurprisinglypowerfulandconvenientplatformforimplementingouralgorithms.Forthesekerneldomains,FPGAsprovidemanyoftheadvantagesofASICdesign,includingpotentialforparallelism,veryefficientbit-leveloperations,andhighbandwidth.UnlikeASICs,though,reconfigurablehardwarecanbereprogrammedtosuitaspecificsetofinputdataoroperatingsituations,allowingtheflexibilityofsoftwarewiththepoweroffixedhardware.

Theotheralgorithmareaweaddress,correlationmining,hasapplicationsinbusinessdatapro-cessing,biosequencemining,andwehaverecentlywrittenawhitepaperdealingwiththeprocessingofspeech-to-textdataforwiretapmining.Miningisadataintensiveproblem,withlargedatabasesspanningGigabytes.Ourobjectiveistodeveloparchitecturesallowingforreal-timeanalysisofdatawindows,requiringsignificantlyhigherperformancethaniscurrentlyavailable.

AproblemofthestringmatchingapproachtoIntrusionDetectionhasalwaysbeentheadditionofnewpatterns;theyaredeterminedmanuallybyinspectingnewattacksforidentifying“fingerprints.”Ouroverallgoalistoleverageourhigh-performancekernelsinasystemthatcandetermine,basedonitscurrentdatabaseandthetemporallocalityofattackingpackets,thatanewattackiscontainedwithinanotherwiseinnocent-appearingpacket.

WefirstgiveanintroductiontocurrentFPGAtechnologiesandtheresourcesandperformanceavailablewiththem,andthengiveanoverviewofthetextmatchinganddataminingalgorithmsweutilize.InSection3,wedescribetheworkcompletedthusfar.InSection4,ourproposedworkisdiscussed,alongwithsomeoftheproblemswefaceinachievingourgoals.

2

2

2.1

Background

FPGA

FieldProgrammableGateArrays(FPGA)provideafabricuponwhichapplicationscanbebuilt.FPGAs,inparticular,SRAMbasedFPGAsfromXilinx[40]orAltera[4]arebasedonalook-uptables,flip-flops,andmultiplexers.Inthesedevices,aSRAMbankservesasaconfigurationmemorythatcontrolsallofthefunctionalityofthedevice,fromthelogicimplementedtothesignalingstandardsoftheIOpins.Thevaluesinthelook-uptablescanproduceanycombinationallogicfunctionalitynecessary,theflip-flopsprovideintegratedstateelements,andtheSRAM-controlledroutingdirectlogicvaluesintotheappropriatepathstoproducethedesiredarchitecture.Thedeviceiscomposedofmanythousandsofbasiclogiccellsthatincludethebasiclogicelements,andbasedonthedevicevariety,includesfastASICmultipliers,ethernetMACs,localRAMs,andclockmanagers.FPGAsstartedoutasprototypingdevices,allowingforconvenientdevelopmentofglue-logic-typeapplicationsforconnectingASICcomponentswithouthighVLSIdesigncostsorlargenumbersofdiscretestandardlogicgates.AsthegatedensityofFGPAdevicesincreasedandapplicationspecificASICblockswereadded,theapplicationsshiftedfromgluelogictoawidevarietyofsolutionsforsignalprocessingandnetworkproblems,andthedeviceshavebeendeployedinthefieldasthefinal,butstillflexible,solution.BecausethedeviceiscontrolledbythestateoftheSRAMbits,thefunctionalitycanbechangedbychangingthememorystate.Thiscanbeuseful,aslogiccanbecustomizedforaparticularsetofinputdata.

FPGAscanprovideanefficientsolutionformanyproblems.Reconfigurablelogichasbecomeaverypopularapproachfornetworkapplications.Cryptography[42],error-tolerantcodeprocess-ing[25],routerapplicationssuchaspacketscheduling[22]andflowmonitoring[27],andvariousstrategiesforintrusiondetectionallhaveFPGAimplementationsforacceleratingcomputationalintensiveportionsoftheircode.However,byteorientedoperationssuchasstringmatchingisoneparticularstrengthofFPGAfabrics.Intheearliestworkinregularexpressionmatching[30,31],amethodwaspresentedformatchingregularexpressionsusingaNon-deterministicFiniteAutomaton,implementedonanFPGA.

Webelievethatintelligentarchitecturaldesignisworthmorethanasimpleimplementation.Therearemanytoolsforautomaticallyconvertingahigh-levellanguageprogramintoalow-leveldesign.Withrecentadvancesincompilerandsynthesistechnology,itisnowpossibletomapthecomputationallyintensivemodulesofaprograminFortran,C(SRCCarte[37]andCeloxica[12]),orJava(XilinxForge[41])tohardware.Duetothedevelopmentofthesehigh-leveldesigntools[4,40],thescientificcomputingcommunitycanutilizereconfigurabledeviceswithoutsteeplearningcurves.

However,whiletheeaseandpopularityofusingFPGAsforapplicationdesignhasincreased,theautomaticallygeneratedarchitecturestendtobeinefficientcomparedtowell-researchedandthought-outarchitecturesforcomplexapplications.Inthesesituations,thecreativityanddomainknowledgerelevanttoadesignprovidedbyahumandesignerisavaluableasset.

2.2TextMatching

Textmatchingisafundamentalkernelofmanyapplications.Webreakthetextmatchingdisciplineintotwomainsections;unalignedandalignedmatching.Thesecategoriescorrespondroughlytoourcurrentandproposedworkinstringmatchingfornetworksecurityandtextmining,respectively.2.2.1

Unalignedmatching

Unalignedmatchingisachallengingproblembecauseitisimpossibletopredeterminewhereapatternwillstartinaninputstream.Thus,inthena¨ıveapproach,thetimecomplexityisO(nk)foraklengthpatterninannlengthinputstream.Thena¨ıveapproachassumesaworstcaseinwhichthe

3

Figure1:Devicecapabilitiesvs.time

inputpatternmatchesforthefirstk−1charactersandthenfailsinthelastcharacter;thesystemthenadvancesthebeginningcomparisonpointertothenextcharacterandproceedestomatchtothek−1patterncharacteragain.Anexamplesituationofthissortisaaaaaaabwheretheinputstreamsconsistsonlyofa.

Thisisvitaltoourarchitectureaswewillseelater,butitisimportanttonetworksecurityingeneralbecauseanattackermightattempttocausetheIDStodroppacketsbyfloodingitwithspeciallydesignedpackets.IfaIDSisoverwhelmedbytraffic,mostconfigurationswillallowsomepacketsthroughwithoutscreening,orsubjectthemonlytoacursoryexamination.Thisprovidesanopportunityforanattackertosubvertthesystem.Thistypeofattackisshowntobeuselessagainstourdesigninourpaper([8]).2.2.2

AlignedMatching

Alignedmatchingisadifferentsituationaltogether.Becauseeachwordisdelimitedbythespacebetweenwords,itispossibletotakeamuchmoreparallelapproachtotheproblem.Insteadoftestingonealignmentofthestreamineachcycle,itispossibletoincreasethroughputbyacceptingmultiplewordsfromthestreamaswellasallowingconcurrentevaluationofpatterns.Onestrategyistoaserialshift-inbufferwithaparallelreadout;thisallowsawholewordtobeshiftedinandthenreadoutandcomparedinasinglestep.Becauseweknowthatthematchisbasedonawholewordinsteadofasomearbitrarysectionoftext,wecanacceptcharactersandfasterandreduceunnecessarywork.

Thisisrelevantbecausetherearemanyalignedmatchingproblems.Whiletheunalignedproblemisrelevanttonetworksecuritymatching,alignedmatchinghasapplicationstotextmining,websearching,indexing,speech-to-textapplications,etc.Asweareinterestedindevelopingourworkinthetextminingfield,weareinterestedinexploringthealignedmatchingproblem.

2.3DataMining

Recentadvancesinstorageanddatasensinghasrevolutionizedourtechnologicalcapabilityforcollectingandstoringdata.Serverlogsforpopularwebsites,customertransactiondatafromnetworkrouters,creditcardpurchases,customerloyaltycards,etc.produceterabytesofdatainthespanofadaythatisusefulashistoricalrecordbutnotasusefulasitcouldbewereiteffectivelyprocessedforpatternsandtrends.Correlation-baseddataminingisthefieldofalgorithmstoprocessthisdataintomoreusefulforms,inparticular,connectionsbetweensetsofitems.Wehaveinvestigated

4

theApriorialgorithm[2],apopularstrategydesignedforprogressivelygroupingtogetherfrequentitemsetsinlargedatabasesgivenaparticularfrequencycutoff.

WhilethecomputationanddatacomplexityoftheApriorialgorithmisveryhigh,littleresearchhasbeendoneinefficientimplementationsforhardwareacceleration.Wefeelthatmuchofthedis-interestindoingthisworkliesinthechallengeofimplementingsetmembershipfunctionsefficiently,aswellasinthecomplexityofcontrol.

Thedevicechoiceisimportant,asthesystolicarrayisdependentonitsdatasourceandsink,andforthe“softwaredeceleration”[19],providedbyamicroprocessorforcontrol-intensiveoperationsthattakeupasmallfractionofthetotalrunningtimeofthealgorithm.

K-means

K-meansclusteringisadataminingstrategythatgroupstogetherelementsbasedonadistancemeasure.ThedistancecanbeanactualmeasureofEuclideandistanceorcanbemappedfromanymannerofotherdatatypes.Eachiteminasetisrandomlyassignedtoacluster,thecentersoftheclustersarecomputed,andthenelementsareaddedandremovedfromclusterstomoreefficientlymovethemclosertothecentersoftheclusters.In[16,39]thek-meansclusteringalgorithminimplementedasanexampleofaspecialreconfigurablefabricintheformofacellulararrayconnectedtoahostprocessor.

K-meansisrelatedtotheApriorialgorithmasbotharedependentonefficientsetadditionsandcomputationsperformedonallelementsofthosesets,butaddsthedistancecomputationandsignificantlychangeshowthesetsarebuiltup.Besidesdifferingintheoverallalgorithm,thestructureofthecomputationisalsosignificantlydifferent,asthesystemrequirestheuseofglobalmemory,inwhicheachunit’spersonalmemoryisaccessiblebythehostcontroller.Byavoidingglobalconnectionsthatviolatetheprinciplesofsystolicdesign,wecanincreaseoverallsystemclockfrequencyandeaseroutingproblems.

2.4NetworkSecurity

Methodscommonlyusedtoprotectagainstsecuritybreachesincludefirewallswithfilteringmecha-nismstoscreenoutobviouslydangerouspackets,andintrusiondetectionsystemswhichusemuchmoresophisticatedrulesandpatternmatchingtosensepotentialmaliciouspackets.Thesetechniquesrequiresignificantcomputationalresources,and,usinghighly-paralleladaptivesoftprocessors,pro-videopportunitiesfordramaticimprovementsinperformance.

Afirewall’sfunctionistofilterattheheaderlevel;ifaconnectionisattemptedtoadisallowedport,suchasFTP,theconnectionisrefused.Thiscatchesmanyobviousattacks,butinordertodetectmoresubtleattacks,anIntrusionDetectionSystem(IDS)isutilized.TheIDSdiffersfromafirewallinthatitgoesbeyondtheheader,actuallysearchingtheapplicationlayerstreamcontentsforvariouspatternsthatimplyanattackistakingplace,orthatsomedisallowedcontentisbeingtransferredacrossthenetwork.CurrentIDSpatterndatabasesreachintothethousandsofpatterns,providingforadifficultcomputationaltask.

BecausetheIDSmustinspectatthelinerateofitsdataconnection,IDSpatternmatchingdemandsexceptionallyhighperformance.Thisperformanceisdependentontheabilitytomatchagainstalargesetofpatterns,andthustheabilitytoautomaticallyoptimizeandsynthesizelargedesignsisvitaltoafunctionalnetworksecuritysolution.Muchworkhasbeendoneinthefieldofstringmatchingfornetworksecurity[14,17,18,26,35].However,thestudyoftheautomaticdesignofefficient,flexible,andpowerfulsystemarchitecturesisstillinitsinfancy.

AcompleteIntrusionDetectionSystems(IDS)basedontheSnortrules[34]requiresasystemoptimizedforthousandsofrules,manyofwhichrequirestringmatchingagainsttheentiredatasegmentofapacket.Highlevelsofperformancearenecessarytoprovidereal-timestringmatchingattrunklinespeeds.Snort,theopen-sourceIDS[34]hasthousandsofcontent-basedrules.Eachoftheserulesrequirethatapacketbesearchedinitsentiretyfortheoccurrenceofsome“fingerprint”string.Usingna¨ıvemethods,thisisunworkable.

5

3

3.1

3.1.1

CompletedResearch

StringMatching

KMP

IntrusionDetectionSystemsrelyonthepatternmatchingofthousandsoffingerprintstringstodetectpotentialmaliciouspackets.Thisfilteringrequiressignificantcomputationalresourcesandisdifficultusingna¨ıvemethods.

Ournovelapproachtoruntimeadaptabilityusesapipelined,two-comparator,bufferedimple-mentationoftheKnuth-Morris-Prattalgorithm(KMP)[21]toimplementhigh-performancepatternmatching,whileyieldingaunitdesignthatishighlyareaefficient.

OverviewofKMP

KMP,developedbyKnuth,Morris,andPratt[21]utilizesapre-computedtabletopreventredundantcomparisons,reducingtheworst-caserunningtimefromO(kn)toO(n+k).Thepre-computedtable,orπ-table1,containsthelengthofthelargestnumberofcharactersintheprefixofthepatternPthatmatchthesuffix:

π[q]=max{j:jP[1...j−1]=P[q−j+1...q−1]}

(1)

Thatis,theπ-tabletellshowmuchofthebeginningofthepatternhasalreadybeenmatchedatanypositioninthepattern.Ofcourse,thisonlyaffectsthesystemwhenthepatternhasrepeatedstrings;apatternsuchas“abcdefg”wouldhavenousefulinformationintheπ-table,while“abaab”wouldhaveusefulinformationbecausethebeginningofthepatternshowsuplaterinthepattern.Aftertheπ[q]definitionisapplied,thereissomepost-processingontheπ-table,discussedlaterinthissection.

Theπ-tablebelowisfortheworst-casepattern,aFibonaccistring[21].π[1]=0,meaningthereisnopossiblematchandtheinputpointershouldbeadvanced.π[4]=2,meaningthenextcharactercomparisonisagainstP[2]=‘b’.Ifthisfails,thenextcomparisonisagainst‘a’.Thesecondcomparisonof‘a’isunavoidablebecauseKMPishistoryless,meaningthatthesystemcannotrememberwhatithascomparedearlier.Animportantpointisthathard-codedFiniteAutomata-basedimplementationscanworkmoreefficientlythanamemory-basedapproachbecausetheFAcancompareallofthepossibletransitionsinparallel,whiletheRAM-basedapproachmustsequentiallycomparetheincomingcharacteragainst(potentially)severalpatterncharacters.

qP[q]π[q]1a02b13a04a25b16a07b48a09a2OneimportantnoteisthatwhenP(i)=P(π(i)),theoptimizationπ(i)<=π(π(i))canbemade.ThisviolatesEquation1butprovidesmuchhigherperformancebyremovingcomparisonsthatcanneverbetrue.Thisoptimizationwasintroducedintheoriginalpaper[21]butwasomittedin[38].

OurapproachtointrusiondetectionusesamodifiedversionoftheKMPalgorithmandmatchingarchitectureoptimizedforrunningonalineararray.TheKnuth-Morris-Prattalgorithm(KMP)[21]isasophisticatedapproachtostringmatching,providingO(n+k)intheworstcase.Theexactrunningworstcaserunningtimeis2n−k.

OurContributions

Ourarchitectureenableshigh-throughput,easilyconfigurableintrusiondetectionforavarietyofhardwareplatforms(FPGAorASIC).Unlikeotherstate-of-the-artarchitectures,ourapproach

usetheπ-tableasequivalenttotheusageofthenextfunctionin[21];[38]alwayscomparesagainstπ+1

whereastheoriginalKMPcomparesagainstπ

1We

6

doesnotusehard-wiredcircuitrytoimplementpatternmatchingonanFPGAorotherwise,butusesconfigurablememoriesstoringpatternsandprecompiledjumptablestoprovideexceptionalflexibility.

Byminimizingthenumberofcomparatorsrequiredtomatchanewinputcharactereachcycle,weuselesshardwareresourcesthanotherapproaches.Afterfindingthemaximumbuffersizerequirementsthroughcarefulanalysisofthealgorithm,wecanproduceapatternmatchingunitthatusesfewFPGAresources,allowingmoreunitstobeintegratedontoasinglechip.Byallowingonlyone-waycommunicationbetweenneighboringunits,thearchitectureissuited,bydesign,foruseinalineararrayandingridsofunits.

Asignificantcontributionofthispaperisademonstrationofthemaximumsizebufferrequiredtoimplementastringmatchercapableofacceptingacharacterfromtheinputineachcyclewithoutresortingtokparallelcomparators,wherekisthepatternsize,orannelementbuffer,wherenistheinputstreamsize.ThearchitecturethatenablesthisisabufferedstringmatchingsystemimplementingaKMP-likepre-computationalgorithmutilizingtwocomparators.Thisallowsa√

matchingunittoacceptonecharactereachcycleintoabufferofsizelogφk,whereφ=1+25.Bybufferingtheinputweallowalineararrayofpatternmatchingunitstobedaisy-chainedtogether.Theunit-levelbufferingisakeyidea,becausewithoutbufferingwheneveranindividualunitstallstorecoverfromafailingcomparison,theentiresystemmuststallaswell.Thisisunacceptable.ThelineararrayweenablereducesthefanoutandtheinterconnectdistancebyallowingunitstoberegularlyarrangedonanFPGAwithoutlonginterconnects.Thedetailsofourarchitecturearecoveredin[8].

Whilethenetworkintrusiondetectionproblemiswelldefined,theperformancemeasuresarenot.Therearemanyapproachestotheintrusiondetectionproblem,andeachhasitsownstrengthsandweaknesses.Weassumethatthemostimportantmeasuresarethroughput,areaefficiency,andtotalworkperformed.Throughput,becausethefilteringneedstokeeppacewiththelinerate.Areaefficienyisimportantbecausethereareafewthousandpatternsthatneedtobematched.Weproposethatreconfigurabilityisthethirdvitalmetric:thepatterndatabasedoeschangefromtimetotime,andthedowntimeandprocessingtimeforreconfigurationshouldbelow.Hardwiredapproaches[5,6,7,18,36]requireplaceandrouteoftheentiredevice,andthenadelayforbitstreamreconfiguration.Theidealapproachoffersonlinereconfiguration,withlittleornodelaytothedatastream.

Wehavedefinedperformanceintermsofoverallworkperformed,andpatterncharactersperlogiccells.Shift-and-compareapproaches,asdiscussedinSection3.1.2below,requirethattheentirepatternbecomparedagainsttheinputstringineverycycle.Thisequalstheworst-casesituationinthena¨ıveapproach,nkcomparisons.ThroughamortizedanalysisweshowthattheKMParchitecturerequiresonly2nbytecomparisons,amuchmoreefficientstrategythantheshift-and-compareapproach.Intermsofmeasuredbypatterncharactersperlogiccell,theKMPoutperformsallbutthehardwiredapproaches;inordertoincreasethelengthofthepatternstring,allthatisrequiredisenoughmemorytostorethepatternandRAMsufficienttostore1/2lengthofthepatternforthebuffer.Becausethesizeofthecontrollogicandthenumberofcomparatorsdoesnotincreasewiththesizeofthepattern(exceptingcountersizes,increasingaslg(n)ofthepatternsize),theKMPapproachisveryefficient.3.1.2

Shift-and-Compare

In[5,6,7],wedevelopedanautomaticsynthesismethodologyandtoolsetforhighlyareaandtime-efficientintrusiondetectionsystemsusingahigh-level,graph-basedpartitioningmethodology.Automatedsystemleveldesignallowsmoreefficientcommunicationandextensivereuseofhardwarecomponentsfordramaticincreasesinarea-timeperformance.Throughpre-processing,thistool-basedmethodologyyieldsdesignswithcompetitiveclockfrequenciesthataremoreareaefficientthananyothershift-and-comparearchitectures.

7

Previousapproachestostringmatching,excepting[15]havecenteredaroundabyte-levelviewofcharacters.Highperformancedesignsevenincreasedthebasecomparisonsizeto32bits,providinghighthroughputbyprocessingfourcharacterspercycle.Increasingthenumberofbitsprocessedatasinglecomparatorunitincreasesthefan-intosinglegates.Ourapproachmovesintheoppositedirection,tosingle-bit,orunary,comparisons.Wedecodeanincomingcharacterintoa“one-hot”bitvector,inwhichacharactermapstosinglebit.Thisearlydecodingisreferredtoas“shareddecoding”in[15].

Unfortunately,withoutsomereductioninthecharacterset,unaryrepresentationsarealmostentirelyuselessduetothelevelofinefficiencycausedbythehugenumberofbitlinesrequiredforthe256characterASCIIset.However,ifthecharactersetcanbereduced,thenumberofbitlinescanbedramaticallyreduced.ThemosttrivialexampleofreducedsetsisDNAmatching,wheretheonlycharactersrelevantareA,T,C,G,representedasfourone-hotbits.Amoreinterestingexampleisstringmatchingfornetworksecurity,wherethousandsofpatternsneedtobematchedsimultaneouslyathighthroughputrates.

Becauseintrusiondetectionrequiresamixofcasesensitiveandinsensitivealphabeticcharacters,numbers,punctuation,andhexadecimal-specifiedbytes,thereisaninterestinglevelofcomplexity.However,eachstringonlycontainsafewdozencharacters,andthosecharacterstendtorepeatacrossstrings.Usingtechniquesfromgraphtheory,thepatternsarepartitionedn-wayssuchthatthenumberofrepeatedcharacterswithinapartitionismaximized,whilethenumberofcharactersrepeatedbetweenpartitionsisminimized,thesystemcanbecomposedofnpipelines,eachwithaminimumofbitlines.

Thefirstandobviousproblemisthenumberofcharactersthatmightneedtobematchedforanyinputpattern.Dataisencodedforareason,afterall,andhavingabitforeverycharacterisnotefficient.Theroutingofalargeamountofbitlinesisfairlyexpensive,butbecauseofthepipeliningbetweenunits,thereislittletimeperformancepenalty.Ifthedesignautomationtoolscanpartitionarulesetto30bits,orroughlythesameamountofdataroutingasin[35],andthenmaketheactualmatchingunitsatleast8xsmaller(32downto4bitsfor[14]),andremovethebuffersandcontrollogicofKMP-styledesignsuchas[18],wecanexpecttoseeperformance(area-time)increases.

InthewholeoftheSnortdatabase,thereareonlyabout100differentcharactersevermatchedagainst.Someofthosearecaseinsensitive,orcanbemadecaseinsensitivewithoutlossofgenerality,reducingthenumberofuniquecharacterstoroughly75.Hexadecimal-specifiednon-characterbytesareexpandedtoequivalentcharacterrepresentations.Thenextstepistopartitionthepatternsintoseveralgroupssuchthattheminimumnumberofcharactershavetobepipedthroughthecircuit;thatis,wegiveeachgroupofpatternsapipeline,andgothroughvariousheuristictrickstoattempttoreducethepipelineregisterwidth.

Thegraphcreationstrategyisasfollows.Westartwithacollectionofpatterns,representedasnodesofagraph.Eachpatterniscomposedofcharacters.Everynodewithagivenletterisconnectedbyanedgetoeveryothernodewiththatletter.

Thisproducesadenselyconnectedgraph,almost40,000edgesina361vertexgraph.Ourobjectiveistopartitionthebiggraphintotwoormoresmallergroupssuchthatthenumberofedgesbetweennodeswithinthegroupismaximized,andthenumberofedgesbetweennodesindifferentgroupsisminimized.Eachpipelinesuppliesdataforasinglegroup.Bymaximizingtheedgesinternaltoagroupandminimizingedgesoutsidethegroup(thatmustbeduplicatedbetweenpartitions),wereducethewidthofthepipelineregisters,butimprovetheusageofanygivencharacterwithinthepipeline.

Thisproblemisastandardgraphtheoryprobleminphysicaldesignautomationknownas“min-cut”problem.Asinglecircuitneedstobesplitintotwochips,butthetwosectionsareinterdepen-dent.Themincutsolutionminimizesthenumberofwiresconnectingthetwochips,thusdecreasingthenumberofpinsandreducingenergyconsumptionandclockperiod.Weuseaniterativeim-provementheuristicbyKernighanandLin[20](1970).

Akeycontributionofthispaperisthedramaticallyreducedcomplexityofthemincutoperation

8

Design

USC(nopipelining)USC(pipelined)USCUnaryUSCUnary(1byte)USCUnary(4byte)USCUnary(8byte)USCUnary(Prefilter)USCUnary(Tree)LosAlamos(FPL’03)[17]UCLA(FPL’02)[14]

UCLAw/Reuse(FCCM’04)[13]

U/Crete(FPL’03)[35]U/Crete(FCCM’04)[36]GATech(FCCM’04)[15]Throughput1.8Gb/s2.4Gb/s2.07Gb/s1.79Gb/s6.1Gb/s10.3Gb/s6.4Gb/s2.00Gb/s2.2Gb/s2.88Gb/s3.2Gb/s10.8Gb/s9.7Gb/s7.0Gb/sUnitSize911207.35.722.332.09.46.624316011.42695750Performance

19.8202833152713226823039.118.028040.1170140

Table1:Unitsizeistheaverageunitsizefora16characterpattern(inlogiccells;onesliceistwologiccells),andperformance(inMb/s/cell).Throughputisassumedtobeconstantovervariationsinpatternsize.

bypartitioningnotnetlists,butthesetofpatterns.Becausetheoperationisatamuchhigherlevel,atthelayerofthepatternsandcharactersimilaritiesratherthangatesandwires,thesystemnotonlyrequiresmuchlesspruningbythesynthesistool,butallowsformucheasierpre-synthesisperformanceestimation.

Partitioningreducesthenumberofcharacterswithinanysinglepartition,whichreducestheinterconnectdensitywithinthatpartition.Wehavedemonstratedanaverageimprovementintimeperformanceof20%foralldatabasesizes.Unfortunately,becauseasthenumberofpartitionsincreasesthenumberofreplicatedcharactersalsoincreases,multiplepartitionscancauseanincreaseinoverallareausage.However,thesystemasawholewilloperatemorequicklybecauseeachpipelineisindependent.

Wehavealsoexploredtheuseofatreestructureforreducestheamountofcharactersredundantlycomparedbetweenmultiplepatterns.Eachpattern(oflengthgreaterthaneightcharacters,asotherwisethewholepatternwouldfitinthetwoprefixes)iscomposedofafourbytefirst-levelprefix,afourbytesecond-levelprefix,andtheremainderofthepattern.Eachprefixismatchedindependentlyoftheremainderofthepattern.Thisiseffectiveinreducingtheareaofthedesignbecauselargesectionsoftherulesetsshareprefixes.Themostcommonprefixis/scripts,wherethefirstandsecond-levelprefixesareusedtogether.The4-characterprefixwasdeterminedtofiteasilyintotheVirtex-style4-bitlookuptable,butitturnsoutthatfour-charactergroupsarehighlyrelevanttointrusiondetectionaswell.Patternswithdirectorynamessuchas/cgi-binand/cgi-wincansharethesamefirst-levelprefix,andtheneachhaveafewdozenpatternsthatsharethe-binor-winsecond-levelprefix.Redundantprefixcomparisonscanbereducedby2to3xthroughtheuseofthetreearchitecture.

Overall,thehardwiredsystemoperatesatmuchareaefficienciesthattheKMPsystem,packingupto5patterncharactersinasingleslice.Thehardwiringoflogicallowsforveryhighpatterndensity,fortworeasons.First,thelookup-tablearchitectureallowsforarbitrarylogicspecification,allowingveryefficientcomparators,whereasamoreflexiblesystemwouldrequireXORgatesandex-trastorageforthepattern,reducingefficiencybymorethan2xevenwithouttheotheroptimizations.Second,becausethesystemiscomposedwithknowledgeofthepatternstobematched,inter-patternredundancycanbeexploited,whereastheKMPcanonlyexploitintra-patternredundancy.

9

3.2DataMining

Dataminingisarapidlyadvancingfieldofstrategiesforfindingconnectionsbetweenelementsinlargedatabases.TheApriorialgorithm[23]isapopularandfoundationalmemberofthecorrelation-baseddataminingkernelsusedtoday.However,itisacomputationallyexpensivealgorithmandrunningtimescanstretchtodaysforlargedatabases,asdatabasesizescanreachfromGigabytesandcomputationrequiresmultiplepasses.3.2.1

IntroductiontotheAprioriAlgorithm

WebreaktheApriori[2]algorithmintothreesections,asillustratedinFigure2.Initialfrequentitemsetsarefedintothesystem,andthecandidategeneration,candidatepruning,andcandidatesupportstepsareexecutedinturn.Thesupportinformationisfedbackintothecandidategeneratorandthecyclecontinuesuntilthefinalcandidatesetisdetermined.Wewillfirstintroducesomeofthedatamininglexiconandthendescribetheoperationalphasesinmoredetail.

Intheliterature,ananalogytoashoppingcartisused:thesetofitemspurchasedatonetime,checkedoutfromthelibrary,orotherwisegroupedtogetherbasedonsomecriterasuchastime,customer,etc.isreferredtoasabasket.Theitemswithinthebasketcanbetheentiretransaction,ortheremaybemultipletransactionswithinthebasket.Afrequentitemsetistheasetofoneormoreitemsthatoftenoccurinadatabaseoneitem,andoftenoccurstogetherinthesamebasketwithinthedatabaseifitconsistsofmorethanoneitem.Thecutoffofhowoftenasetmustoccurbeforeitisincludedinthecandidatesetisthesupport.

Inthisway,aresearchercanrequestaparticularsupportvalueandfindtheitemswhichoccurtogetherinabasketaminimumnumberoftimeswithinthedatabase,guaranteeingaminimumconfidenceintheresults.Apopularexampleintheliterature(possiblyapocryphal)isprocessingthesupermarkettransactionsofworkingmenwithyoungchildren:whentheygotothestoreafterworktopickupdiapers,theytendtopurchasebeeratthesametime.Thus,itmakessensestatistically,ifnotsociallyresponsibly,toputabeerrefrigeratorinthediaperaisle.

Figure2:Processflowofthedataminingsystem

Candidategenerationistheprocessinwhichonegenerationofcandidatesarebuiltintothenextgeneration.ThisbuildingprocessisfromwheretheApriorinamederives.Becauseeachnewcandidateisbuiltfromcandidatesthathavebeendeterminedapriori(inthepreviousgeneration)tohaveahighlevelofsupport,theycanbeconfidentlyexpandedintonewpotentialfrequentitemsets.Thisisexpressedformallyasfollows:∀f1,f2∈Fkdo

withf1=(i1,...,ik−1,ik)andf2=(i1,...,ik−1,i∗k)

andikf:=f1∪f2=(i1,...,ik−1,ik,i∗k)

Itshouldbenotedthatonlyorderedsetsareutilized.Thus,whenfisgeneratedfromf1andf2,thesetsremainordered.Candidategenerationpairsupanycandidatesthatdifferonlyintheirfinalelementtogeneratethenextcandidategeneration.

10

Thenextstepofcandidategenerationguaranteesthateachnewcandidateisnotonlyformedfromtwocandidatesfromthepreviousgeneration,butthateverysubsetofitisalsopresentinthepreviousgeneration,asfollows:∀i∈f:f−{i}∈Fk

Thus,ourinitialcandidategenerationprovesbydesignthatifweremoveeitherofthelasttwoitems(ik,i∗k)fromthenewcandidate,wewillgetcandidatesfromthepreviousgeneration,namely,f1andf2.Thesecondstepprovesthatifweremoveanyoftheotheritemsfromthenewcandidate,wemustfindacandidatefromthepreviousgeneration.Thisprogressivebuild-upofcandidatesistheheartoftheApriorialgorithm.

Thethirdphaseofthealgorithmisthesupportcalculation.Itisbyfarthemosttimeconsuminganddataintensivepartoftheapplication,asitisduringthisphasethedatabaseisstreamedintothesystem.Eachpotentialcandidate’ssupport,ornumberofoccurrencesoverthedatabaseset,isdeterminedbycomparingeachcandidatewitheachtransactioninthedatabase.Ifthesetofitemsthatmakeupthecandidateappearinthetransaction,thesupportcountforthatcandidateisincremented,asfollows:∀t∈Tdo

∀c∈Cdo

ifc∈t

support(c)++

ThemainproblemwiththeApriorialgorithmisthisdatacomplexity.Eachcandidatemustbecomparedagainsteverytransactiondata,andcandidategenerationmustseetheentiredatabasetransactionset.Thisgivesalargerunningtimeforasinglegeneration,O(|T||C||t|),assumingthesubsetfunctioncanbeimplementedinconstanttime|t|.However,theparallelismcontainedintheloopsallowsforsomeinterestingaccelerationinhardware,particularlywhenimplementedasasystolicarray.3.2.2

OurContributions

Throughtheuseofanewtypeofsystolicarrayarchitecture,thetimerequiredforprocessingcanbedramaticallyreduced.OurarrayarchitectureimplementationonaXilinxVirtex-IIPro100providesa4xperformanceimprovementoverthefastestsoftwareimplementationavailablerunningonadual2.8GHzXeonsystemwith3GBRAM.

Duetoourstreamingimplementation,thecandidategenerationphaseofthealgorithmrequireordersofmagnitudelesstimethanthesupportcalculation,andoverallrequiresroughly25%ofthetimerequiredbythefastestnon-supercomputerimplementation[11].Theoff-chipmemoryrequiredisnegligiblebeyondthesizeofthedatabaseandthebandwidthbetweenmemoryandthesystolicarrayisonly250MB/s.

WhilethearchitecturecouldeasilybeimplementedinacustomASIC–infact,thesimpleunitsthatmakeupthesystolicarrayaredesignedexplicitlyforeaseofASICimplementation–theuseofFPGAallowstheusertoutilizeparameterizeddesignswhichallowforvariablesizeitemdescriptorsaswellasoptimizedmemorysizesforaparticularproblem.Aswell,FPGAsallowthedesigntobescaledupwardeasilyasprocesstechnologyallowsforever-largergatecounts.

Weaddresstheseissuesthroughhybridsystolicarray-microcontrolleddatapathsandefficientdesignprinciples.ThispaperpresentsseveralstrategieswehavedevelopedforadaptingtheApriorialgorithmtouseinasystolicarray[23].Wealsopresentastrategycalled“systolicinjection,”asignificanttechnicalcontributionstothegeneraluseofsystolicarraysthatallowsawiderangeofpreviouslychallengingapplicationstobeimplementedefficientlythroughanefficientstalling

11

procedure(Figure3).Themethodalsoforreportingunpredictablygeneratedmid-arrayresultstoacontrollerwithoutchanceofcollisionorexcessivestalling.

Figure3illustratesthedatamovementthroughthepipelineandthedatastoragebehaviorwithinaunit.

Figure3:SystolicinjectionoccursinStage3(cycle1),andStage2(cycle2)butdoesnotcauseunrecoverablecollisions

Throughtheuseofthesystolicarrayweallowforincreasedfrequencyperformance,decreasedinterconnect,andsimple,easilyscalablepackagedunits.Weimplementalldataandcomputationintensiveoperationswithinthesystolicarrayandimplementallserialandcontrolintensiveopera-tionswithinamicroprocessor,namelythePowerPC405integratedwiththeVirtex-IIProdevice.Theoff-chipbandwidthrequiredis250MB/s,allowingforpracticalimplementationwithoff-the-shelfmemorycontrollers.Theoff-chipmemoryrequirements,beyondthedatabase,arenegligible.

Becausethesystolicarrayimplementationdoesnotrequirereconfiguration,modifyingthesetsismuchfaster,asintheKMPstringmatchingsystolicarray,thereconfigurationtimeismk,thatis,theproductofthenumberofsetstobemodifiedandthesizeoftheset.Thisisveryfast,butincreasesthesizeofthesubsetunitoverwhatwouldberequiredinahardwiredimplemenation.Becauseperformanceofthesystolicarrayislargelydeterminedbyproductofthenumberofunitsandtheelementsthatcanbeacceptedineachcycle,aninterestingoptimizationproblemisencountered:

12

becausethereislittleareaconsumptionbeyondthesubsetarray,increasingthenumberofelementsacceptedineachcyclethoughelement-levelparallelismleadstoanequaldecreaseinthenumberofunitsonadevice.Weaddresstheseproblemsanddefinenewapplicationsforthesealgorithmsinthenextsection.

4ProposedResearch

AllworkinSection4.1istowardthedevelopmentofgeneralarchitecturesuitablefortheuseintheapplicationsdescribedinSections4.2and4.3.LikethebasicIntrusionDetectionSystem,eachapplicationprovidesaparticularsetofproblemdefinitions,parameters,andperformancemetricstooptimize.

4.1Low-risk,MediumReward:AcceleratingDataMiningKernels

Whilewehavemadesomeinterestingadvancesinthefieldofdatamining,inparticulartheimple-mentation[9]ofa4xperformanceimprovementoftheApriorialgorithm[2],webelievethereisstillmuchroominthefieldforresearchcontributions.First,ourpreviouswork,whilecontributingsomeinterestingarchitecturalenhancementstothefieldofsystolicarrays,didnotachievetheperformanceimprovementsthatmakehardwareimplementationssignificantlyworthwhile.Thisismostlyduetothenatureofthedataminingprobleminthestandardapproach;itrequiresseveralpassesofalargedataset,wherethecomputationalneedsarenotdense,butdifficulttoeasilyparallelize.

Oneofthekeyweaknessesofthecurrentimplementationistheoneitem-per-clockbandwidthlimitation.Thatis,ineachcycle,onlyoneelementofthedatabasecanbeprocessedduetothemergesort-likesubsetoperationcurrentlyimplemented.Wearecurrentlyinvestigatingalternateimplementationsofthefunctionalitythatallowformultipleelementstobeacceptedinacyclethroughhashingofinputdataandsmallbuffers.Thesestrategiesdonotaddresstherealproblemoflowcomputedensity,though,andwearesearchingforstrategiestoincreasethedensityoftheproblem.Oneapproach,introducedin[3]istore-encodecommonitemsetsassingleitemstoallowthereductionofthetotalnumberofelements.Ifitispossibletoextendthistoahardware-levelre-encodingandhardwiring,similartotheapproachwedevelopedforstringmatchingin[5],largeperformancegainscouldberealized.

Thereareseveralproblemswithextractingalargeamountofparallelismandredundancyre-ductionthatmustbeaddressed.ThefirstisaninabilitytotakeadvantageofthereconfigurableaspectoftheFPGA.TheIntrustionDetectionproblemisanidealoneforFPGAbecausethepatternsetsareassumedtonotchangemorethanafewtimespermonth.Thisallowsthepatternstobehard-codedintothedevicearchtecture.Thisislessusefulfordatamining.AmainfunctionalityoftheApriorialgorithmisindeterminingifasmallsetofelementsareasubsetoflargersetsofelements(transactions)overthousandstomillionsofdatabaserecords.Weassumetheelementstohave16bitidentifiers;thispreventsusfromefficientlyusingreencodingtechniquesasusedinthestringmatchingproblem.Itisinefficienttogenerateandloadnewbitsreamstomatchforasingleinstanceofaset.TheVirtex-IIPro100systolicarrayimplementationrequiresroughly24hourstoplace-and-route,removingtheoptionofchangingconfigurationevery15secondsforthe100,000transactionbenchmarkingset.

Wearecurrentlyinvestigatingtwooptions:increasedthroughhash-basedtechniques,andmemory-basedre-encoding.Hash-based

Increasingelement-levelparallelismpasttheone-elementpercyclebarrierofthecurrentsystolicarrayarchitectureisnecessarytoincreaseoverallthroughputofthesystem.Becausethecontentsofatransactionareunpredictable,itisimpossibletohaveacounter-controlledmultiplexerinflexiblyassignelementstoseparatecomparatorsub-systems.However,withahash-basedsystem,onecandemultiplextheincomingsetelementsbasedontheelementID,allowingforapredictable

13

parallelization(predictableinthissensemeansthatweknowbeforehandwhichelementswillbehandledbyagivencomparatorsubsystem).Whiletheelement-wiseresponsibilityispredictable,theloadingisnotasnothingisguaranteedaboutthecontentsofagiventransaction;thatis,thehashingofelement-IDscan,intheworst-case,directanentiretransactionintoasinglecomparator.Ofcourse,arandomizedhashingalgorithmguaranteesthattheaveragecasewillbeperfectlydistributedacrossthefieldofcomparatorsub-systems.However,inpractice,buffersandsynchronizationwillberequiredtohandleworst-casesituations.Assumingthattheparallelizationsystemexistswithinthelargersystolicarrayarchitecture,ifabufferofsomearbitrarysizedoesbecomefull,thesystemasawholecanbestalledusingthepre-existingsystolicinjectionhardware,althoughthisreducesaveragethroughput.

Memory-based

Inthisarchitecture,theperformanceistiedtothedetailsofthedevice.WeconsidertheXilinxVirtex-IIPro100device,apopulardeviceusedforresearchsimulations.Thisdevicehas55018kbtwo-portblockRAMmemories.Inordertofittheentireitemsetintoasinglememory,weconstraintheelementIDsizeto14bits,using16kbofablock.TheRAMisconfiguredasa16,384elementby1bitarray.Eachcandidateitemsetisconfiguredastheelementsofamemory,wheretheaddressofanelementistheelementIDandthepresenceofanitemisrepresentedbyasetbit.Supportisdeterminedbycountingthenumberofbitsmatched.Afterprogramming,writeenableisdeassertedandthedatabaseisstreamedpastthememoryblocks.Now,theelementIDsfromthedatabaseareutilizedasaddresses;amatchisthentheoutputofthememory.Thesubsetoperationissimplyasummationoftheoutputbits;ifthesumisequaltothesizeofthecandidateitemset,thesubsetiffoundtobecomplete.

Thisisasimpleenoughtransformationfromthemergestylearchitecture,butprovidesthepotentialforseveralsubstantiveincreasesinperformance.First,thememoryhardwareisbuiltintothedevice,soitdoesnothaveanassociatedcostinlogiccells.Next,theblockRAMisatwo-portmemory,meaningthatwegeta2xincreaseinbandwidthforfree.Third,thesummationoperationoverln(m)bits,wheremisthesizeofthecandidateitemset,isafastoperation.BecausetheRAMprovidesitsoutputatthebeginningofacycle,pipeliningoftheitemcomparisonsisnecessarybutshouldnotpresentproblems.GiventhattheblockRAMhasaonecyclepipelinedelay,theremaybesomeNOPcyclesinsertedaftertransactionstoallowforsummationtocompletebutoveralltheclockfrequencyshouldbehigher,aswellasallowingtwoelementspercycle.Asmentionedabove,theinversetradeoffswhenweattempttoincreasebandwidthbyusinglargernumbersofportsresultinaproportionalreductioninthenumberofcandidateitemsetsabletoconcurrentlyexecute.

4.2Medium-riskEngineering:AlignedTextMining

Textminingisanewareathatbuildsonourexperienceintextmatchinganddatamining.Thefocusofthisworkisonprocessingtextualinformationovermultiplechannelsathighspeeds(10Gbit/sandup).Theobjectivesofthistextsurveyapplication[28]includeclusteringmessagesbysimilarity(withouttraining)accordingtogeographicinformation(eitheraccordingtocontentandtopicswithinthemessage,ortosendingorreceivingmetadata),selectionofmessagesbelongingtouserselectedclusters.Theselectionandclusteringalgorithmsshouldbeabletoadapttochangesintopicsofinterestandbeabletoquicklyrescanhistoricalrecordstofindmessagesthatmatchthecurrenttopicofinterest.

Oneoftheresearchdirectionsweareinterestedinpursuingisthedevelopmentofanarchitecturalsuiteforprocessingtextintoencodeditemsets,efficientfiltering,andcorrelationanalysis.ThefilteringstepofaneffectivetextprocessingsystemcangreatlybenefitfromthereconfigurableaspectofFPGAs.Overtime,commonwordsbecomemeaningless,andrelevantkeywordschangeastimepasses;moreover,anidealfiltermustbeabletodetectcodedcommunicationandnotrejectaconversationbecauseisnotovertlyrelatedtothetopicofinterest.Moreover,itmustabletodothisathighdatarates.Wewilldeterminethemosteffectivestrategiestoprovidethebenefitsof

14

hardwareaccelerationandguaranteetheadaptiveflexibilityrequiredforconstantlychangingtopicsofinterestandcustomerobjectives.

Utilizingourcurrentworkincorrelation-miningalgorithmsasabasis,wewilldevelopnewkeyword-baseddistanceformulationsandweightingmeasuresforautonomousclusteringstrategies,aswellasitemsetfilteringtoreducefalsecorrelations.Oneofthegoalsofthisproposedworkistoprovidereconfigurablehardwareimplementationsfastenoughtoenableonlinedatabaseprocessing,allowingfor“keyworddriftanalysis,”thatis,allowingpreviouslyunseentopicstoberecognizedandgivennewcorrelationcategories,aswellasremovingoldcategoriestobeprunedfromthedatabase.

Giventhedynamickeywordanalysisdevelopedfortextfiltering,wecanfurthercorrelatemes-sagesacrosstimetofindmessagesinthepastthatrelatetocurrentlyrelevantkeywordsandtopics.Thekeyelementofthisanalysisisfastsetanalysisandtheabilitytohandlehighinputdatarates.

Overall,thisresearchtrackisfeasibleandcanbecompletedinthenextyear.

4.3High-risk,HighReward:AutomaticAttackDetection

Runtimeadaptationtonetworkattacksanexcitingfieldthathasonlyrecentlybeenmadefeasiblethroughadvancesinarchitectureanddevicecapabilities.Discoveringnetworkattackswithoutbeingtoldwhattolookforisadifficultproblem,withadealofpotentialresearchdirections[29,33].Wewillleverageourexperienceinhigh-performancestringmatchingforintrusiondetectionsystems[5,8]anddataminingtocreatesystemsthatarecapableofautomaticallygeneratingnewpatternsandextendingtheirthreatcoveragewithouthumanintervention.

Weproposetocorrelationminethestringmatchingdata,alongwithvariousotherdatafromthestream,suchassourceanddestinationIP,portnumbers,protocoltype,andpotentiallyrouteinformationprovidedbytraceback.Aftersomeperiodoftraffic,thevariousdataisanalyzedviatheApriorialgorithmtoproduceaseriesofsetsthatoccurmorethanaspecifiednumberoftimeswithinthetrafficperiod.Theinterestingitemsarethosethatoccurnearandwithintheknownattacks–anattemptedattackcommonlyisnotasinglepacketonasinglecomputer,butaseriesofscanningattacksoveranentiresubnet[24].Bysearchingfortheseunknownpacketstemporaly-locatednearknownattacks,wecanautomaticallydeterminestreampatternsthatmayneedtobeconsideredasanewattack.Thegoalistodetermineanewattackandintegrateitintothefilteringsystembeforethescanfindacomputerthattheattackissusceptibleto.Theseproblemshavenotbeenfeasibletoaddressbeforetheadventoflarge,fastFPGAscapableofhandlingthecomputationallyexpensiveoperationsnecessaryforonlineoperation.

Inordertocomposeanautomatedresponsesystemasdescribed,high-levelsystemarchitecturedetailsaswellaslow-levelarchitecturaldetailswillbeaddressed.Whiletheworkisaddressedtowardnetworksecurityissues,thecombinationofstringmatchingandcorrelationminingisstronglyconnectedtobioinformaticsproblemsandHomelandSecurityinterestsaswell.

Thereareseveralproblemsthatmustbeaddressedbeforeweeffectivelyaddresstheproblem.First,goodsampledatamustbearranged.Whilethearchitecturaldesignoftherelevantkernelsisofmoreinterestthantheoverallsystemdesign,wewillrequiresampledatatoevaluatethefeasibilityofourapproach.Thereareseveralsourcesforthisdata,namely,theLincolnLabsdatasetsandtheDeternetwork[10].Becauseourapproachassumesatapatthechockpointofasubnet,weneeddatathatrepresentsthetrafficoverawiderangeofaddresses.TheDeternetworkmaybearelevantsourcefortestdatatoexplorethebehaviorofthedetectionsystem.Thesecondproblemtoaddressistheidentificationofstreamsthatdonothaveamatchingpattern.Inna¨ıvepatternmatching-basedintrusiondetection,thereisnointerestinastreamthatdoesnotmatchoneofthepatternsinthedatabase.Thosestreamsthatdohaveamatchareeasilycataloged,andtheyareeasilyusedasinputstoadataminingkernel.However,streamsthatdonotmatchaparticularpatternhavenoeasywayofbeingcollectedtogether.

WeillustratethisprobleminFigure4.Streams1and3arebotheasilyrecognized.Whiletheidentifyingsequence“Pattern1”isnotlocatedatthesameoffsetwithineachstream,becausethe

15

Figure4:OffsetstreamsthatmatchapatterninthedatabaseareeasilyassignedID’s.Streamsnotmatchingpatternsarechallengingtocharacterize.

patternmatchingsystemsearchesallpossibleoffsets,thesystemeasilydeterminesthatStream1andStream3are,infact,thesameattack.

Stream2andStream4areadifferentsituation.Becausetheydonotmatchanycurrentlyknownattack,theycannotbegivenanidentifyingnumber.However,theybothcontainasequenceofcharacters“PatternA”that,whileunknowntotheIDSatthepresenttime,cansuccessfullycompromiseaserverifdelivered.Becausetheyattackerisintelligent,heknowsthatitisbesttosendtherelevantattackingstringindifferentflavorssoastoavoiddetection.Differentoffsets,anddifferentpaddingcharacterscanmaketwoequivalentattacksappearverydifferent.

Thisisadifficultproblemtoaddress.Oneapproachistofindcommonsubstringswithintheunknownstreams.Unfortunately,astherearemanymoreinnocentstreamsthanattackingstreams,comparingthemallwithoutsomealgorithmicmagicwouldrequireimmenseamountsofcomputa-tionalresources.Assumingthattheattacksareplacedwithinthestreamidenticallyandthatwecancomparetheentirestreamatonce,thisrequiresO(m2)time.Assuming,morerealistically,thatweessentiallyhavetousetheSmith-Watermanalgorithm[32]atO(n2)perstream,wemovetoadif-ficultO(m2n2)time.Thisisinconvenientunlesswecanfindanotherapproachtotheidentificationproblem.

Oneapproachistosimplyreducen.Anomalydetectionmethodsusestatisticalmodelsofserverusagethatcouldbeuseful.Failingrequestsonmultipleportsisacommoncharacteristicofascript-basedattack,butconnectionsfromaninnocentuserareunlikelytotraversemorethanahandfulofcommonlyusedports.Thus,anypacketthatishandledwithouterrorbytheapplicationlayercanprobablybeignored.Wecandetermineanerrorconditionbylookingatapplicationlayerreturns;packetsthatgotoaclosedportorreturnastandarderrorinanactiveportcanbereasonablyassumedtobesuspiciousandworthinvestigatingfurther.Thesepacketstreamsareloggedformining.

Thisresearchtrackcouldbeconsideredapartialfulfillmentoftheholygrailofnetworksecurity.Whileitcannotprovidethelevelofsecurityavailableanall-knowing,completelycontrolledandbehaviorallyunderstoodsystem,ifsuccessfulitwouldbeasubstantivecontributiontothedetectionofnewattacks.Acomplete,testablesystemaccordingtothisdesignwouldbetheworkofmanyPhDstudents,butmodelingbasedonkernelsimulationsisfeasibleinthetimeframeofSection5.

5ResearchPlan

Theresearchweplanonexploringdevelopsfromthemesthatwealreadyhaveexperiencein.Thekeyresearchcontributionscomefromperformanceoptimizedforaparticularapplicationdomain.

16

Webrieflyoutlineanapproximatetimelineforthenextyearbelow:

•Spring2005

Developandimplementenhanceddataminingalgorithms,increasingthroughputbyatleastdoublingdatapathwidthandreducingclockperiod.ImplementinterfacecircuitryforPowerPC405hardprocessorormicroBlazesoftprocessorforVirtex-IIProboard.•Summer2005

Extenddataminingalgorithmsthroughfront-endpreprocessingtoworkwithstringmatchingarchitectures.UnderstandandaddressissuesofhardwareinterfacingwithmemorysystemofML-310board.Completesmall-scalehardwareverificationofdataminingarchitecturewithsoftwarecontrolleddataandcontrolflow.AcquireIPcoresforsubsequencematchingforstreamcomparisons.•Fall2005

Completeintegrationofdataminingandstringmatchingarchitecturesforautonomouspatternidentification.

Wehopetosubmitpaperstothefollowingconferencesoverthenextyear:

•IPDPS’05EfficientDataminingusingtheAprioriAlgorithmonFPGAs(SubmittedOct2004)

•FCCM’05Hash-AcceleratedDataminingusingtheAprioriAlgorithm(DueJan142005)•FPL’05DataminingforTextAnalysis(DueMarch2005)

•ConferenceonCyberSecurityAutomatedSignatureDiscoveryUsingPacketMining(Nov2005)

6OverallThesisGoals

Ourworkdiffersfromotherresearchintheareainthatwepursueanintelligentapproachtoarchitec-turaldesignthatgoesbeyondimplementationtoanunderstandingofwhatconstrainsperformance.Whetherthismeansreducinginterconnectburdenthroughgraph-basedpartitioning,developinganunderstandingofmaximumbuffersizes,orextendingthefunctionalityofsystolicarrayarchitecture,thesecontributionsaremorethanimplementationsofaparticularalgorithm.Theyarecontributionstothewiderarchitecturalfield,astheunderlyingdesigndecisionsandperformanceenhancementsareapplicabletoapplicationsoutsideourparticulardomain.

Thecontributionsofmyproposedthesisareasfollows:

•Ourminimumbuffersizeproofallowsstringmatchinginsystolicarrayswithoutstallingforef-ficientalgorithmssuchasKMP.Thetwocomparatorapproachdoessignificantlylessworkthatana¨ıveapproachandallowson-the-flyreconfiguration.Thislow-area,runtime-reconfigurable,exact-matcharraycanbeutilizedinmanyapplicationstoprovidefast,efficientstringmatchingforalargevolumeofpatterns.

•ThesystemlevelviewoftheIntrusionDetectionProblemallowsforexploitationofpatternredundancy.Throughpreprocessing,treestructuresandencodingsaredesignedsuchthatbothfrequencyandperformanceishigher.Thisapproachisvalidforanysetofpatternsthatareknownatdesigntimeanddonotoftenrequiremodifications.

17

•Ouruseofpartitioning,alsodependentondesign-timepatternknowledge,allowsthousandsofpatternstoexistinthesamesystemwithoutinterferingwitheachother.Thatis,the“pre-decoding”strategycreatesalargenumberofconnectionstoberouted;ifallofthepatternsareforcedtoexistinthesamesectionofthedevice,amajorityofthelinesnearaparticularcomparatorwillnotberequiredbythatcomparator.Thiscausesareductionintimeperformanceasfanoutandinterconnectdelayincreases,aparticularproblemininterconnect-limitedreconfigurabledevices.Bypartitioningtoseparatesectionsofthedevice,frequencyperformancecanbeincreased.Thisstrategyisapplicabletoanyproblemthatcanbeefficientlymin-cutforFPGAimplementation.

•Our“SystolicInjection”methodologyallowssystolicarraystobeusedforlesspredictableproblems.Whileveryregularproblemssuchasmatrixmultiplicationareeasilyimplementedinasystolicarrayarchitecture,problemswithunpredictablegenerationofresultscancauseanotherwiseregulararraytofallintochaos.Ourinjectionallowsefficientstallingwithlittlereductioninperformance.

18

References

[1]Bloom,B.,Space/timeTrade-offsinHashCodingwithAllowableErrors.Communicationsof

theACM13(1970)422–426.[2]R.Agrawal,T.Imielinski,andA.Swami.MiningAssociationRulesbetweenSetsofItemsin

LargeDatabases.InProceedingsoftheACMSIGMODConference,1993.[3]R.AgrawalandR.Srikant.MiningSequentialPatterns.InProceedingsofthe11thInternational

ConferenceonDataEngineering,1995.[4]Altera,Inc.http://www.altera.com.

[5]Z.K.BakerandV.K.Prasanna.AMethodologyfortheSynthesisofEfficientIntrusion

DetectionSystemsonFPGAs.InProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[6]Z.K.BakerandV.K.Prasanna.AutomatedIncrementalDesignofFlexibleIntrusionDe-tectionSystemsonFPGAs.InTheEighthAnnualWorkshoponHighPerformanceEmbeddedComputing(HPEC’04),2004.[7]Z.K.BakerandV.K.Prasanna.AutomaticSynthesisofEfficientIntrusionDetectionSystems

onFPGAs.InProceedingsofthe14thAnnualInternationalConverenceonField-ProgrammableLogicandApplications(FPL’04),2004.[8]Z.K.BakerandV.K.Prasanna.TimeandAreaEfficientPatternMatchingonFPGAs.

InTheTwelfthAnnualACMInternationalSymposiumonField-ProgrammableGateArrays(FPGA’04),2004.[9]Z.K.BakerandV.K.Prasanna.EfficientParallelDataMiningwiththeAprioriAlgorithmon

FPGAs.InSubmittedtotheIEEEInternationalParallelandDistributedProcessingSymposium(IPDPS’05),2005.[10]T.Benzel,B.Braden,C.Neuman,andD.Kim.CyberDefenseTechnologyExperimental

Research(DETER)Network.http://www.isi.edu/deter/,2003.[11]F.Bodon.AFastAprioriImplementation.InProceedingsoftheIEEEICDMWorkshopon

FrequentItemsetMiningImplementations,2003.[12]CeloxicaMATLAB/SimulinkInterface.

asp.

http://www.celoxica.com/methodology/matlab.

[13]Y.ChoandWilliamH.Mangione-Smith.DeepPacketFilterwithDedicatedLogicandRead

OnlyMemories.InProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[14]Y.H.Cho,S.Navab,andW.H.Mangione-Smith.SpecializedHardwareforDeepNetwork

PacketFiltering.InProceedingstheTenthACM/SIGDAInternationalConferenceonField-ProgrammableLogicandApplications(FPL’02),2002.[15]C.R.ClarkandD.E.Schimmel.ScalableParallelPatternMatchingonHighSpeedNetworks.In

ProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[16]M.Estlick,M.Leeser,J.Szymanski,andJ.Theiler.AlgorithmicTransformationsintheIm-plementationofK-meansClusteringonReconfigurableHardware.InProceedingsoftheNinthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2001(FCCM’01),2001.

19

[17]M.Gokhake,D.Dubois,A.Dubois,M.Boorman,S.Poole,andV.Hogsett.Granidt:Towards

GigabitRateNetworkIntrusionDetection.InProceedingstheEleventhAnnualACM/SIGDAInternationalConferenceonField-ProgrammableLogicandApplications(FPL’03),2002.[18]B.L.Hutchings,R.Franklin,andD.Carver.AssistingNetworkIntrusionDetectionwith

ReconfigurableHardware.InProceedingsoftheTenthAnnualField-ProgrammableCustomComputingMachines(FCCM’02),2002.[19]P.James-Roxby,G.Brebner,andD.Bemmann.Time-CriticalSoftwareDecelerationinan

FCCM.InProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[20]B.KernighanandS.Lin.AnEfficientHeuristicProcedureforPartitioningGraphs,1970.Bell

SystemTech.[21]D.E.Knuth,J.Morris,andV.R.Pratt.FastPatternMatchinginStrings.InSIAMJournal

onComputing,1977.[22]R.Krishnamurthy,S.Yalamanchili,K.Schwan,andR.West.ShareStreams:AScalableAr-chitectureandHardwareSupportforHigh-SpeedQoSPacketSchedulers.InProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[23]H.T.KungandC.E.Leiserson.Systolicarrays(forVLSI).InSparseMatrixProceedings,1979.[24]S.Lau.TheSpinningCubeofPotentialDoom.http://www.nersc.gov/nusers/security/

TheSpinningCube.php.[25]D.-U.Lee,W.Luk,C.Wang,C.Jones,M.Smith,andJ.Villasenor.AFlexibleHardware

EncoderforLow-DensityParity-CheckCodes.InProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[26]J.Moscola,J.Lockwood,R.P.Loui,andM.Pachos.ImplementationofaContent-Scanning

ModuleforanInternetFirewall.InProceedingsoftheEleventhAnnualIEEESymposiumonField-ProgrammableCustomComputingMachines(FCCM’03),2003.[27]D.Nguyen,J.Zambreno,andG.Memik.FlowMonitoringinHigh-SpeedNetworkswith

2DHashTables.InProceedingsofthe14thAnnualInternationalConverenceonField-ProgrammableLogicandApplications(FPL’04),2004.[28]Dept.oftheAirForce.CFWP:RapidProcessingofIntelligenceData,

http://http://www2.eps.gov/spg/USAF/AFMC/AFRLRRS/Reference-Number-BAA-0%4-09-IFKA/SynopsisP.html.

2004.

[29]M.QinandK.Hwang.FrequentEpisodeRulesforInternetAnomalyDetection.Proceedings

oftheIEEEInternationalSymposiumonNetworkComputingandApplications,2004.[30]R.Sidhu,A.Mei,andV.K.Prasanna.StringMatchingonMulticontextFPGAsusingSelf-Reconfiguration.InProceedingsoftheSeventhAnnualACM/SIGDAInternationalSymposiumonFieldProgrammableGateArrays(FPGA’99),1999.[31]R.SidhuandV.K.Prasanna.FastRegularExpressionMatchingusingFPGAs.InProceedings

oftheNinthAnnualIEEESymposiumonField-ProgrammableCustomComputingMachines(FCCM’01),2001.[32]T.F.SmithandM.S.Waterman.IdentificationofCommonMolecularSubsequences.Journal

ofMolecularBiology,147:195–197,1981.

20

[33]B.So,M.W.Hall,andP.C.Diniz.UsingDataMiningtoDiscoverSignaturesinNetwork-based

IntrusionDetection.InProceedingsoftheFirstInternationalConferenceonMachineLearningandCybernetics,2002.[34]Sourcefire.Snort:TheOpenSourceNetworkIntrusionDetectionSystem.http://www.snort.

org,2003.[35]I.SourdisandD.Pnevmatikatos.Fast,Large-ScaleStringMatchfora10GbpsFPGA-Based

NetworkIntrusionDetectionSystem.InProceedingstheEleventhAnnualACM/SIGDAInter-nationalConferenceonField-ProgrammableLogicandApplications(FPL’03),2003.[36]I.SourdisandD.Pnevmatikatos.AMethodologyfortheSynthesisofEfficientIntrusionDe-tectionSystemsonFPGAs.InProceedingsoftheTwelfthAnnualIEEESymposiumonFieldProgrammableCustomComputingMachines2004(FCCM’04),2004.[37]SRCComputers,Inc.http://www.srccomputers.com.

[38]T.H.Cormen,C.E.Leiserson,andR.L.Rivest.IntroductiontoAlgorithms.MITPress,McGraw-Hill,1990.[39]C.Wolinski,M.Gokhale,andK.McCabe.AReconfigurableComputingFabric.InProceedings

ofthe,2004.[40]Xilinx,Inc.http://www.xilinx.com.

[41]Xilinx,Inc.http://www.xilinx.com/ise/advanced/forge_get.htm.

[42]J.Zambreno,D.Nguyen,andA.Choudhary.ExploringArea/DelayTradeoffsinanAES

FPGAImplementation.InProceedingsofthe14thAnnualInternationalConverenceonField-ProgrammableLogicandApplications(FPL’04),2004.

21

因篇幅问题不能全部显示,请点此查看更多更全内容

Copyright © 2019- igat.cn 版权所有

违法及侵权请联系:TEL:199 1889 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务