Robert Tolksdorf¹, Elena Paslaru Bontas²
¹ research@robert-tolksdorf.de, http://www.robert-tolksdorf.de
² paslaru@inf.fu-berlin.de
Freie Universität Berlin, Institut für Informatik
AG Netzbasierte Informationssysteme, Takustr. 9, D-14195 Berlin, Germany
Abstract. Digital pathology or telepathology intends to extend the usage of electronic images for diagnostic, support or educational purposes in anatomical or clinical pathology. Available approaches have not found wide acceptance in routine pathology, mainly due to the limitations of image retrieval. In this paper we propose a semantic retrieval system for the pathology domain. The system brings both text and image information together and offers advanced content-based retrieval services for diagnosis, differential diagnosis and teaching tasks. The core of the system is a Semantic Web gathering both ontological domain knowledge and rules describing key tasks and processes in pathology.
1 Introduction
Digital pathology or telepathology intends to extend the usage of electronic images for diagnostic, support or educational purposes in anatomical or clinical pathology. The advantages of these approaches are generally accepted and several applications are already available. Nevertheless, none of the available products has found wide acceptance for diagnostic tasks, mainly due to the huge amount of data resulting from the digitization process and the limitations of image-based retrieval. In this paper we propose a semantic retrieval system for the pathology domain. The system brings both text and image information together and offers advanced content-based retrieval services for diagnosis, differential diagnosis and teaching tasks. The core of the system is a Semantic Web gathering both ontological domain knowledge and rules describing key tasks and processes in pathology. The usage of Semantic Web standards and domain ontologies facilitates the realization of a distributed infrastructure for knowledge sharing and exchange. The rest of this paper is organised as follows: The remaining introductory sections present the setting of the project, telepathology, and its main ideas and features. Section 3 provides an insight into the technical aspects of the retrieval system, by enumerating the technical requirements and the associated system architecture, followed by a detailed description of the system components. At this point we will present our achievements and the challenges we are currently confronted with in the realization of the main components. Section 4 delimits our approach from related research efforts in this domain, while Section 5 is dedicated to future work.
1.1 Telepathology
Telepathology is a key domain in telemedicine. By using telepathology approaches like virtual microscopy, pathologists analyze high-quality digital images on a display screen instead of conventional glass slides at the common light microscope. In a typical digital pathology system, a camera is attached to a microscope and still images are taken. Images (with or without textual annotations) are stored in a database or directly in a patient record. Cases and images can be retrieved from the database or patient record as needed.
Healthcare information systems, which store and integrate information and coordinate actions among healthcare professionals, have been realized at various places in the last decades. New developments in telemedicine allow medical personnel to remotely deliver healthcare to the patient. At the Charité Institute of Pathology in Berlin, the first web-based virtual microscope allows histological information to be evaluated, transferred, and stored in digital format [17,14]. This technique offers essential advantages compared to the classical approach, by supporting communication and exchange among professionals not sharing the same workplace location and improving quality assurance mechanisms [15]. However, to realize a complete computer-based infrastructure for pathology, one needs more than advanced support in the management of digital images. A more efficient integration of the medical findings is also necessary; these are produced by pathologists to describe their observations from analyzing the slides at the light or digital microscope.
Common information systems in pathology restrict their retrieval capabilities to automatic picture analysis and ignore the corresponding medical findings. Such analysis algorithms have the essential drawback that they operate exclusively on structural, or syntactical, parameters such as color, texture and basic geometrical forms while ignoring the real content and the actual meaning of the pictures. Medical findings, however, contain much more than that since they are textual representations of the pictorially represented content of the slides. They thereby capture the actual semantics of what the picture graphically represents, for example “a tumor” in contrast to “a red blob” or “a colocated set of red pixels”. Therefore, including medical findings in the information retrieval system goes beyond purely syntactic picture retrieval.
In the project described in this paper, we take the semantic aspects a step further: We understand the findings report as high-quality semantic metadata for the image, prepared by an expert. We intend to make the semantic content explicit and build a system that takes advantage of the explicitly represented knowledge.
2 A Semantic Web for Pathology
The project “Semantic Web for Pathology” aims to realize a Semantic Web-based text and picture retrieval system for the pathology domain. For this purpose we concentrate our efforts in three interrelated directions: 1) the construction of a knowledge base, 2) the development of knowledge reuse algorithms and 3) the development of a semantic annotation schema for medical findings and digital histological images. The knowledge base contains domain ontologies, generic ontologies and rules. Domain ontologies are used for the machine-processable representation of specific pathology knowledge, while generic ontologies capture common-sense knowledge that can be useful in knowledge-intensive tasks. Several very complex libraries of ontologies are already available for this purpose. Rules are intended to formalize the key tasks in everyday pathology. While ontologies model the background knowledge of the pathologists, the rules are used to describe the decision processes using this knowledge: diagnostics, microscope analysis, observations etc. The acquisition of such rules, which play a crucial role for the retrieval, will be accomplished in intensive collaboration with domain experts.
Further on, we analyze the textual data with text processing algorithms and annotate it with concepts from the knowledge base in order to improve precision and recall in retrieval operations. The annotation scheme is harmonized with the pathology knowledge base by using the corresponding medical ontologies as controlled vocabulary for the annotations. Text analysis is also used to extract implicit factual knowledge, which is subsequently integrated in the knowledge base.
2.1 Main features
We foresee several valuable uses of the planned system in routine pathology. First, it may be used as an assistant tool for diagnosis tasks. Since knowledge is made explicit, it supports new query capabilities for diagnosis tasks: similarity or identity of cases based on semantic rules and medical ontologies, differential diagnosis, semantically precise statistical information about occurrences of certain distinguishing criteria in a diagnosis case. The provided information will be very valuable in diagnosis work, especially for the underdiagnosed cases, since such situations require deeper investigations of the problem domain and a very strict control mechanism of the diagnosis quality ([5]).
Second, advanced retrieval capabilities may be used for educational purposes by teaching personnel and students. Currently, enormous amounts of knowledge are lost by being stored in databases, which behave as real data sinks. They can and should be used for teaching, e.g. for case-based medical education.
Third, quality assurance and checking of diagnosis decisions can be carried out more efficiently because the system uses axioms and rules to automatically check consistency and validity.
Finally, explicit knowledge can be exchanged with external parties like other hospitals. The representation within the system is already the transfer format for information. Semantic Web technologies are by design open for the integration of knowledge that is relative to different ontologies and rules. Therefore we intend to use mainly such technologies for the realization of the retrieval system.
2.2 Use cases and technical requirements
The technical analysis and design of the pathology retrieval system is closely related to typical usage scenarios, which are not necessarily related to routine pathology. Most probably, the system will be used for underdiagnosed cases, where a second or third opinion is to be consulted or the specialist usually resorts to certified control sources, like Internet or printed material. Such information sources have an essential drawback: they offer limited capabilities for a thematically focused search. Both manual search within printed materials and Internet search, based on common or medicine-related search engines, are time-consuming and not specific enough to be integrated in everyday pathology. Instead, our system will offer the possibility to search the archive of medical findings for similar cases or differential diagnosis. It is improbable that the system will be consulted for routine cases, covering approximately 80 percent of the total amount, which are analyzed on the fly by the pathologists without the need for additional information sources.
The acceptance of the system is strictly related to its minimally invasive character: it should not imply any change of the current workflows and should achieve good precision results. Recall is also important, but since the two parameters usually conflict, we favor precision, mainly because of the predominant usage of the system for underdiagnosed cases, within which every detail may play an important role for the final results. The minimally invasive feature will be reflected in a careful design of the user interfaces and an intuitive query language.
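The trade-off between precision and recall mentioned above can be made concrete with a small calculation. The following is a minimal sketch and not part of the described system; the case identifiers and the two result lists are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved result list.

    precision = relevant retrieved cases / all retrieved cases
    recall    = relevant retrieved cases / all relevant cases
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical archive: four cases are truly similar to the query case.
relevant = {"case-01", "case-02", "case-03", "case-04"}

# A strict query returns few results, all of them relevant.
strict = ["case-01", "case-02"]
# A broad query finds more relevant cases but also returns unrelated ones.
broad = ["case-01", "case-02", "case-03", "case-05", "case-06", "case-07"]

print(precision_recall(strict, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 0.75)
```

In this toy setting the strict query corresponds to the preference stated in the text: fewer results, but each one worth the pathologist's attention.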
Another important setting is teaching: therefore, the system should be able to generate different reference materials and to retrieve information about typical pathology cases and their diagnosis. The key feature for the second scenario is the flexibility to generate and present domain information.
The network aspect is important for both settings. Pathologists use the system for cases where they need the remote collaboration of other specialists. The teaching scenario also assumes a distributed infrastructure, so that the resources can be accessed anytime, anywhere. The usage of Semantic Web technologies on one side, and of standards like XML/OWL and the medical HL7/DICOM on the other, is a condition for the realization of this requirement.
Scalability and performance are critical factors for the acceptance of a retrieval system. In our application, the amount of image data is very large. Every particular case contains up to 10 medical findings. Each of these is based on up to 50 digital histological images, which usually have a size of 4-5 GB each. Our first prototypical implementation of the system will deal with approximately 400 findings and a part of the corresponding digitised slides.
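To give a feeling for these orders of magnitude, the figures above can be multiplied out. The sketch below only computes upper bounds from the numbers quoted in the text; it assumes every finding refers to the maximum of 50 images, which will not hold for every case, and the prototype covers only part of the slides.

```python
# Upper-bound estimate of raw image volume, using the figures quoted in the text.
MAX_FINDINGS_PER_CASE = 10
MAX_IMAGES_PER_FINDING = 50
IMAGE_SIZE_GB = (4, 5)  # typical size range of one digitised slide


def image_volume_gb(findings,
                    images_per_finding=MAX_IMAGES_PER_FINDING,
                    size_gb=IMAGE_SIZE_GB):
    """Return the (low, high) image volume in GB for a number of findings."""
    images = findings * images_per_finding
    return images * size_gb[0], images * size_gb[1]


per_case = image_volume_gb(MAX_FINDINGS_PER_CASE)
prototype = image_volume_gb(400)

print(per_case)   # (2000, 2500)    -> up to about 2-2.5 TB for a single case
print(prototype)  # (80000, 100000) -> up to about 80-100 TB if all slides were included
```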
The storage of images will still be subject to the use of specialized image databases. Our approach of resorting to the descriptions of images contained in the findings and processing them in the system makes the scalability requirements, which grow with the number and complexity of cases, independent of the size of the image data. No image processing is foreseen; instead we use the result of the image analysis performed by human experts, the pathologists.
The remaining scalability and performance issues are affected by the quality of the underlying Semantic Web components and the complexity of the models used and the inferences drawn therein. Currently, there are strong efforts to produce industrial-strength Semantic Web components, such as inference engines that go beyond the poor performance of early research prototypes. Our system will benefit from this performance gain in the infrastructure.
The complexity of models, rules and queries triggering inferences remains a critical issue. While we have a substantial basis of models with existing standards, it is not clear yet what heuristics should guide the selection of the granularity of the models eventually used and of the details of the rules applied when finding “similar” cases. We will restrict ourselves to small models and rule sets that generate sufficiently precise answers with minimal inferencing effort. The precise methodology for doing so is the subject of our current studies.
3 Engineering the System
Technically, the system resorts to Semantic Web technologies. The Semantic Web ([1]) aims to provide automated information access based on machine-processable semantics of data. The final vision is to develop a technological framework that will transform the Web into a huge network of both human- and machine-understandable knowledge with various specialized reasoning services. The first steps in this direction have been made through the realization of appropriate representation languages for Web knowledge sources like RDF(S) and OWL and the increasing dissemination of ontologies, which provide a common basis for annotation and support automatic inferencing for the generation of knowledge.
Our approach makes use of these Semantic Web technologies in order to represent pathology knowledge explicitly and, consequently, refine the retrieval algorithms on a semantic level: medical and generic ontologies are integrated into a pathology knowledge base, which also serves as annotation vocabulary for medical findings and histological images. We use OWL and RDF(S) both for the representation of the knowledge base and for the annotation of the information items, and XML-based medical standards like HL7/CDA ([10,9]) for the medical findings.
In medicine and biology exhaustive domain ontologies have been developed and are constantly incorporating new pieces of knowledge. Ontologies like UMLS ([16]), GALEN ([6]) and Gene Ontology ([4]) provide a good basis for the development of Semantic Web applications for medical purposes. These ontologies are therefore used as the initial knowledge base of the semantic retrieval system for pathology. In addition, to put our goals into practice we still need to integrate the individual domain knowledge sources and to adapt them to the requirements of the Semantic Web, which means in the first place to formalize them in a Semantic Web representation language. Our analysis in the application domain has revealed the necessity of a powerful representation language, which can capture most of the semantic features of the medical knowledge. For this purpose we will use mostly OWL instead of RDF(S), mainly because of its expressiveness and inferencing capabilities. The main issues we address w.r.t. the available medical ontologies will be explained in detail in Section 3.2.
3.1 System architecture
We propose the following system architecture, which has arisen from the use cases and the corresponding technical requirements (Figure 1):
– description component
– knowledge component
– transformation component
– application components
In the following we briefly explain the role of each component and their interaction; a detailed description of the features and related research issues is presented in Section 3.2.
[Figure 1: diagram relating the digital microscope, the digital histological images and the medical findings to the description, transformation and knowledge components and to the application components (diagnosis retrieval, statistical references, teaching materials, case-based presentations, quality checking).]
Fig. 1. System architecture “Semantic Web for Pathology”
The core of the system architecture is the knowledge component (Figure 3), which consists of domain and generic ontologies, as well as a rule engine. The knowledge component influences every process of the remaining components. The medical findings and histological images are analyzed semantically and linguistically within the description component. The explicitly represented knowledge is used to check the consistency of medical findings and picture annotations during their generation. The description component also allows the XML encoding of the textual and pictorial data. Both the available pathology database at the Charité hospital and data to be generated are described in XML in this manner. The transformation component takes the XML-structured data set and integrates it within the semantic network underlying the knowledge component. Due to the application-oriented character of the system, special attention in the architecture is paid to the application components, which implement the functionality of the system as presented in Section 2. The search component is used both by pathologists, in order to retrieve information concerning diagnosis tasks, and by teaching personnel and students. We also plan a component for the generation of statistical evaluations (e.g. related to the most frequent disease symptoms, relationships between patient data and disease evolution etc.) and for the generation of case-oriented teaching materials and presentations (see Figure 1). The quality checking service is intended to evaluate the consistency of medical findings.
3.2 Main components
The Description Component. The description component is concerned with the basic formalization of medical findings and digital histological images. For this purpose it deals with two principal data sources: data which is already available at the Institute of Pathology at the Charité hospital, and future data. The goal of this process is to offer a homogeneous encoding of medical findings on one side and picture annotations on the other side, both for existing and future material. The data should first be encoded in XML and subsequently analyzed using ontology-enhanced text analysis algorithms in order to be annotated with ontology concepts. For the generation of new XML-based information we developed an editor tool, which can be integrated in the current version of the Digital Virtual Microscope ([17,14]). By means of this tool pathologists can analyze digitised histological images and simultaneously enter or update the corresponding medical finding, which is subsequently stored in an XML database. The second source of raw data was naturally the medical findings archive at the Charité. The medical findings of this type have been extracted from their primary text-oriented storage and transformed into XML.
We developed an HL7/CDA-compatible XML schema for the medical findings, which reflects the logical structure of the data. Such medical data is organized more or less consistently in four major parts:
– macroscopy, describing physical properties and the appearance of the original compound.
– microscopy, concerned with the detailed description of the slides analyzed at the microscope.
– diagnosis, summarizing the conclusions and the diagnosis.
– comments, usually presenting additional facts playing a role in the diagnosis argumentation (patient data, patient history etc.) or an alternative diagnosis for ambiguous cases.
Besides, such a medical finding also contains information from the patient record and references to digital images. The connection to the digital images is fundamental for an efficient retrieval, which should contain, apart from the relevant textual information, the corresponding image region the pathologist refers to in a certain portion of text. Since the size of such images is 4-5 GB, it is not sufficient to retrieve complete images for a certain user query; the concrete image sector is needed. For this purpose we use the functionality of the Digital Virtual Microscope, which allows digital slides to be annotated with so-called “observation paths” on one side, and to register an additional “dictation path” on the other. The observation path contains image coordinates, image resolution and time stamps registered while the pathologist was analyzing a specific digital image. The dictation path sums up the same data, this time registered while the pathologist was typing the medical finding. The complete path-related information flows into the “diagnosis path”, which mirrors the way the diagnosis decision was reached.
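The relationship between the three paths can be sketched as a small data structure. This is only an illustration under our own assumptions: the class name PathPoint, its fields, and the idea of building the diagnosis path by merging both recordings in timestamp order are hypothetical and are not taken from the Digital Virtual Microscope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathPoint:
    """One recorded view of a digital slide (hypothetical structure)."""
    x: int             # image coordinates of the viewed region
    y: int
    resolution: float  # magnification at which the region was viewed
    timestamp: float   # seconds since the start of the session


def merge_paths(observation, dictation):
    """Combine observation and dictation path into one diagnosis path.

    Assumption: the diagnosis path is the chronological merge of both.
    """
    return sorted(observation + dictation, key=lambda p: p.timestamp)


def region_at(path, t):
    """Return the last point recorded at or before time t, or None."""
    candidates = [p for p in path if p.timestamp <= t]
    return max(candidates, key=lambda p: p.timestamp) if candidates else None


observation = [PathPoint(10, 20, 40.0, 1.0), PathPoint(30, 40, 40.0, 5.0)]
dictation = [PathPoint(12, 22, 20.0, 3.0)]
diagnosis_path = merge_paths(observation, dictation)
```

With such a structure, a text portion typed at a known time could be linked to the image sector returned by `region_at`, so that a query result need not deliver the complete 4-5 GB image.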
The proposed XML schema reconstructs the structure of the real medical findings and is HL7-compatible. Though the compatibility restricts the format of the XML findings (the information must be encoded within “section”, “paragraphs” and “coded entry” tags, which is not necessarily the most straightforward manner of formalizing it), it is an important issue, especially for the distributed setting, for the exchange and reuse of information.
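How a finding structured in this way might be read back can be sketched with the Python standard library. The element and attribute names below (finding, section, paragraph, coded_entry, name, code) and the sample content are assumptions made for illustration; they are neither the actual HL7/CDA vocabulary nor the schema developed in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified finding; the concept code is a placeholder.
SAMPLE = """
<finding>
  <section name="microscopy">
    <paragraph>Inflammatory infiltrate consisting of plasma cells and lymphocytes.
      <coded_entry code="C0000000" system="hypothetical"/>
    </paragraph>
  </section>
  <section name="diagnosis">
    <paragraph>Organizing pneumonia.</paragraph>
  </section>
</finding>
"""


def sections(xml_text):
    """Map each section name to the list of its paragraph texts."""
    root = ET.fromstring(xml_text)
    result = {}
    for section in root.findall("section"):
        texts = [(p.text or "").strip() for p in section.findall("paragraph")]
        result[section.get("name")] = texts
    return result


parsed = sections(SAMPLE)
print(parsed["diagnosis"])  # ['Organizing pneumonia.']
```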
The Knowledge Component. The knowledge component includes the medical knowledge base and the algorithms for the realization of the applications. As mentioned in Section 3.1, it is built of a library of domain and generic ontologies, a rule engine and the annotated pathology data (Figure 3). We use available medical ontologies as a foundation of the knowledge base, starting with UMLS ([16]) and Gene Ontology ([4]).
The most important issue we have to address when building the pathology knowledge base is the integration and the enrichment of the available medical standards. Medical ontologies, though containing a huge number of concepts or terms, have seldom been developed for machine processing, but rather as controlled vocabularies and taxonomies for specific tasks in medicine ([13]). From a strict Semantic Web point of view they proved to be deficiently designed and incomplete. Apart from the absence of a Semantic Web compatible representation language, UMLS and Gene Ontology adopt an error-prone modeling style, which is characterized by few semantic relations among concepts and an ambiguous way to interpret such relations (e.g. concepts of the UMLS Metathesaurus are connected through relations like “related”, “broader”, “narrower”). A typical example is the usage of the relation “is-a” for both instantiation and specialization/generalization, the usage of a unique “part-of” relation with different meanings (“functional part”, “content”, “component”, “substance”) or the usage of one of these relations instead of the other. Mathematical properties of the same semantic relation (e.g. transitivity) are not fulfilled for each pair of concepts connected by the relation, and the “is-a” relation between two concepts does not always guarantee the inheritance of the properties of the parent concept to its children (so-called “blocked” relations in UMLS). Besides relations, both UMLS and Gene Ontology contain a huge set of conceptual entities, organized in several taxonomies. The classification criteria for concepts are inconsistent and incomplete. Different, unspecified granularities are used within a hierarchy and properties may not be inherited along inheritance paths.
[Figure 2: XML listing of a German-language medical finding. Namespace attributes: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn::hl7-org/cda sciphox-cda.xsd" xmlns:swpatho="urn::swpatho-org". Text content: “Bronchialepithelregeneraten. Restliche Alveolarlumina z.T. durch Fibroblastenproliferate verlegt. Im Interstitium ein gemischt entzündliches Infiltrat, bestehend aus Plasmazellen und Lymphozyten. ... organisierenden Pneumonie (klin. Mittellappen).”]
Fig. 2. Fragment of an XML-encoded medical finding
[Figure 3: diagram of the knowledge component with medical ontologies, medical findings, image annotations and the rule engine.]
Fig. 3. The knowledge component
The issue of the restricted representation language is addressed in several projects, which usually develop their own representation languages, adapted to specific requirements of the medical domain w.r.t. expressiveness and inferencing capabilities. Such an ontology, though with increased inference capabilities compared to UMLS or Gene Ontology, cannot be embedded offhand in a Semantic Web application, shared or completed in a Semantic Web setting. Moreover, various ontologies have been developed for particular purposes and cannot be integrated automatically. Besides the lack of integration and completeness, such ontologies seldom contain axiomatic knowledge, which is essential for diagnostics or therapy settings.
Therefore we need to adopt a Semantic Web representation scheme for the available ontological knowledge and complete it with additional axioms and definitions on the one hand, and on the other hand encode therapy, diagnostic and task knowledge in a supplementary module as rules. For this purpose we will use standards like RuleML. In order to design an appropriate representation based on the Semantic Web, we will first identify, in collaboration with domain experts, the fragments of UMLS/Gene Ontology which are relevant to pathology. Secondly, we will analyze the deficiencies of the available medical standards by transforming their content into OWL and automatically discovering inconsistencies. The next step will be the manual adaptation of the OWL ontology according to the results of the previous procedure. Currently we are implementing an algorithm for the OWL transformation of UMLS knowledge sources. The underlying modelling primitives are illustrated in Figure 4.
Fig. 4. UMLS modelling primitives in OWL
Besides an ontology-based background, the knowledge base also contains the complete set of medical findings and image descriptions, both represented in XML by means of the description component. However, in order for this additional information to be involved in retrieval and knowledge discovery, the basic XML scheme needs to be enriched with annotations referencing ontology concepts and relations. For this purpose we intend to use text processing algorithms for an initial automatic annotation phase and to implement an annotation tool for a subsequent manual annotation phase, which completes the automatic process.
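The automatic annotation phase could, in its simplest form, be a longest-match lookup of phrases in a concept lexicon. The sketch below illustrates that idea only: the lexicon, the concept identifiers and the English sample sentence are invented, whereas the real findings are German and the concepts would come from the medical ontologies. No part-of-speech tagging is performed here.

```python
import re

# Hypothetical lexicon: token tuple of a phrase -> placeholder concept identifier.
LEXICON = {
    ("organizing", "pneumonia"): "CONCEPT:0001",
    ("pneumonia",): "CONCEPT:0002",
    ("plasma", "cells"): "CONCEPT:0003",
    ("lymphocytes",): "CONCEPT:0004",
}
MAX_LEN = max(len(phrase) for phrase in LEXICON)


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def annotate(text):
    """Return (phrase, concept) pairs, preferring the longest matching phrase."""
    tokens = tokenize(text)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                annotations.append((" ".join(phrase), LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return annotations


result = annotate("Organizing pneumonia with plasma cells and lymphocytes.")
```

Preferring the longest match means that “organizing pneumonia” is annotated as one concept rather than as the more general “pneumonia”; ambiguous or unmatched passages would be left to the manual annotation phase.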
The Transformation Component. The transformation component implements features required for the text-based processing of the medical findings and image descriptions. For this purpose we are currently developing a noun phrasing module, which identifies domain-specific phrases from medical findings. The module incorporates a tokenizer, a tagger and an ontology-based phrase generator. The phrase generation process interacts with the knowledge base, since it uses medical ontologies to identify relevant (multi-word) phrases and at the same time puts together a lexicon tailored to the particular application setting: the domain of lung pathology and the language used in the medical findings, which is German. The lexicon gives us indications about the usage limitations of an essentially English-oriented thesaurus like UMLS in our concrete setting. As a result of the phrasing module, the XML-encoded medical findings contain semantically relevant phrases, which can be linked to concepts of the knowledge base. This task will be realized by the annotation component.
Application Components. The Semantic Web for Pathology will assist the following application components:
– search component: will be used primarily for diagnosis tasks. It will allow not only the basic retrieval of text/image information items, but also support differential diagnosis tasks. The semantic retrieval is oriented towards several typical categories of queries:
• statistical queries, e.g. the probability/frequency of a particular carcinoma in a certain age group.
• matching queries, e.g. comparison of cases with common characteristics, text and image information to similar cases.
• image queries, e.g. cases containing images with certain content- or image-specific constraints.
Besides, the retrieval should be adapted to the characteristics of the pathology domain and involve issues like the diagnosis path (Section 3.2).
– quality checking component: will be used in quality assurance and management of diagnosis processes. Quality criteria, diagnosis standards and their verification are expressed by means of rules.
– statistical component: will generate statistical material related to the relative frequency or demographic distribution of diseases and their complications.
– teaching component: will generate teaching materials, using features of the previous components (statistical studies, reference cases).
4 Related Work
Medicine is one of the best examples of application domains where ontologies have already been deployed at large scale and have already demonstrated their utility. Most of these domain ontologies (including UMLS) are subject to different design requirements than computer-supported, and more specifically Semantic Web, applications. They are actually huge collections of medical terms, organized in hierarchies, and cannot be used directly in Semantic Web applications. This issue has been addressed in the GALEN project ([6]), where the authors developed a special description logic representation, tailored to the particularities of the (English) medical vocabulary. However, the usage of a proprietary representation makes the ontological knowledge difficult to extend by third parties or exchange in a Semantic Web.
The usage of ontologies for building knowledge bases for medicine has already been the subject of several research projects ([2,12,7,3,8]). The most important representatives are the ONIONS ([7]) and MEDSYNDIKATE ([12]) projects. In ONIONS the authors aim to develop a generic framework for ontology merging and use UMLS as an example to apply their methodology. Therefore they need a detailed analysis of the ontological properties of UMLS, using a Loom formalization. MEDSYNDIKATE is also confronted with the ontological commitment beyond UMLS in order to use it in text processing algorithms for knowledge discovery. UMLS serves in this case as an annotation vocabulary for medical texts. Both projects offer valuable experiences and facts concerning UMLS and medical ontologies generally, but they do not use Semantic Web technologies to facilitate knowledge sharing and reuse, which is the crucial feature of ontologies. An interesting approach can also be found in [2], where the authors compare UMLS with other ontologies (e.g. WordNet ([11]), Gene Ontology) to establish its appropriateness as terminology for biomedical applications.
5 Conclusions and Future Work
In this paper we presented our work towards a Semantic Web based retrieval system for pathology. The system is based on a comprehensive knowledge base, which formalizes pathology-relevant knowledge explicitly by integrating available medical ontologies like UMLS and rules describing diagnostic guidelines. It is intended to provide both retrieval and knowledge management functionalities. In order to achieve these goals we have so far designed the system architecture, adopted XML-based schemes for the uniform representation of medical findings and digital images and developed a methodology for the construction of the pathology knowledge base. Current work includes the specification and implementation of an algorithm for the OWL formalization of medical ontologies and their integration in the knowledge base.
Acknowledgement. The project “Semantic Web in the Pathology” is funded by the Deutsche Forschungsgemeinschaft, as a cooperation among the Charité Institute of Pathology, the Institute for Computer Science at the FU Berlin and the Department of Linguistics at the University of Potsdam, Germany.
References
1. T. Berners-Lee, J. Hendler, and O. Lassila. “The Semantic Web”. Scientific American, 284(5):34–43, May 2001.
2. A. Burgun and O. Bodenreider. Mapping the UMLS Semantic Network into General Ontologies. In Proc. of the AMIA Symposium, 2001.
3. G. Carenini and J. Moore. “Using the UMLS Semantic Network as a Basis for Constructing a Terminological Knowledge Base: A Preliminary Report”. In Proceedings of 17th Symposium on Computer Applications in Medical Care (SCAMC'93), 1993.
4. The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics, 25:25–30, 2000.
5. F. Demichellis, V. Della Mea, S. Forti, P. Dalla Palma, and C. A. Beltrami. “Digital storage of glass slide for quality assurance in histopathology and cytopathology”. Telemed Telecare, 8(3):138–42, 2002.
6. Ontology GALEN. http://www.opengalen.org, 2001.
7. A. Gangemi, D. M. Pisanelli, and G. Steve. “An Overview of the ONIONS Project: Applying Ontologies to the Integration of Medical Terminologies”. Data Knowledge Engineering, 31(2):183–220, 1999.
8. H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J. Cimino. “Representing the UMLS as an OODB: Modeling issues and advantages”, 2000.
9. HL7 Standard. http://puck.informatik.med.uni-giessen.de/people/messaritakis/-hl7xml/hl7stand.htm, 2000.
10. The HL7/CDA Standard. http://www.hl7.org, 2000.
11. G. A. Miller. “WordNet: a lexical database for English”. Communications of the ACM, 38(11):39–41, 1995.
12. S. Schulz and U. Hahn. “Medical knowledge reengineering - converting major portions of the UMLS into a terminological knowledge base”. International Journal of Medical Informatics, 2001.
13. S. Schulz, M. Romacker, and U. Hahn. “Knowledge engineering the UMLS”. Stud Health Technol Inform, 77:701–5, 2000.
14. Patent application: Slide Scanner – Vorrichtung und Verfahren, 2002. File number 102317.6, DPMA (German Patent and Trade Mark Office), 5 August 2002.
15. J. Slodkowska, K. Kayser, and P. Hasleton. “Teleconsultation in the Chest Disorders”. Eur J Med Res, 7(Suppl I):80, 2002.
16. Unified Medical Language System. http://www.nlm.nih.gov/research/umls, 2002.
17. Patent application: Virtuelles Mikroskop – Vorrichtung und Verfahren, 2002. File number 10225174.6, DPMA (German Patent and Trade Mark Office), 31 May 2002.