Santa Fe Institute Working Paper 07-03-XXX

arxiv.org/xxxxx/0703XXX

Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling

Christopher C. Strelioff,1,2,∗ James P. Crutchfield,1,† and Alfred W. Hübler2,‡

1 Center for Computational Science & Engineering and Physics Department, University of California at Davis, One Shields Avenue, Davis, CA 95616

2 Center for Complex Systems Research and Physics Department, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801

arXiv:math/0703715v1 [math.ST] 24 Mar 2007

Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer k-th order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a novel method that uses finite-data-size scaling with model-order comparison to infer the structure of out-of-class processes.

PACS numbers: 02.50.Tt, 02.50.Ga, 05.10.Gg

I. INTRODUCTION

Statistical inference of models from small data samples is a vital tool in the understanding of natural systems. In many problems of interest the data consists of a sequence of letters from a finite alphabet. Examples include analysis of sequence information in biopolymers [1, 2], investigation of one-dimensional spin systems [3], models of natural languages [4], and coarse-grained models of chaotic dynamics [5, 6]. This diversity of potential application has resulted in the development of a variety of representations for describing discrete-valued data series.

We consider the k-th order Markov chain model class, which uses the previous k letters in a sequence to predict the next letter. Inference of Markov chains from data has a long history in mathematical statistics. Early work focused on maximum likelihood methods for estimating the parameters of the Markov chain [7, 8, 9]. This work often assumed a given, fixed model order; that is, no model comparison across orders was done. It also typically relied on the assumed asymptotic normality of the likelihood when estimating regions of confidence and when implementing model comparison. As a result, the realm of application has been limited to data sources where these conditions are met. One consequence of these assumptions has been that data sources which exhibit forbidden words (symbol sequences which are not allowed) cannot be analyzed with these methods. This type of data violates the assumed normality of the likelihood function.

More recently, model comparison in the maximum likelihood approach has been extended using various information criteria.

[...] understanding of inference not available in the classical statistics approach to these problems. As we demonstrate, each of these steps is vital, and implementation of one step without the others does not provide a complete analysis of the data-model connection.

Moreover, the combination of inference of model parameters, comparison of performance across model orders, and estimation of entropy rates provides a powerful tool for understanding Markov chain models themselves. Remarkably, this is true even when the generating data source is outside of the Markov chain model class. Model comparison provides a sense of the structure of the data source, whereas estimates of the entropy rate provide a description of the inherent randomness. Bayesian inference, information theory, and the tools from statistical mechanics presented here touch on all of these issues within a unified framework.

We develop this as follows, assuming a passing familiarity with Bayesian methods and statistical mechanics. First, we discuss estimation of Markov chain parameters using Bayesian methods, emphasizing the use of the complete marginal posterior density for each parameter, rather than point estimates with error bars. Second, we consider selection of the appropriate memory k given a particular data set, demonstrating that a mixture of orders may often be more appropriate than selecting a single order; this is certainly a more genuinely Bayesian approach. In these first two parts we exploit different forms of Bayes' theorem to connect data and model class. Third, we consider the mathematical structure of the evidence (or marginal likelihood) and draw connections to statistical mechanics. In this discussion we present a method for estimating entropy rates by taking derivatives of a partition function formed from elements of each step of the inference procedure. Last, we apply these tools to three example information sources of increasing complexity. The first example belongs to the Markov chain model class, but the other two are examples of hidden Markov models (HMMs) that fall outside of that class. We show that the methods developed here provide a powerful tool for understanding data from these sources, even when they do not belong to the model class being assumed.

II. INFERRING MODEL PARAMETERS

In the first level of Bayesian inference we develop a systematic relation between the data D, the chosen model class M, and the vector of model parameters θ. The object of interest in the inference of model parameters is the posterior probability density P(θ|D,M). This is the probability of the model parameters given the observed data and chosen model. To find the posterior we first consider the joint distribution P(θ,D|M) over the data and model parameters, given that one has chosen to model the source with a representation in a certain class M. This can be factored in two ways: P(θ|D,M) P(D|M) or P(D|θ,M) P(θ|M). Setting these equal and solving for the posterior we obtain Bayes' theorem:

P(θ|D,M) = P(D|θ,M) P(θ|M) / P(D|M) .    (1)

B. Likelihood

Given a sample of data D = s_0 s_1 ... s_{N−1}, the likelihood can be written down using the Markov property of Eq. (3) and the stationarity of Eq. (4). This results in the form

P(D|θ_k, M_k) = ∏_{←s^k ∈ A^k} ∏_{s ∈ A} p(s|←s^k)^{n(←s^k s)} ,    (6)

where n(←s^k s) is the number of times the word ←s^k s occurs in the sample D. For future use we also introduce notation for the number of times a word ←s^k has been observed: n(←s^k) = Σ_{s∈A} n(←s^k s). We note that Eq. (6) is conditioned on the start sequence ←s^k = s_0 s_1 ... s_{k−1}.
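The word counts n(←s^k s) appearing in Eq. (6) can be tabulated by sliding a length-(k+1) window over the sample. A minimal sketch (the function name and interface are ours, not the paper's):

```python
from collections import Counter

def word_counts(data: str, k: int) -> Counter:
    """Count n(w s): occurrences of each length-(k+1) word in the sample.
    The first k symbols act as the start sequence of Eq. (6)."""
    counts = Counter()
    for i in range(len(data) - k):
        counts[data[i:i + k + 1]] += 1
    return counts

counts = word_counts("10111011", k=1)
```

Summing the counts over the final symbol s recovers n(←s^k); the total count is N − k, matching Eq. (45) below.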

C. Prior

The prior P(θ|M) is used to specify assumptions about the model to be inferred before the data is considered. Here we use conjugate priors, for which the posterior distribution has the same functional form as the prior. Our choice allows us to derive exact expressions for many quantities of interest in inference. This provides a powerful tool for understanding what information is gained during inference and, especially, model comparison.

The exact form of the prior is determined by our assignment of the hyperparameters α(←s^k s), which balance the strength of the modeling assumptions encoded in the prior against the weight of the data. For a k-th order Markov chain, there is one hyperparameter for each word ←s^k s, given the alphabet under consideration. A useful way to think about the assignment of values to the hyperparameters is to relate them to fake counts ñ(←s^k s), such that α(←s^k s) = ñ(←s^k s) + 1. In this way, the α(←s^k s) can be set to reflect knowledge of the data source, and the strength of these prior assumptions can be properly weighted in relation to the actual data counts n(←s^k s).

The conjugate prior for Markov chain inference is a product of Dirichlet distributions, one for each word ←s^k.

It restates the finite-memory assumption from the model definition:

P(θ_k|M_k) = ∏_{←s^k ∈ A^k} [ Γ(α(←s^k)) / ∏_{s∈A} Γ(α(←s^k s)) ] ∏_{s∈A} p(s|←s^k)^{α(←s^k s)−1} ,    (7)

where α(←s^k) = Σ_{s∈A} α(←s^k s). The prior mean and variance of the model parameters are

E_prior[p(s|←s^k)] = α(←s^k s) / α(←s^k) ,

Var_prior[p(s|←s^k)] = α(←s^k s) ( α(←s^k) − α(←s^k s) ) / [ α(←s^k)^2 ( α(←s^k) + 1 ) ] .    (8)

D. Evidence

Combining the likelihood Eq. (6) and the prior Eq. (7), the evidence (or marginal likelihood) of the data can be computed in closed form:

P(D|M_k) = ∏_{←s^k ∈ A^k} [ Γ(α(←s^k)) / Γ(n(←s^k) + α(←s^k)) ] × ∏_{s∈A} [ Γ(n(←s^k s) + α(←s^k s)) / Γ(α(←s^k s)) ] .    (12)

E. Posterior

Using Bayes' theorem Eq. (1), the results of the three previous sections can be combined to obtain the posterior distribution over the parameters of the k-th order Markov chain. One finds:

P(θ_k|D,M_k) = ∏_{←s^k ∈ A^k} [ Γ(n(←s^k)+α(←s^k)) / ∏_{s∈A} Γ(n(←s^k s)+α(←s^k s)) ] ∏_{s∈A} p(s|←s^k)^{n(←s^k s)+α(←s^k s)−1} .    (13)

The mean and variance of the parameters with respect to this posterior are

E_post[p(s|←s^k)] = ( n(←s^k s) + α(←s^k s) ) / ( n(←s^k) + α(←s^k) ) ,    (14)

Var_post[p(s|←s^k)] = ( n(←s^k s) + α(←s^k s) ) ( n(←s^k) + α(←s^k) − n(←s^k s) − α(←s^k s) ) / [ ( n(←s^k) + α(←s^k) )^2 ( n(←s^k) + α(←s^k) + 1 ) ] .    (15)

Eq. (14) is the posterior mean estimate (PME) of the model parameters.
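Concretely, the PME of Eq. (14) amounts to adding the hyperparameters to the observed counts and normalizing. A small illustration with hypothetical counts and the uniform choice α(←s^k s) = 1 (names are ours):

```python
def posterior_mean(n_ws, alpha_ws):
    """Posterior mean estimate (PME) of p(s|w) for one history w,
    given data counts n(w s) and Dirichlet hyperparameters alpha(w s)."""
    n_w = sum(n_ws.values())          # n(w)
    a_w = sum(alpha_ws.values())      # alpha(w)
    return {s: (n_ws[s] + alpha_ws[s]) / (n_w + a_w) for s in n_ws}

# Binary alphabet, one history w, counts n(w0) = 2 and n(w1) = 3:
pme = posterior_mean({"0": 2, "1": 3}, {"0": 1, "1": 1})
```

With these numbers the estimate for symbol 0 is (2+1)/(5+2) = 3/7, the count-plus-pseudocount fraction of Eq. (14).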

A deeper understanding of Eq. (14) is obtained through a simple factoring:

E_post[p(s|←s^k)] = [ n(←s^k) / ( n(←s^k) + α(←s^k) ) ] · [ n(←s^k s) / n(←s^k) ] + [ α(←s^k) / ( n(←s^k) + α(←s^k) ) ] · [ α(←s^k s) / α(←s^k) ] ,    (16)

a data-weighted mixture of the maximum likelihood estimate n(←s^k s)/n(←s^k) and the prior expectation α(←s^k s)/α(←s^k).

In the same way, the probability of a new sample with counts m(←s^k s), given the original data D, can be written in closed form:

P(D′|D,M_k) = ∏_{←s^k ∈ A^k} [ Γ(n(←s^k)+α(←s^k)) / Γ(n(←s^k)+m(←s^k)+α(←s^k)) ] × ∏_{s∈A} [ Γ(n(←s^k s)+m(←s^k s)+α(←s^k s)) / Γ(n(←s^k s)+α(←s^k s)) ] .    (19)

III. MODEL COMPARISON

With the ability to infer a Markov chain of a given order k, a common-sense question arises: how do we choose the correct order given a particular data set? Bayesian methods have a systematic way to address this through the use of model comparison.

In many ways, this process is analogous to inferring model parameters themselves, which we just laid out. We start by enumerating the set of model orders to be compared, M = {M_k}_{k = k_min}^{k_max}, where k_min and k_max correspond to the minimum and maximum order to be inferred, respectively. Although we will not consider an independent, identically distributed (IID) model (k = 0) here, we do note that this could be included using the same techniques described below.

We start with the joint probability P(M_k, D|M) of a particular model M_k ∈ M and data sample D, factoring it in two ways following Bayes' theorem. Solving for the probability of a particular model class we obtain

P(M_k|D,M) = P(D|M_k,M) P(M_k|M) / Σ_{M′_k ∈ M} P(D|M′_k,M) P(M′_k|M) .

For a uniform prior over orders, P(M_k|M) = 1/|M|, the prior cancels and this reduces to

P(M_k|D,M) = P(D|M_k,M) / Σ_{M′_k ∈ M} P(D|M′_k,M) .    (22)

In the second case, a common penalty for the number of model parameters is

P(M_k|M) = exp(−|M_k|) / Σ_{M′_k ∈ M} exp(−|M′_k|) ,    (23)

where |M_k| = |A|^k (|A|−1) is the number of free parameters of a k-th order chain. With this prior we obtain

P(M_k|D,M) = P(D|M_k,M) exp(−|M_k|) / Σ_{M′_k ∈ M} P(D|M′_k,M) exp(−|M′_k|) .    (25)


We note that the normalization sum in Eq. (23) cancels because it appears in both the numerator and denominator.

Bayesian model comparison has a natural Occam's razor built into the comparison process [18]. This means there is a natural preference for smaller models even when a uniform prior over model orders is applied. In this light, a penalty for the number of model parameters can be seen as a very cautious form of model comparison. Both of these priors, Eq. (22) and Eq. (25), will be considered in the examples to follow.

A note is in order on computational implementation. In general, the resulting probabilities can be extremely small, easily resulting in numerical underflow if the equations are not implemented with care. As mentioned in [16], computation with extended logarithms can be used to alleviate these concerns.
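In practice the evidence of Eq. (12) is computed with log-gamma functions, and the order probabilities of Eqs. (22) and (25) with a log-sum-exp normalization; this is one way to realize the extended-logarithm advice above. A sketch for the uniform choice α(←s^k s) = 1 (the function names are ours, not the paper's):

```python
import math
from math import lgamma

def log_evidence(counts, alphabet, k):
    """log P(D|M_k) from Eq. (12), assuming uniform hyperparameters
    alpha(w s) = 1, so alpha(w) = |A|.  `counts` maps length-(k+1)
    words to their observed counts n(w s)."""
    hists = {}
    for w, n in counts.items():
        hists.setdefault(w[:k], {})[w[k]] = n
    total = 0.0
    for w, ns in hists.items():
        n_w = sum(ns.values())
        a_w = float(len(alphabet))              # alpha(w) with alpha(w s) = 1
        total += lgamma(a_w) - lgamma(n_w + a_w)
        for s in alphabet:
            total += lgamma(ns.get(s, 0) + 1) - lgamma(1.0)
    return total

def order_probabilities(log_evs):
    """Normalize a list of log-evidences with log-sum-exp (uniform prior
    over orders, Eq. (22)), avoiding numerical underflow."""
    m = max(log_evs)
    ws = [math.exp(le - m) for le in log_evs]
    z = sum(ws)
    return [w / z for w in ws]
```

Subtracting the maximum log-evidence before exponentiating keeps every term representable even when the raw probabilities would underflow.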

IV. INFORMATION THEORY, STATISTICAL MECHANICS, AND ENTROPY RATES

An important property of an information source is its entropy rate h_µ, which indicates the degree of intrinsic randomness and controls the achievable compression. A first attempt at estimating a source's entropy rate might consist of plugging a Markov chain's estimated model parameters into the known expression [17]. However, this does not accurately reflect the posterior distribution derived above. This observation leaves two realistic alternatives. The first option is to sample model parameters from the posterior distribution. These samples can then be used to calculate a set of entropy-rate estimates that reflect the underlying posterior distribution. A second option, which we take here, is to adapt methods from type theory and statistical mechanics previously developed for IID models [19] to Markov chains. To the best of our knowledge this is the first time these ideas have been extended to inferring Markov chains; although cf. [20].

In simple terms, type theory shows that the probability of an observed sequence can be written in terms of the Kullback-Leibler (KL) distance and the entropy rate. When applied to the Markov chain inference problem, the resulting form suggests a connection to statistical mechanics. For example, we will show that averages of the KL-distance and entropy rate with respect to the posterior are found by taking simple derivatives of a partition function.

The connection between inference and information theory starts by considering the product of the prior Eq. (7) and likelihood Eq. (6):

P(θ_k|M_k) P(D|θ_k,M_k) = P(D,θ_k|M_k) .    (26)

This forms a joint distribution over the observed data D and model parameters θ_k given the model order M_k. Denoting the normalization constant from the prior as Z to save space, this joint distribution is

P(D,θ_k|M_k) = Z ∏_{←s^k, s} p(s|←s^k)^{n(←s^k s)+α(←s^k s)−1} .    (27)

This form can be written, without approximation, in terms of conditional relative entropies D[·‖·] and entropy rates h_µ[·]:

P(D,θ_k|M_k) = Z 2^{−β_k ( D[Q‖P] + h_µ[Q] )} × 2^{+|A|^{k+1} ( D[U‖P] + h_µ[U] )} ,    (28)

where β_k = Σ_{←s^k, s} [ n(←s^k s) + α(←s^k s) ] and the distribution of true parameters is P = { p(←s^k), p(s|←s^k) }. The distributions Q and U are given by

Q = { q(←s^k) = ( n(←s^k) + α(←s^k) ) / β_k , q(s|←s^k) = ( n(←s^k s) + α(←s^k s) ) / ( n(←s^k) + α(←s^k) ) } ,

U = { u(←s^k) = 1/|A|^k , u(s|←s^k) = 1/|A| } ,    (30)

where Q is the distribution defined by the posterior mean and U is a uniform distribution. The information-theoretic quantities used above are given by

D[Q‖P] = Σ_{←s^k, s} q(←s^k) q(s|←s^k) log_2 [ q(s|←s^k) / p(s|←s^k) ] ,

h_µ[Q] = −Σ_{←s^k, s} q(←s^k) q(s|←s^k) log_2 q(s|←s^k) ,

with analogous expressions for D[U‖P] and h_µ[U]. The prior plays the same role through the distribution r(s|←s^k) = α(←s^k s)/α(←s^k).

Treating Eq. (28) as a partition function with inverse temperature β_k, the posterior average and variance of

E(Q,P) = D[Q‖P] + h_µ[Q]    (38)

follow from first and second derivatives of its logarithm with respect to β_k. These derivatives evaluate to digamma ψ^(0) and trigamma ψ^(1) functions with arguments β_k q(←s^k) and β_k q(←s^k) q(s|←s^k). Expanding the result for large β_k yields

E_post[E(Q,P)] = H[ q(←s^k) q(s|←s^k) ] − H[ q(←s^k) ] + |A|^k (|A|−1) / ( 2 β_k ln 2 ) + O(1/β_k^2) .    (41)

From this we see that the first two terms make up the entropy rate h_µ[Q] = H[q(←s^k) q(s|←s^k)] − H[q(←s^k)] and the last term is associated with the conditional relative entropy between the posterior mean distribution Q and the true distribution P.

In summary, we have found the average of the conditional relative entropy and entropy rate with respect to the posterior density. This was accomplished by making connections to statistical mechanics through type theory. Unlike sampling from the posterior to estimate the entropy rate, this method results in an analytic form which approaches h_µ[P] as the inverse of the data size. This method for approximating h_µ also provides a computational benefit: no eigenstates have to be found from the Markov transition matrix, allowing for the storage of values in sparse data structures. This provides a distinct computational advantage when large orders or alphabets are considered. Finally, it might seem awkward to use the expectation of Eq. (33) for estimation of the entropy rate. This method was chosen because it is the form that naturally appears in writing down the likelihood-prior combination in Eq. (28). As a result of using this method, most of the results obtained above are without approximation. We were also able to show that this expectation converges to the desired value in a well behaved manner.

V. EXAMPLES

To explore how the above produces a robust inference procedure, let's now consider the statistical inference of a series of increasingly complex data sources. The first, called the golden mean process, is a first-order Markov chain. The second data source is called the even process and cannot be represented by a Markov chain with finite order. However, this source is a deterministic HMM, meaning that the current state and next output symbol uniquely determine the next state. Finally, we consider the simple nondeterministic source, so named since its smallest representation is as a nondeterministic HMM. (Nondeterminism here refers to the HMM structure: the current state and next output symbol do not uniquely determine the next state. This source is represented by an infinite-state deterministic HMM [22, 23].)

The golden mean, even, and simple nondeterministic processes can all be written down as models with two internal states, call them A and B. However, the complexity of the data generated from each source is of markedly different character. Our goal in this section is to use the three main steps in inference to analyze them. First, we consider inference of a first-order Markov chain to demonstrate the estimation of model parameters with uncertainty. Second, we consider model comparison for a range of orders k. This allows us to discover structure in the data source even though the true model class cannot be captured in all cases. Finally, we consider estimation of entropy rates from these data sources, investigating how randomness is expressed in them.

While investigating these processes we consider average data counts, rather than sample counts from specific realizations, as we want to focus specifically on the average performance of Bayesian inference. To do this we take advantage of the known form of the sources. Each is described by a transition matrix T, which gives transitions between states A and B:

T = [ p(A|A)  p(B|A) ; p(A|B)  p(B|B) ] .    (42)

Although two of our data sources are not finite Markov chains, the transition matrix between internal states is Markov. This means the matrix is stochastic (all rows sum to one) and we are guaranteed an eigenstate π with eigenvalue one: π T = π. This eigenstate describes the asymptotic distribution over internal states: π = [p(A), p(B)].

The transition matrix can be divided into labeled matrices T(s) which contain those elements of T that output symbol s. For our binary data sources one has

T = T(0) + T(1) .    (43)

Using these matrices, the average probability of words can be estimated for each process of interest. For example, the probability of the word 01 can be found using

p(01) = π T(0) T(1) η ,    (44)

where η is a column vector of all 1's. In this way, for any data size N, we estimate the average count for a word as

n(←s^k s) = (N − k) p(←s^k s) .    (45)

Average counts, obtained this way, will be the basis for all of the examples to follow.

In the estimation of the true entropy rate for the examples we use the formula

h_µ = − Σ_{v ∈ {A,B}} p(v) Σ_{s ∈ A} p(s|v) log_2 p(s|v)    (46)

for the golden mean and even processes, where p(s|v) = T(s)_{v·} is the probability of a letter s given the state v, and p(v) is the asymptotic probability of the state v, which can be found as noted above. For the simple nondeterministic source this closed-form expression cannot be applied and the entropy rate must be found using more involved methods; see [22] for further details.
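Eq. (46) is simple to evaluate once p(v) and p(s|v) are in hand. A sketch using the golden mean process values derived in the next subsection (p(A) = 2/3, p(B) = 1/3; from state A the symbols 0 and 1 are equally likely, from B only a 1 is emitted); the function name is ours:

```python
import math

def entropy_rate(p_state, p_sym_given_state):
    """h_mu = -sum_v p(v) sum_s p(s|v) log2 p(s|v), as in Eq. (46)."""
    h = 0.0
    for v, pv in p_state.items():
        for s, ps in p_sym_given_state[v].items():
            if ps > 0:                      # 0 log 0 = 0 by convention
                h -= pv * ps * math.log2(ps)
    return h

# Golden mean process: only state A contributes uncertainty.
h = entropy_rate({"A": 2/3, "B": 1/3},
                 {"A": {"0": 0.5, "1": 0.5}, "B": {"1": 1.0}})
# h = 2/3 bit per symbol, the value quoted for the golden mean below
```

State B emits 1 deterministically and contributes nothing; state A contributes (2/3) · 1 bit, giving h_µ = 2/3.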

A. Golden mean process: In-class modeling

The golden mean process can be represented by a simple first-order Markov chain over a binary alphabet, characterized by a single (shortest) forbidden word 00. The defining labeled transition matrices for this data source are given by

T(0) = [ 0  1/2 ; 0  0 ] ,  T(1) = [ 1/2  0 ; 1  0 ] .    (47)

Figure 1 provides a graphical representation of the corresponding hidden Markov chain. Inspection reveals a simple relation between the internal states A and B and the output symbols 0 and 1. An observation of 0 indicates a transition to internal state B and a 1 corresponds to state A, making this process a Markov chain over 0s and 1s.

For the golden mean the eigenstate is π = [p(A), p(B)] = (2/3, 1/3). With this vector and the labeled transition matrices any desired word count can be found as discussed above.
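As a numerical check, Eqs. (42)–(45) specialized to the golden mean matrices of Eq. (47) can be evaluated directly; the helper below is a plain-Python sketch (names are ours):

```python
def mat_vec_left(v, M):
    """Row-vector times matrix: (v M)_j = sum_i v_i M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

# Labeled transition matrices for the golden mean process, Eq. (47).
T0 = [[0.0, 0.5], [0.0, 0.0]]
T1 = [[0.5, 0.0], [1.0, 0.0]]

pi = [2/3, 1/3]                 # stationary distribution: pi T = pi

def word_prob(word):
    """p(w) = pi T(w_0) T(w_1) ... eta, as in Eq. (44)."""
    v = pi
    for s in word:
        v = mat_vec_left(v, T0 if s == "0" else T1)
    return sum(v)               # dot product with eta = (1, 1)
```

For example, word_prob("01") returns 1/3 and word_prob("00") returns 0, consistent with the forbidden word 00; average counts then follow from Eq. (45) as n(w) = (N − k) p(w).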

FIG. 1: A deterministic hidden Markov chain for the golden mean process. Edges are labeled with the output symbol and the transition probability: symbol | probability.

1. Estimation of M1 Parameters

To demonstrate the effective inference of the Markov chain parameters for the golden mean process we consider average counts for a variety of data sizes N. For each size, the marginal posterior for the parameters p(0|1) and p(1|0) is plotted in Fig. 2. The results demonstrate that the shape of the posterior effectively describes the distribution of possible model parameters at each N and converges to the correct values of p(0|1) = 1/2 and p(1|0) = 1 with increasing data.

Point estimates with a variance can be provided for each of the parameters, but these numbers by themselves can be misleading. Still, the mean and variance of the posterior give a more effective description of the inference process than a maximum likelihood estimate with estimated error given by a Gaussian approximation of the likelihood alone. As Fig. 2 demonstrates, a Gaussian approximation of uncertainty is an ineffective description of our knowledge when the Markov chain parameters are near their upper or lower limits at 0 and 1. Probably the most effective set of numbers to provide consists of the mean of the posterior and a region of confidence; these would most accurately describe asymmetries in the uncertainty of model parameters. Although we will not do that here, a brief description of finding regions of confidence is provided in App. A1.

2. Selecting the Model Order k

Now consider the selection of the appropriate order k from golden mean realizations. As discussed above, the golden mean process is a first-order Markov chain with k = 1. As a result, we would expect model comparison to select this order from the available possibilities. To demonstrate this, we consider orders k = 1-4 and perform model comparison with a uniform prior over orders (Eq. (22)) and with a penalty for the number of model parameters (Eq. (25)).

The results of the model comparisons are given in Fig. 3. The top panel shows the probability for each order k as a function of the sample size, using a uniform prior. For this prior over orders, M1 is selected with any reasonable amount of data. However, there does seem to


FIG. 2: A plot of the inference of M1 model parameters for the golden mean process. For each data sample size N, the marginal posterior is plotted for the parameters of interest: p(0|1) in the top panel and p(1|0) in the lower panel. The true values of the parameters are p(0|1) = 1/2 and p(1|0) = 1.

be a possibility of over-fitting for small data sizes N ≤ 100. The bottom panel shows the model probability with a penalty prior over model order k. This removes the over-fitting at small data sizes and produces an offset which must be overcome by the data before higher k is selected. This example is not meant to argue for the penalty prior over model orders. In fact, Bayesian model comparison with a uniform prior does an effective job using a relatively small sample size.

3. Estimation of Entropy Rate

We can also demonstrate the convergence of the average for E(Q,P) = D[Q‖P] + h_µ[Q] given in Eq. (38) to the correct entropy rate for the golden mean process. We choose to show this convergence for all orders k = 1-4 discussed in the previous section. This exercise demonstrates that all orders greater than or equal to k = 1 effectively capture the entropy rate. However, the convergence to the correct value for higher-order k takes more data because of a larger initial value of D[Q‖P]. This larger value is simply due to the larger number of parameters for higher-order Markov chains.

FIG. 3: Model comparison for Markov chains of order k = 1-4 using average counts from the golden mean process. Sample sizes from N = 100 to N = 1,000 in steps of ∆N = 5 are used to generate these plots. The top panel displays the model probabilities using a uniform prior over orders k. The bottom panel displays the effect of a penalty for model size.

In evaluating the value of D[Q‖P] + h_µ[Q] for different sample lengths, we expect that the PME-estimated Q will converge to the true distribution P. As a result, the conditional relative entropy should go to zero with increasing N. For the golden mean process, the known value of the entropy rate is h_µ = 2/3 bits per symbol. Inspection of Fig. 4 demonstrates the expected convergence of the average from Eq. (38) to the true entropy rate.
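Since the posterior over each row of parameters is Dirichlet, E[ln p(s|←s^k)] = ψ^(0)(n(←s^k s)+α(←s^k s)) − ψ^(0)(n(←s^k)+α(←s^k)), so the posterior mean of E(Q,P) can be evaluated analytically, without sampling. The sketch below reproduces the flavor of Fig. 4 under our assumptions (hand-rolled digamma helper, uniform α = 1, k = 1 average golden mean counts; all names are ours):

```python
import math

def digamma(x):
    """Digamma via the recurrence psi(x+1) = psi(x) + 1/x and an
    asymptotic series; adequate accuracy for x > 0 at this scale."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    return r + math.log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4)

def expected_E(counts, alpha):
    """Posterior mean of E(Q,P) = D[Q||P] + h_mu[Q], in bits:
    -(1/ln 2) sum_ws q(w s) [psi(a_ws) - psi(a_w)], a_ws = n(w s)+alpha(w s)."""
    a = {w: n + alpha[w] for w, n in counts.items()}
    a_w = {}
    for w, v in a.items():
        a_w[w[:-1]] = a_w.get(w[:-1], 0.0) + v
    beta = sum(a.values())
    total = 0.0
    for ws, v in a.items():
        total -= (v / beta) * (digamma(v) - digamma(a_w[ws[:-1]]))
    return total / math.log(2)

# Average golden mean counts at N = 5000: p(00)=0, other words 1/3 each.
N = 5000
counts = {"00": 0.0, "01": (N-1)/3, "10": (N-1)/3, "11": (N-1)/3}
est = expected_E(counts, {w: 1.0 for w in counts})
```

For this N the estimate sits a few thousandths of a bit above h_µ = 2/3 and decays toward it as 1/N, matching the behavior shown in Fig. 4.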

The result of our model comparison from the previous section could also be used in the estimation of the entropy rate. As we saw in Fig. 3, there are ranges of sample length N where the probabilities of orders k = 1, 2 are both nonzero. In principle, an estimate of h_µ should be made by weighting the values obtained for each k by the corresponding order probability P(M_k|D,M). As we can see from Fig. 4, the estimates of the entropy rate for k = 1, 2 are also very similar in this range of N. As a result, this additional step would not have a large effect for entropy rate estimation.

B. Even process: Out-of-class modeling

We now consider a more difficult data source called the even process. The defining labeled transition matrices are

FIG. 4: The convergence of E_post[E(Q,P)] to the true entropy rate h_µ = 2/3 bits per symbol (indicated by the gray horizontal line) for the golden mean process. As demonstrated in Eq. (41), the conditional relative entropy D[Q‖P] → 0 as 1/N. This results in the convergence of h_µ[Q] to the true entropy rate.

given by

T(0) = [ 1/2  0 ; 0  0 ] ,  T(1) = [ 0  1/2 ; 1  0 ] .    (48)

As can be seen in Fig. 5, the node-edge structure is identical to the golden mean process but the output symbols on the edges have been changed slightly. As a result of this shuffle, the states A and B can no longer be associated with a simple sequence of 0's and 1's. Whereas the golden mean has the irreducible set of forbidden words F = {00}, the even process has a countably infinite set F = {01^{2n+1}0 : n = 0, 1, 2, ...} [22].

FIG. 5: Deterministic hidden Markov chain representation of the even process. This process cannot be represented as a finite-order (nonhidden) Markov chain over the output symbols 0s and 1s. The set of irreducible forbidden words F = {01^{2n+1}0 : n = 0, 1, 2, ...} reflects the fact that the process generates blocks of 1s, bounded by 0s, that are even in length, at any length.

In simple terms, the even process produces blocks of 1's which are even in length. This is a much more complicated type of memory than we saw in the golden mean process. For the Markov chain model class, where a word of length k is used to predict the next letter, this would require an infinite order k. It would be necessary to keep

track of all even and odd strings of 1's, irrespective of their length. As a result, the properties of the even process mean that a finite Markov chain cannot represent this data source.

This example is then a demonstration of what can be learned in a case of out-of-class modeling. We are interested, therefore, in how well Markov chains approximate the even process. We expect that model comparison will select larger k as the size of the data sample increases. Does model selection tell us anything about the underlying data source despite the inability to exactly capture its properties? As we will see, we do obtain intriguing hints of the true nature of the even process from model comparison. Finally, can we estimate the entropy rate of the process with a Markov chain? As we will see, a high k is needed to do this effectively.

1. Estimation of M1 Parameters

In this section we consider an M1 approximation of the even process. We expect the resulting model to accurately capture length-2 word probabilities as N increases. In this example, we consider the true model to be the best approximation possible by a k = 1 Markov chain. From the labeled transition matrices given above we can calculate the appropriate values for p(0|1) and p(1|0) using the methods described above. Starting from the asymptotic distribution π = [p(A), p(B)] = [2/3, 1/3] we obtain p(0|1) = p(10)/p(1) = 1/4 and p(1|0) = p(01)/p(0) = 1/2.
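These conditional probabilities can be verified directly from the matrices of Eq. (48), again using Eq. (44) for word probabilities (a small plain-Python check; helper names are ours):

```python
def step(v, M):
    """Row-vector times 2x2 matrix."""
    return [sum(v[i] * M[i][j] for i in range(2)) for j in range(2)]

# Labeled transition matrices for the even process, Eq. (48).
T0 = [[0.5, 0.0], [0.0, 0.0]]
T1 = [[0.0, 0.5], [1.0, 0.0]]
pi = [2/3, 1/3]                 # asymptotic state distribution

def p(word):
    """p(w) = pi T(w_0) ... eta, as in Eq. (44)."""
    v = pi
    for s in word:
        v = step(v, T0 if s == "0" else T1)
    return sum(v)

p0_given_1 = p("10") / p("1")   # expected: 1/4
p1_given_0 = p("01") / p("0")   # expected: 1/2
```

These are exactly the best k = 1 parameter values quoted above, which the marginal posteriors in Fig. 6 converge to.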

As we can see from Fig. 6, a first-order Markov chain can be inferred without difficulty. The values obtained are exactly as expected. However, these values do not tell us much about the nature of the data source by themselves. This points to the important role of model comparison and entropy-rate estimation in understanding this data.

2. Selecting the Model Order k

Now consider the selection of Markov chain order k = 1-4 for a range of data sizes N. Recall that the even process cannot be represented by a finite-order Markov chain over the output symbols 0 and 1. As a consequence, we expect higher k to be selected with increasing data N, as more data statistically justifies more complex models. This is what happens, in fact, but the way in which orders are selected as we increase N provides structural information we could not obtain from the inference of a Markov chain of fixed order.

If we consider Fig. 7, an interesting pattern becomes apparent. Orders with even k are preferred over odd. In this way model selection is hinting at the underlying structure of the source. The Markov chain model class cannot represent the even process in a compact way, but


FIG. 6: A plot of the inference of M1 model parameters for the even process. For a variety of sample sizes N, the marginal posteriors for p(0|1) (top panel) and p(1|0) (bottom panel) are shown. The true values of the parameters are p(0|1) = 1/4 and p(1|0) = 1/2.

inference and model comparison combined provide useful information about the hidden structure of the source. In this example we also have regions where multiple orders k are equally probable. The sample size at which this occurs depends on the prior over orders which is employed. When this happens, properties estimated from the Markov chain model class should use a weighted sum over the various orders. As we will see in the estimation of entropy rates, this is not as critical: at sample sizes where the order probabilities are similar, the estimated entropy rates are also similar.

3. Estimation of Entropy Rate

Entropy rate estimation for the even process turns out to be a more difficult task than one might expect. In Fig. 8 we see that Markov chains of orders 1-6 are unable to effectively capture the true entropy rate. In fact, experience shows that an order k = 10 Markov chain or higher is needed to get close to the true value of h_µ = 2/3 bits per symbol. Note also the factor-of-20 longer realizations that are required compared, say, to the golden

FIG. 7: Model comparison for Markov chains of order k = 1-4 for average data from the even process. The top panel shows the model comparison with a uniform prior over the possible orders k. The bottom panel demonstrates model comparison with a penalty for the number of model parameters. In both cases the k = 4 model is chosen over lower orders as the amount of data available increases.

mean example.

As discussed above, a weighted sum of E_post[D[Q‖P] + h_µ[Q]] could be employed in this example. For the estimate this is not critical because the different orders provide roughly the same value at these points. In fact, these points correspond to where the estimates of E(Q,P) cross in Fig. 8. They are sample sizes where apparent randomness can be explained by structure and increased order k.

C. Simple Nondeterministic Source: Out-of-class modeling

The simple nondeterministic source adds another level of challenge to inference. As its name suggests, it is described by a nondeterministic HMM. Considering Fig. 9 we can see that a 1 is produced on every transition except for the B → A edge. This means there are many paths through the internal states that produce the same observable sequence of 0s and 1s. The defining labeled transition matrices for this process are given by

T(0) = [ 0  0 ; 1/2  0 ] ,  T(1) = [ 1/2  1/2 ; 0  1/2 ] .    (49)

FIG. 8: The convergence of E_post[D[Q‖P] + h_µ[Q]] to the true entropy rate h_µ = 2/3 bits per symbol for the even process. The true value is indicated by the horizontal gray line. Experience shows that a k = 10 Markov chain is needed to effectively approximate the true value of h_µ.

Using the state-to-state transition matrix T = T(0) + T(1), we find the asymptotic distribution for the hidden states to be π = [p(A), p(B)] = [1/2, 1/2]. Each of the hidden states is equally likely; however, a 1 is always produced from state A, while there is an equal chance of obtaining a 0 or 1 from state B.

FIG. 9: A hidden Markov chain representation of the simple nondeterministic process. This example also cannot be represented as a finite-order Markov chain over outputs 0 and 1. It is, however, more complicated than the two previous examples: only the observation of a 0 provides the observer with information regarding the internal state of the underlying process; observing a 1 leaves the internal state ambiguous.

1. Estimation of M1 Parameters

Using the asymptotic distribution derived above, the parameters of an inferred first-order Markov chain should approach p(0|1) = p(10)/p(1) = 1/3 and p(1|0) = p(01)/p(0) = 1. As we can see from Fig. 10, the inference process captures these values very effectively despite the out-of-class data source.

FIG. 10: Marginal density for M1 model parameters for the simple nondeterministic process. The curves for each data size N demonstrate a well behaved convergence to the correct values: p(0|1) = 1/3 and p(1|0) = 1.

2. Selecting the Model Order k

Here we consider the comparison of Markov chain models of orders k = 1-4 when applied to data from the simple nondeterministic source. As with the even process, we expect increasing order to be selected as the amount of available data increases. In Fig. 11 we see that this is exactly what happens.

Unlike the even process, there is no preference for even orders. Instead, we observe a systematic increase in order with larger data sets. We do note that the amount of data needed to select a higher order does seem to be larger than for the even process. Here the distribution over words is more important, and more subtle, than the support of the distribution (those words with positive probability).

3. Estimation of Entropy Rate

FIG. 11: Model comparison for Markov chains of order k = 1-4 for data from the simple nondeterministic process. The top panel shows the model comparison with a uniform prior over the possible orders k. The bottom panel demonstrates model comparison with a penalty for the number of model parameters. Note the scale on the horizontal axis: it takes much more data for the model comparison to pick out higher orders for this process compared to the previous examples.

Estimation of the entropy rate for the simple nondeterministic source provides an interesting contrast to the previous examples. As discussed when introducing the examples, this data source is a nondeterministic HMM and the entropy rate cannot be directly calculated using Eq. (46) [24]. However, a value of h_µ ≈ 0.677867 bits per symbol has been obtained in [22].

Figure 12 shows the results of entropy-rate estimation using Markov chains of order k = 1-6. These results demonstrate that the entropy rate can be effectively estimated with low-order k and relatively small data samples. This is an interesting result, as we might expect estimation of the entropy rate to be most difficult in this example. Instead we find that the even process was a more difficult test case.

VI. DISCUSSION

The examples presented above provide several interesting lessons in inference, model comparison, and estimating randomness. The combination of these three ideas applied to a data source provides information and intuition about the structure of the underlying system, even when modeling out-of-class processes.

In the examples of M1 estimates for each of the sources

FIG. 12: The convergence of E_post[D[Q‖P] + h_µ[Q]] to the true entropy rate h_µ ≈ 0.677867 bits per symbol for the simple nondeterministic source. The true value is indicated by the gray horizontal line.

we see that the Bayesian methods provide a powerful and consistent description of Markov chain model parameters. The marginal density accurately describes the uncertainty associated with these estimates, reflecting asymmetries which point estimation with error bars cannot capture. In addition, methods described in App. A1 can be used to generate regions of confidence of any type. Although the estimates obtained for the Markov chain model parameters were consistent with the data source for words up to length k+1, they did not capture the true nature of the system under consideration. This demonstrates that estimation of model parameters without some kind of model comparison can be very misleading. Only with the comparison of different orders did some indication of the true properties of the data source become clear. Without this step, misguided interpretations are easily obtained.

For the golden mean process, a k = 1 Markov chain, the results of model comparison were predictably uninteresting. This is a good indication that the correct model class is being employed. However, with the even process a much more complicated model comparison was found. In this case, a selection of even k over odd hinted at the distinguishing properties of the source. In a similar way, the results of model comparison for the simple nondeterministic source selected increasing order with larger N. In both out-of-class modeling examples, the increase in selected order without end is a good indication that the data source is not in the Markov chain class. (A parallel technique is found in hierarchical ε-machine reconstruction [22].) Alternatively, there is an indication that very high-order dependencies are important in the description of the process. Either way, this information is important since it gives an indication to the modeler that a more complicated dynamic is at work and all results must be treated with caution.

Finally, we considered the estimation of entropy rates for the example data sources. In two of the cases, the


golden mean process and the simple nondeterministic source, short data streams were adequate. This is not unexpected for the golden mean, but for the simple nondeterministic source this might be considered surprising. For the even process, the estimation of the entropy rate was markedly more difficult. For this data source, the countably infinite number of forbidden words makes the support of the word distribution at a given length important. As a result, a larger amount of data and a higher-order Markov chain are needed to find a decent estimate of randomness from that data source. In this way, each of the steps in Bayesian inference allows one to separate structure from randomness.

VII. CONCLUSION

We considered Bayesian inference of k-th order Markov chain models. This included estimating model parameters for a given k, model comparison between orders, and estimation of randomness in the form of entropy rates. In most approaches to inference, these three aspects are treated as separate, but related, endeavors. However, we find them to be intimately related. An estimate of model parameters without a sense of whether the correct model is being used is misguided at best. Model comparison provides a window into this problem by comparing various orders k within the model class. Finally, estimating randomness in the form of an entropy rate provides more information about the trade-off between structure and randomness. To do this we developed a connection to the statistical mechanical partition function, from which averages and variances were directly calculable. For the even process, structure was perceived as randomness, and for the simple nondeterministic source randomness was easily estimated while structure was more difficult to find. These insights, despite the out-of-class data, demonstrate the power of combining these three methods into one effective tool for investigating structure and randomness in finite strings of discrete data.

Acknowledgments

This work was partially supported at the Center for Computational Science and Engineering at the University of California at Davis by Intel Corporation. Work at the Santa Fe Institute was supported under its Computation, Dynamics, and Inference Program core grants from the National Science and MacArthur Foundations. C.S. and A.H. acknowledge support by the National Science Foundation Grant DMS 03-25939 ITR.


APPENDIX A

1. Dirichlet Distribution

We supply a brief overview of the Dirichlet distribution for completeness. For more information, a reference such as [25] should be consulted. In simple terms, the Dirichlet distribution is the multinomial generalization of the Beta distribution. The probability density function for q elements is given by

Dir({p_i}) = [ Γ(α) / ∏_{i=1}^{q} Γ(α_i) ] δ( 1 − Σ_{i=1}^{q} p_i ) ∏_{i=1}^{q} p_i^{α_i−1} ,

where α = Σ_{i=1}^{q} α_i. The mean and variance of the elements are

E[p_j] = α_j / α ,  Var[p_j] = α_j (α − α_j) / [ α^2 (1 + α) ] .    (A2)

2. Marginal distributions

An important part of understanding uncertainty in the inference process is the ability to find regions of confidence from a marginal density. The marginal is obtained from the posterior by integrating out the dependence on all parameters except for the parameter of interest. For a Dirichlet distribution, the marginal density is known to be a Beta distribution [25],

Beta(p_i) = [ Γ(α) / ( Γ(α_i) Γ(α − α_i) ) ] p_i^{α_i−1} (1 − p_i)^{α−α_i−1} .    (A4)

