Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling

Christopher C. Strelioff,1,2,∗ James P. Crutchfield,1,† and Alfred W. Hübler2,‡

1 Center for Computational Science & Engineering and Physics Department, University of California at Davis, One Shields Avenue, Davis, CA 95616
2 Center for Complex Systems Research and Physics Department, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801
arXiv:math/0703715v1 [math.ST] 24 Mar 2007

Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer k-th order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a novel method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
PACS numbers: 02.50.Tt, 02.50.Ga, 05.10.Gg

I. INTRODUCTION
Statistical inference of models from small data samples is a vital tool in the understanding of natural systems. In many problems of interest data consists of a sequence of letters from a finite alphabet. Examples include analysis of sequence information in biopolymers [1, 2], investigation of one-dimensional spin systems [3], models of natural languages [4], and coarse-grained models of chaotic dynamics [5, 6]. This diversity of potential application has resulted in the development of a variety of representations for describing discrete-valued data series.
We consider the k-th order Markov chain model class which uses the previous k letters in a sequence to predict the next letter. Inference of Markov chains from data has a long history in mathematical statistics. Early work focused on maximum likelihood methods for estimating the parameters of the Markov chain [7, 8, 9]. This work often assumed a given fixed model order. That is, no model comparison across orders is done. This work also typically relied on the assumed asymptotic normality of the likelihood when estimating regions of confidence and when implementing model comparison. As a result, the realm of application has been limited to data sources where these conditions are met. One consequence of these assumptions has been that data sources which exhibit forbidden words, symbol sequences which are not allowed, cannot be analyzed with these methods. This type of data violates the assumed normality of the likelihood function.
More recently, model comparison in the maximum likelihood approach has been extended using various information criteria ...
... understanding of inference not available in the classical statistics approach to these problems. As we demonstrate, each of these steps is vital and implementation of one step without the others does not provide a complete analysis of the data-model connection.

Moreover, the combination of inference of model parameters, comparison of performance across model orders, and estimation of entropy rates provides a powerful tool for understanding Markov chain models themselves. Remarkably, this is true even when the generating data source is outside of the Markov chain model class. Model comparison provides a sense of the structure of the data source, whereas estimates of the entropy rate provide a description of the inherent randomness. Bayesian inference, information theory, and tools from statistical mechanics presented here touch on all of these issues within a unified framework.

We develop this as follows, assuming a passing familiarity with Bayesian methods and statistical mechanics. First, we discuss estimation of Markov chain parameters using Bayesian methods, emphasizing the use of the complete marginal posterior density for each parameter, rather than point estimates with error bars. Second, we consider selection of the appropriate memory k given a particular data set, demonstrating that a mixture of orders may often be more appropriate than selecting a single order. This is certainly a more genuinely Bayesian approach. In these first two parts we exploit different forms of Bayes' theorem to connect data and model class. Third, we consider the mathematical structure of the evidence (or marginal likelihood) and draw connections to statistical mechanics. In this discussion we present a method for estimating entropy rates by taking derivatives of a partition function formed from elements of each step of the inference procedure. Last, we apply these tools to three example information sources of increasing complexity. The first example belongs to the Markov chain model class, but the other two are examples of hidden Markov models (HMMs) that fall outside of that class. We show that the methods developed here provide a powerful tool for understanding data from these sources, even when they do not belong to the model class being assumed.
II. INFERRING MODEL PARAMETERS

In the first level of Bayesian inference we develop a systematic relation between the data D, the chosen model class M, and the vector of model parameters θ. The object of interest in the inference of model parameters is the posterior probability density P(θ|D, M). This is the probability of the model parameters given the observed data and chosen model. To find the posterior we first consider the joint distribution P(θ, D|M) over the data and model parameters given that one has chosen to model the source with a representation in a certain class M. This can be factored in two ways: P(θ|D, M) P(D|M) or P(D|θ, M) P(θ|M). Setting these equal and solving for the posterior we obtain Bayes' theorem:

P(θ|D, M) = P(D|θ, M) P(θ|M) / P(D|M) .   (1)
B. Likelihood

Given a sample of data D = s_0 s_1 ... s_{N−1}, the likelihood can be written down using the Markov property of Eq. (3) and the stationarity of Eq. (4). This results in the form

P(D|θ_k, M_k) = ∏_{←s^k ∈ A^k} ∏_{s ∈ A} p(s|←s^k)^{n(←s^k s)} ,   (6)

where n(←s^k s) is the number of times the word ←s^k s occurs in the sample D. For future use we also introduce notation for the observed number of times a word ←s^k has been seen: n(←s^k) = Σ_{s ∈ A} n(←s^k s). We note that Eq. (6) is conditioned on the start sequence ←s^k = s_0 s_1 ... s_{k−1}.
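As an illustration of this counting notation, the short Python sketch below tabulates n(←s^k s) and n(←s^k) from a symbol sequence. The helper names and the sample string are ours, not the paper's; any finite-alphabet sequence can be substituted.

from collections import Counter

def word_counts(data, k):
    """Tabulate n(history, s): how often each length-k history is followed by symbol s."""
    counts = Counter()
    for i in range(k, len(data)):
        history = data[i - k:i]          # the k letters preceding position i
        counts[(history, data[i])] += 1
    return counts

def history_counts(counts):
    """Marginal counts n(history) = sum over s of n(history, s)."""
    totals = Counter()
    for (history, _s), n in counts.items():
        totals[history] += n
    return totals

# Example: a short binary sequence and k = 1
data = "1101110111"
n_ws = word_counts(data, k=1)            # e.g. n('1','0'), n('1','1'), n('0','1')
n_w = history_counts(n_ws)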
C. Prior

The prior P(θ|M) is used to specify assumptions about the model to be inferred before the data is considered. Here we use conjugate priors for which the posterior distribution has the same functional form as the prior. Our choice allows us to derive exact expressions for many quantities of interest in inference. This provides a powerful tool for understanding what information is gained during inference and, especially, model comparison.

The exact form of the prior is determined by our assignment of hyperparameters α(←s^k s), which balance the strength of the modeling assumptions encoded in the prior against the weight of the data. For a k-th order Markov chain, there is one hyperparameter for each word ←s^k s, given the alphabet under consideration. A useful way to think about the assignment of values to the hyperparameters is to relate them to fake counts ñ(←s^k s), such that α(←s^k s) = ñ(←s^k s) + 1. In this way, the α(←s^k s) can be set to reflect knowledge of the data source, and the strength of these prior assumptions can be properly weighted in relation to the actual data counts n(←s^k s).
The conjugate prior for Markov chain inference is a product of Dirichlet distributions, one for each word ←s^k. It restates the finite-memory assumption from the model definition:

P(θ_k|M_k) = ∏_{←s^k ∈ A^k} [ Γ(α(←s^k)) / ∏_{s ∈ A} Γ(α(←s^k s)) ] ∏_{s ∈ A} p(s|←s^k)^{α(←s^k s) − 1} ,   (7)

where α(←s^k) = Σ_{s ∈ A} α(←s^k s). Standard properties of the Dirichlet distribution then give the prior mean and variance of each parameter:

E_prior[p(s|←s^k)] = α(←s^k s) / α(←s^k) ,   (8)

Var_prior[p(s|←s^k)] = α(←s^k s) (α(←s^k) − α(←s^k s)) / [ α(←s^k)^2 (α(←s^k) + 1) ] .

D. Evidence

The evidence, or marginal likelihood, of the data given the model order is obtained by integrating the product of prior and likelihood over the model parameters. For the conjugate prior above this integral can be carried out exactly, yielding a ratio of Gamma functions:

P(D|M_k) = ∏_{←s^k ∈ A^k} [ Γ(α(←s^k)) / Γ(n(←s^k) + α(←s^k)) ] × ∏_{s ∈ A} [ Γ(n(←s^k s) + α(←s^k s)) / Γ(α(←s^k s)) ] .   (12)
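For concreteness, the (natural) log of the evidence in Eq. (12) can be evaluated directly from the word counts. The sketch below is a minimal illustration under a symmetric choice α(←s^k s) = α for every word; the function name and this symmetric-prior assumption are ours, and histories that never occur contribute nothing to the sum.

from math import lgamma

def log_evidence(counts, alphabet, alpha=1.0):
    """log P(D|M_k) for a symmetric Dirichlet prior with alpha per (history, symbol)."""
    histories = {h for (h, _s) in counts}
    a_h = alpha * len(alphabet)              # alpha(history) = sum_s alpha(history, s)
    log_p = 0.0
    for h in histories:
        n_h = sum(counts.get((h, s), 0) for s in alphabet)
        log_p += lgamma(a_h) - lgamma(n_h + a_h)
        for s in alphabet:
            n_hs = counts.get((h, s), 0)
            log_p += lgamma(n_hs + alpha) - lgamma(alpha)
    return log_p

# Example, continuing the k = 1 counts built earlier:
# log_evidence(n_ws, alphabet="01", alpha=1.0)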
E. Posterior

Using Bayes' theorem Eq. (1), the results of the three previous sections can be combined to obtain the posterior distribution over the parameters of the k-th order Markov chain. One finds:
P(θ_k|D, M_k) = ∏_{←s^k ∈ A^k} [ Γ(n(←s^k) + α(←s^k)) / ∏_{s ∈ A} Γ(n(←s^k s) + α(←s^k s)) ] ∏_{s ∈ A} p(s|←s^k)^{n(←s^k s) + α(←s^k s) − 1} .

This is again a product of Dirichlet distributions, now with the hyperparameters augmented by the observed counts. The corresponding mean and variance of each parameter are

E_post[p(s|←s^k)] = [ n(←s^k s) + α(←s^k s) ] / [ n(←s^k) + α(←s^k) ] ,   (14)

Var_post[p(s|←s^k)] = [ n(←s^k s) + α(←s^k s) ] [ n(←s^k) + α(←s^k) − n(←s^k s) − α(←s^k s) ] / { [ n(←s^k) + α(←s^k) ]^2 [ n(←s^k) + α(←s^k) + 1 ] } .

Eq. (14) is the posterior mean estimate (PME) of the model parameters.
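A minimal numerical sketch of Eq. (14) and the posterior variance, assuming the counts built earlier and an arbitrary hyperparameter assignment (the function name is illustrative, not the authors' code):

def posterior_mean_var(n_hs, n_h, a_hs, a_h):
    """Posterior mean and variance of p(s|history) from counts and hyperparameters."""
    mean = (n_hs + a_hs) / (n_h + a_h)
    var = mean * (1.0 - mean) / (n_h + a_h + 1.0)
    return mean, var

# Example: p(0|1) for the k = 1 counts above, with alpha(history, s) = 1 and alpha(history) = 2:
# mean, var = posterior_mean_var(n_ws[("1", "0")], n_w["1"], 1.0, 2.0)

Note that the variance expression mean (1 − mean) / (n(←s^k) + α(←s^k) + 1) is algebraically the same as the formula above.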
A deeper understanding of Eq. (14) is obtained through a simple factoring:

E_post[p(s|←s^k)] = [ 1 / ( n(←s^k) + α(←s^k) ) ] [ n(←s^k) · n(←s^k s)/n(←s^k) + α(←s^k) · α(←s^k s)/α(←s^k) ] ,   (16)

which expresses the PME as a mixture of the maximum likelihood estimate n(←s^k s)/n(←s^k) and the prior mean α(←s^k s)/α(←s^k), weighted by the data counts and by the hyperparameters, respectively.
The same Gamma-function structure also gives the probability of observing additional counts m(←s^k s) in a continuation of the data, given the counts n(←s^k s) already seen:

P({m(←s^k s)}|D, M_k) = ∏_{←s^k ∈ A^k} [ Γ(n(←s^k) + α(←s^k)) / Γ(n(←s^k) + m(←s^k) + α(←s^k)) ] × ∏_{s ∈ A} [ Γ(n(←s^k s) + m(←s^k s) + α(←s^k s)) / Γ(n(←s^k s) + α(←s^k s)) ] ,   (19)

where m(←s^k) = Σ_{s ∈ A} m(←s^k s).
III. MODEL COMPARISON

With the ability to infer a Markov chain of a given order k, a commonsense question to ask is: how do we choose the correct order given a particular data set? Bayesian methods have a systematic way to address this through the use of model comparison.
In many ways, this process is analogous to inferring model parameters themselves, which we just laid out. We start by enumerating the set of model orders to be compared, M = {M_k : k_min ≤ k ≤ k_max}, where k_min and k_max correspond to the minimum and maximum order to be inferred, respectively. Although we will not consider an independent, identically distributed (IID) model (k = 0) here, we do note that this could be included using the same techniques described below.
We start with the joint probability P(M_k, D|M) of a particular model M_k ∈ M and data sample D, factoring it in two ways following Bayes' theorem. Solving for the probability of a particular model class we obtain

P(M_k|D, M) = P(D|M_k, M) P(M_k|M) / Σ_{M_k′ ∈ M} P(D|M_k′, M) P(M_k′|M) .   (22)

The first prior over orders we consider is uniform on M. In the second case, a common penalty for the number of model parameters is

P(M_k|M) = exp(−|M_k|) / Σ_{M_k′ ∈ M} exp(−|M_k′|) ,

where |M_k| is the number of free parameters of M_k, so that

P(M_k|D, M) = P(D|M_k, M) exp(−|M_k|) / Σ_{M_k′ ∈ M} P(D|M_k′, M) exp(−|M_k′|) .   (25)
We note that the normalization sum in Eq. (23) cancels because it appears in both the numerator and denominator.

Bayesian model comparison has a natural Occam's razor in the model comparison process [18]. This means there is a natural preference for smaller models even when a uniform prior over model orders is applied. In this light, a penalty for the number of model parameters can be seen as a very cautious form of model comparison. Both of these priors, Eq. (22) and Eq. (25), will be considered in the examples to follow.

A note is in order on computational implementation. In general, the resulting probabilities can be extremely small, easily resulting in numerical underflow if the equations are not implemented with care. As mentioned in [16], computation with extended logarithms can be used to alleviate these concerns.
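One hedged way to implement Eq. (22) or Eq. (25) without underflow is to work entirely with log-evidences and normalize with a log-sum-exp, in the spirit of the extended-logarithm remark above. The helper below is our sketch, not the authors' code; the parameter-count penalty in the comment assumes |M_k| = |A|^k (|A| − 1).

from math import exp, log

def order_posterior(log_evidences, penalties=None):
    """P(M_k|D,M) from per-order log-evidences; optional penalty |M_k| per order."""
    ks = sorted(log_evidences)
    scores = {k: log_evidences[k] - (penalties[k] if penalties else 0.0) for k in ks}
    m = max(scores.values())                           # log-sum-exp trick avoids underflow
    norm = m + log(sum(exp(v - m) for v in scores.values()))
    return {k: exp(scores[k] - norm) for k in ks}

# Example with a binary alphabet:
# penalties = {k: (2 ** k) * (2 - 1) for k in log_evs}   # |M_k| = |A|^k (|A| - 1)
# order_posterior(log_evs)                # uniform prior over orders
# order_posterior(log_evs, penalties)     # penalty prior of Eq. (25)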
IV. INFORMATION THEORY, STATISTICAL MECHANICS, AND ENTROPY RATES

An important property of an information source is its entropy rate hµ, which indicates the degree of intrinsic randomness and controls the achievable compression. A first attempt at estimating a source's entropy rate might consist of plugging a Markov chain's estimated model parameters into the known expression [17]. However, this does not accurately reflect the posterior distribution derived above. This observation leaves two realistic alternatives. The first option is to sample model parameters from the posterior distribution. These samples can then be used to calculate a set of entropy rate estimates that reflect the underlying posterior distribution. A second option, which we take here, is to adapt methods from type theory and statistical mechanics previously developed for IID models [19] to Markov chains. To the best of our knowledge this is the first time these ideas have been extended to inferring Markov chains; although cf. [20].

In simple terms, type theory shows that the probability of an observed sequence can be written in terms of the Kullback-Leibler (KL) distance and the entropy rate. When applied to the Markov chain inference problem the resulting form suggests a connection to statistical mechanics. For example, we will show that averages of the KL-distance and entropy rate with respect to the posterior are found by taking simple derivatives of a partition function.
The connection between inference and information theory starts by considering the product of the prior Eq. (7) and likelihood Eq. (6):

P(θ_k|M_k) P(D|θ_k, M_k) = P(D, θ_k|M_k) .   (26)

This forms a joint distribution over the observed data D and model parameters θ_k given the model order M_k. Denoting the normalization constant from the prior as Z to save space, this joint distribution is

P(D, θ_k|M_k) = Z ∏_{←s^k, s} p(s|←s^k)^{n(←s^k s) + α(←s^k s) − 1} .   (27)
This form can be written, without approximation, in terms of conditional relative entropies D[·‖·] and entropy rates hµ[·]:

P(D, θ_k|M_k) = Z 2^{−β_k ( D[Q‖P] + hµ[Q] )} × 2^{+|A|^{k+1} ( D[U‖P] + hµ[U] )} ,   (28)

where β_k = Σ_{←s^k, s} [ n(←s^k s) + α(←s^k s) ] and the distribution of true parameters is P = {p(←s^k), p(s|←s^k)}. The distributions Q and U are given by

Q: q(←s^k) = [ n(←s^k) + α(←s^k) ] / β_k ,  q(s|←s^k) = [ n(←s^k s) + α(←s^k s) ] / [ n(←s^k) + α(←s^k) ] ,
U: u(←s^k) = 1/|A|^k ,  u(s|←s^k) = 1/|A| ,   (30)

where Q is the distribution defined by the posterior mean and U is a uniform distribution. The information-theoretic quantities used above are given by

D[Q‖P] = Σ_{s, ←s^k} q(←s^k) q(s|←s^k) log2 [ q(s|←s^k) / p(s|←s^k) ] ,

hµ[Q] = − Σ_{s, ←s^k} q(←s^k) q(s|←s^k) log2 q(s|←s^k) ,

with the analogous expressions for D[U‖P] and hµ[U].
Averages and variances of E(Q, P) = D[Q‖P] + hµ[Q] with respect to the posterior then follow from first and second derivatives of the logarithm of this partition function with respect to β_k. The resulting expressions involve the digamma and trigamma functions ψ^(0) and ψ^(1) evaluated at β_k q(←s^k) and β_k q(←s^k) q(s|←s^k). Expanding for large β_k gives

E_post[ D[Q‖P] + hµ[Q] ] = H[ q(←s^k) q(s|←s^k) ] − H[ q(←s^k) ] + |A|^k (|A| − 1) / ( 2 β_k ln 2 ) + O(1/β_k^2) .   (41)
From this we see that the first two terms make up the entropy rate hµ[Q] = H[q(←s^k) q(s|←s^k)] − H[q(←s^k)] and the last term is associated with the conditional relative entropy between the posterior mean distribution Q and the true distribution P.
In summary, we have found the average of conditional relative entropy and entropy rate with respect to the posterior density. This was accomplished by making connections to statistical mechanics through type theory. Unlike sampling from the posterior to estimate the entropy rate, this method results in an analytic form which approaches hµ[P] as the inverse of the data size. This method for approximating hµ also provides a computational benefit. No eigenstates have to be found from the Markov transition matrix, allowing for the storage of values in sparse data structures. This provides a distinct computational advantage when large orders or alphabets are considered. Finally, it might seem awkward to use the expectation of Eq. (33) for estimation of the entropy rate. This method was chosen because it is the form that naturally appears in writing down the likelihood-prior combination in Eq. (28). As a result of using this method, most of the results obtained above are without approximation. We were also able to show that this expectation converges to the desired value in a well behaved manner.
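As a sketch of the computational point just made, the posterior expectation E_post[D[Q‖P] + hµ[Q]] can be accumulated directly from the counts with digamma functions, with no transition-matrix eigenstates required. This is our own illustration under the Dirichlet posterior described above, with a symmetric hyperparameter α per word; for small alphabets and modest k the loop over all histories is cheap.

from itertools import product
from math import log
from scipy.special import digamma

def expected_cross_entropy(counts, alphabet, k, alpha=1.0):
    """Posterior mean of D[Q||P] + h_mu[Q] in bits per symbol."""
    total = 0.0
    beta_k = 0.0
    for hist in product(alphabet, repeat=k):
        h = "".join(hist)
        a = [counts.get((h, s), 0) + alpha for s in alphabet]   # n(h,s) + alpha(h,s)
        a0 = sum(a)                                             # n(h) + alpha(h)
        beta_k += a0
        total += sum(a_s * (digamma(a_s) - digamma(a0)) for a_s in a)
    return -total / (beta_k * log(2))

This uses the Dirichlet identity E[ln p(s|←s^k)] = ψ^(0)(n(←s^k s) + α(←s^k s)) − ψ^(0)(n(←s^k) + α(←s^k)) applied term by term to the cross entropy; the value converges to hµ[P] as 1/N, consistent with Eq. (41).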
V. EXAMPLES

To explore how the above produces a robust inference procedure, let's now consider the statistical inference of a series of increasingly complex data sources. The first, called the golden mean process, is a first-order Markov chain. The second data source is called the even process and cannot be represented by a Markov chain with finite order. However, this source is a deterministic HMM, meaning that the current state and next output symbol uniquely determine the next state. Finally, we consider the simple nondeterministic source, so named since its smallest representation is as a nondeterministic HMM. (Nondeterminism here refers to the HMM structure: the current state and next output symbol do not uniquely determine the next state. This source is represented by an infinite-state deterministic HMM [22, 23].)

The golden mean, even, and simple nondeterministic processes can all be written down as models with two internal states, call them A and B. However, the complexity of the data generated from each source is of markedly different character. Our goal in this section is to consider the three main steps in inference to analyze them. First, we consider inference of a first-order Markov chain to demonstrate the estimation of model parameters with uncertainty. Second, we consider model comparison for a range of orders k. This allows us to discover structure in the data source even though the true model class cannot be captured in all cases. Finally, we consider estimation of entropy rates from these data sources, investigating how randomness is expressed in them.
While investigating these processes we consider average data counts, rather than sample counts from specific realizations, as we want to focus specifically on the average performance of Bayesian inference. To do this we take advantage of the known form of the sources. Each is described by a transition matrix T, which gives transitions between states A and B:

T = [ p(A|A)  p(B|A) ]
    [ p(A|B)  p(B|B) ] .   (42)

Although two of our data sources are not finite Markov chains, the transition matrix between internal states is Markov. This means the matrix is stochastic (all rows sum to one) and we are guaranteed an eigenstate π with eigenvalue one: πT = π. This eigenstate describes the asymptotic distribution over internal states: π = [p(A), p(B)].
The transition matrix can be divided into labeled matrices T^(s) which contain those elements of T that output symbol s. For our binary data sources one has

T = T^(0) + T^(1) .   (43)

Using these matrices, the average probability of words can be estimated for each process of interest. For example, the probability of the word 01 can be found using

p(01) = π T^(0) T^(1) η ,   (44)

where η is a column vector with all 1's. In this way, for any data size N, we estimate the average count for a word as

n(←s^k s) = (N − k) p(←s^k s) .   (45)

Average counts, obtained this way, will be the basis for all of the examples to follow.
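The word-probability calculation of Eqs. (42)-(45) is straightforward to carry out numerically. The following sketch (illustrative helper names, ours) computes π as the left eigenvector of T and then p(w) = π T^(w_1) ... T^(w_L) η, using the golden mean process of Sec. V A, Eq. (47), as the running example:

import numpy as np

def stationary(T):
    """Left eigenvector of T with eigenvalue 1, normalized to a probability vector."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def word_probability(word, labeled_T, pi):
    """p(word) = pi T^(w1) ... T^(wL) eta, with eta a column vector of ones."""
    v = pi.copy()
    for s in word:
        v = v @ labeled_T[s]
    return v.sum()

# Golden mean process, Eq. (47):
T0 = np.array([[0.0, 0.5], [0.0, 0.0]])
T1 = np.array([[0.5, 0.0], [1.0, 0.0]])
pi = stationary(T0 + T1)                                 # [2/3, 1/3]
p01 = word_probability("01", {"0": T0, "1": T1}, pi)     # 1/3
n01 = (1000 - 1) * p01                                   # average count of '01' for N = 1000, Eq. (45)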
In the estimation of the true entropy rate for the examples we use the formula

hµ = − Σ_{v ∈ {A,B}} p(v) Σ_{s ∈ A} p(s|v) log2 p(s|v)   (46)

for the golden mean and even processes, where p(s|v) = Σ_{v′} T^(s)_{v v′} is the probability of a letter s given the state v and p(v) is the asymptotic probability of the state v, which can be found as noted above. For the simple nondeterministic source this closed-form expression cannot be applied and the entropy rate must be found using more involved methods; see [22] for further details.
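Under the same assumptions, Eq. (46) is a one-liner once π and the labeled matrices are available (this sketch continues the code above; the helper name is ours):

def entropy_rate(labeled_T, pi):
    """h_mu = - sum_v p(v) sum_s p(s|v) log2 p(s|v), with p(s|v) the row sum of T^(s)."""
    h = 0.0
    for v, p_v in enumerate(pi):
        for T_s in labeled_T.values():
            p_sv = T_s[v].sum()
            if p_sv > 0:
                h -= p_v * p_sv * np.log2(p_sv)
    return h

# Golden mean: entropy_rate({"0": T0, "1": T1}, pi) gives 2/3 bits per symbol.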
A. Golden mean process: In-class modeling
The golden mean process can be represented by a simple 1st-order Markov chain over a binary alphabet characterized by a single (shortest) forbidden word 00. The defining labeled transition matrices for this data source are given by

T^(0) = [ 0  1/2 ]     T^(1) = [ 1/2  0 ]
        [ 0   0  ] ,           [  1   0 ] .   (47)

Figure 1 provides a graphical representation of the corresponding hidden Markov chain. Inspection reveals a simple relation between the internal states A and B and the output symbols 0 and 1. An observation of 0 indicates a transition to internal state B and a 1 corresponds to state A, making this process a Markov chain over 0s and 1s.

For the golden mean the eigenstate is π = [p(A), p(B)] = (2/3, 1/3). With this vector and the labeled transition matrices any desired word count can be found as discussed above.

FIG. 1: A deterministic hidden Markov chain for the golden mean process. Edges are labeled with the output symbol and the transition probability: symbol | probability.
1. Estimation of M1 Parameters

To demonstrate the effective inference of the Markov chain parameters for the golden mean process we consider average counts for a variety of data sizes N. For each size, the marginal posterior for the parameters p(0|1) and p(1|0) is plotted in Fig. 2. The results demonstrate that the shape of the posterior effectively describes the distribution of possible model parameters at each N and converges to the correct values of p(0|1) = 1/2 and p(1|0) = 1 with increasing data.

Point estimates with a variance can be provided for each of the parameters, but these numbers by themselves can be misleading. However, the estimate obtained by using the mean and variance of the posterior is a more effective description of the inference process than a maximum likelihood estimate with estimated error given by a Gaussian approximation of the likelihood alone. As Fig. 2 demonstrates, in fact, a Gaussian approximation of uncertainty is an ineffective description of our knowledge when the Markov chain parameters are near their upper or lower limits at 0 and 1. Probably the most effective set of numbers to provide consists of the mean of the posterior and a region of confidence. These would most accurately describe asymmetries in the uncertainty of model parameters. Although we will not do that here, a brief description of finding regions of confidence is provided in App. A1.
2. Selecting the Model Order k

Now consider the selection of the appropriate order k from golden mean realizations. As discussed above, the golden mean process is a first order Markov chain with k = 1. As a result, we would expect model comparison to select this order from the available possibilities. To demonstrate this, we consider orders k = 1 to 4 and perform model comparison with a uniform prior over orders (Eq. (22)) and with a penalty for the number of model parameters (Eq. (25)).
The results of the model comparisons are given in Fig. 3. The top panel shows the probability for each order k as a function of the sample size, using a uniform prior. For this prior over orders, M1 is selected with any reasonable amount of data. However, there does seem to be a possibility to over-fit for small data size N ≤ 100. The bottom panel shows the model probability with a penalty prior over model order k. This removes the over-fitting at small data sizes and produces an offset which must be overcome by the data before higher k is selected. This example is not meant to argue for the penalty prior over model orders. In fact, Bayesian model comparison with a uniform prior does an effective job using a relatively small sample size.

FIG. 2: A plot of the inference of M1 model parameters for the golden mean process. For each data sample size N, the marginal posterior is plotted for the parameters of interest: p(0|1) in the top panel and p(1|0) in the lower panel. The true values of the parameters are p(0|1) = 1/2 and p(1|0) = 1.
3. Estimation of Entropy Rate

We can also demonstrate the convergence of the average for E(Q, P) = D[Q‖P] + hµ[Q] given in Eq. (38) to the correct entropy rate for the golden mean process. We choose to show this convergence for all orders k = 1 to 4 discussed in the previous section. This exercise demonstrates that all orders greater than or equal to k = 1 effectively capture the entropy rate. However, the convergence to the correct values for higher-order k takes more data because of a larger initial value of D[Q‖P]. This larger value is simply due to the larger number of parameters for higher-order Markov chains.
FIG. 3: Model comparison for Markov chains of order k = 1 to 4 using average counts from the golden mean process. Sample sizes from N = 100 to N = 1,000 in steps of ΔN = 5 are used to generate these plots. The top panel displays the model probabilities using a uniform prior over orders k. The bottom panel displays the effect of a penalty for model size.

In evaluating the value of D[Q‖P] + hµ[Q] for different sample lengths, we expect that the PME-estimated Q will converge to the true distribution P. As a result, the conditional relative entropy should go to zero with increasing N. For the golden mean process, the known value of the entropy rate is hµ = 2/3 bits per symbol. Inspection of Fig. 4 demonstrates the expected convergence of the average from Eq. (38) to the true entropy rate.
The result of our model comparison from the previous section could also be used in the estimation of the entropy rate. As we saw in Fig. 3, there are ranges of sample length N where the probabilities of orders k = 1, 2 are both nonzero. In principle, an estimate of hµ should be made by weighting the values obtained for each k by the corresponding order probability P(M_k|D, M). As we can see from Fig. 4, the estimates of the entropy rate for k = 1, 2 are also very similar in this range of N. As a result, this additional step would not have a large effect for entropy rate estimation.

B. Even process: Out-of-class modeling
FIG. 4: The convergence of E_post[E(Q, P)] to the true entropy rate hµ = 2/3 bits per symbol (indicated by the gray horizontal line) for the golden mean process. As demonstrated in Eq. (41), the conditional relative entropy D[Q‖P] → 0 as 1/N. This results in the convergence of hµ[Q] to the true entropy rate.

We now consider a more difficult data source called the even process. The defining labeled transition matrices are given by

T^(0) = [ 1/2  0 ]     T^(1) = [ 0  1/2 ]
        [  0   0 ] ,           [ 1   0  ] .   (48)
As can be seen in Fig. 5, the node-edge structure is identical to the golden mean process but the output symbols on the edges have been changed slightly. As a result of this shuffle, the states A and B can no longer be associated with a simple sequence of 0's and 1's. Whereas the golden mean has the irreducible set of forbidden words F = {00}, the even process has a countably infinite set F = {01^{2n+1}0 : n = 0, 1, 2, ...} [22].

FIG. 5: Deterministic hidden Markov chain representation of the even process. This process cannot be represented as a finite-order (nonhidden) Markov chain over the output symbols 0s and 1s. The set of irreducible forbidden words F = {01^{2n+1}0 : n = 0, 1, 2, ...} reflects the fact that the process generates blocks of 1's, bounded by 0s, that are even in length, at any length.
In simple terms, the even process produces blocks of 1's which are even in length. This is a much more complicated type of memory than we saw in the golden mean process. For the Markov chain model class, where a word of length k is used to predict the next letter, this would require an infinite order k. It would be necessary to keep track of all even and odd strings of 1's, irrespective of the length. As a result, the properties of the even process mean that a finite Markov chain cannot represent this data source.

This example is then a demonstration of what can be learned in a case of out-of-class modeling. We are interested, therefore, in how well Markov chains approximate the even process. We expect that model comparison will select larger k as the size of the data sample increases. Does the model selection tell us anything about the underlying data source despite the inability to exactly capture its properties? As we will see, we do obtain intriguing hints of the true nature of the even process from model comparison. Finally, can we estimate the entropy rate of the process with a Markov chain? As we will see, a high k is needed to do this effectively.
1. Estimation of M1 Parameters

In this section we consider an M1 approximation of the even process. We expect the resulting model to accurately capture length-2 word probabilities as N increases. In this example, we consider the true model to be the best approximation possible by a k = 1 Markov chain. From the labeled transition matrices given above we can calculate the appropriate values for p(0|1) and p(1|0) using the methods described above. Starting from the asymptotic distribution π = [p(A), p(B)] = [2/3, 1/3] we obtain p(0|1) = p(10)/p(1) = 1/4 and p(1|0) = p(01)/p(0) = 1/2.
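These expected values can be checked with the word-probability helpers sketched in Sec. V (stationary and word_probability; illustrative names, ours):

# Even process, Eq. (48):
T0 = np.array([[0.5, 0.0], [0.0, 0.0]])
T1 = np.array([[0.0, 0.5], [1.0, 0.0]])
pi = stationary(T0 + T1)                                   # [2/3, 1/3]
labeled = {"0": T0, "1": T1}
p0_given_1 = word_probability("10", labeled, pi) / word_probability("1", labeled, pi)   # 1/4
p1_given_0 = word_probability("01", labeled, pi) / word_probability("0", labeled, pi)   # 1/2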
As we can see from Fig. 6, a first-order Markov chain can be inferred without difficulty. The values obtained are exactly as expected. However, these values do not tell us much about the nature of the data source by themselves. This points to the important role of model comparison and entropy rate estimation in understanding this data.

2. Selecting the Model Order k

Now consider the selection of Markov chain order k = 1 to 4 for a range of data sizes N. Recall that the even process cannot be represented by a finite-order Markov chain over the output symbols 0 and 1. As a consequence, we expect higher k to be selected with increasing data N, as more data statistically justifies more complex models. This is what happens, in fact, but the way in which orders are selected as we increase N provides structural information we could not obtain from the inference of a Markov chain of fixed order.
If we consider Fig. 7, an interesting pattern becomes apparent. Orders with even k are preferred over odd. In this way model selection is hinting at the underlying structure of the source. The Markov chain model class cannot represent the even process in a compact way, but inference and model comparison combined provide useful information about the hidden structure of the source.

In this example we also have regions where multiple orders k are nearly equally probable. The sample size at which this occurs depends on the prior over orders which is employed. When this happens, properties estimated from the Markov chain model class should use a weighted sum of the various orders. As we will see in the estimation of entropy rates, this is not as critical. At sample sizes where the order probabilities are similar, the estimated entropy rates are also similar.

FIG. 6: A plot of the inference of M1 model parameters for the even process. For a variety of sample sizes N, the marginal posterior for p(0|1) (top panel) and p(1|0) (bottom panel) are shown. The true values of the parameters are p(0|1) = 1/4 and p(1|0) = 1/2.
3. Estimation of Entropy Rate

Entropy rate estimation for the even process turns out to be a more difficult task than one might expect. In Fig. 8 we see that Markov chains of orders 1 to 6 are unable to effectively capture the true entropy rate. In fact, experience shows that an order k = 10 Markov chain or higher is needed to get close to the true value of hµ = 2/3 bits per symbol. Note also the factor of 20 longer realizations that are required compared, say, to the golden mean example.

FIG. 7: Model comparison for Markov chains of order k = 1 to 4 for average data from the even process. The top panel shows the model comparison with a uniform prior over the possible orders k. The bottom panel demonstrates model comparison with a penalty for the number of model parameters. In both cases the k = 4 model is chosen over lower orders as the amount of data available increases.
As discussed above, a weighted sum of E_post[D[Q‖P] + hµ[Q]] could be employed in this example. For the estimate this is not critical because the different orders provide roughly the same value at these points. In fact, these points correspond to where the estimates of E(Q, P) cross in Fig. 8. They are sample sizes where apparent randomness can be explained by structure and increased order k.
C. Simple Nondeterministic Source: Out-of-class modeling

The simple nondeterministic source adds another level of challenge to inference. As its name suggests, it is described by a nondeterministic HMM. Considering Fig. 9 we can see that a 1 is produced on every transition except for the B → A edge. This means there are many paths through the internal states that produce the same observable sequence of 0s and 1s. The defining labeled transition matrices for this process are given by

T^(0) = [  0   0 ]     T^(1) = [ 1/2  1/2 ]
        [ 1/2  0 ] ,           [  0   1/2 ] .   (49)
FIG. 8: The convergence of E_post[D[Q‖P] + hµ[Q]] to the true entropy rate hµ = 2/3 bits per symbol for the even process. The true value is indicated by the horizontal gray line. Experience shows that a k = 10 Markov chain is needed to effectively approximate the true value of hµ.

Using the state-to-state transition matrix T = T^(0) + T^(1), we find the asymptotic distribution for the hidden states to be π = [p(A), p(B)] = [1/2, 1/2]. Each of the hidden states is equally likely; however, a 1 is always produced from state A, while there is an equal chance of obtaining a 0 or 1 from state B.

FIG. 9: A hidden Markov chain representation of the simple nondeterministic process. This example also cannot be represented as a finite-order Markov chain over outputs 0 and 1. It, however, is more complicated than the two previous examples: Only the observation of a 0 provides the observer with information regarding the internal state of the underlying process; observing a 1 leaves the internal state ambiguous.
1. Estimation of M1 Parameters

Using the asymptotic distribution derived above, the parameters of an inferred first-order Markov chain should approach p(0|1) = p(10)/p(1) = 1/3 and p(1|0) = p(01)/p(0) = 1. As we can see from Fig. 10, the inference process captures these values very effectively despite the out-of-class data source.
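The same check as for the even process applies here, swapping in the matrices of Eq. (49) (again using the illustrative helpers from Sec. V):

# Simple nondeterministic source, Eq. (49):
T0 = np.array([[0.0, 0.0], [0.5, 0.0]])
T1 = np.array([[0.5, 0.5], [0.0, 0.5]])
pi = stationary(T0 + T1)                                   # [1/2, 1/2]
labeled = {"0": T0, "1": T1}
p0_given_1 = word_probability("10", labeled, pi) / word_probability("1", labeled, pi)   # 1/3
p1_given_0 = word_probability("01", labeled, pi) / word_probability("0", labeled, pi)   # 1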
FIG. 10: Marginal density for M1 model parameters for the simple nondeterministic process: The curves for each data size N demonstrate a well behaved convergence to the correct values: p(0|1) = 1/3 and p(1|0) = 1.

2. Selecting the Model Order k

Here we consider the comparison of Markov chain models of orders k = 1 to 4 when applied to data from the simple nondeterministic source. As with the even process, we expect increasing order to be selected as the amount of available data increases. In Fig. 11 we see that this is exactly what happens.

Unlike the even process, there is no preference for even orders. Instead, we observe a systematic increase in order with larger data sets. We do note that the amount of data needed to select a higher order does seem to be larger than for the even process. Here the distribution over words is more important and more subtle than the support of the distribution (those words with positive probability).
3. Estimation of Entropy Rate

Estimation of the entropy rate for the simple nondeterministic source provides an interesting contrast to the previous examples. As discussed when introducing the examples, this data source is a nondeterministic HMM and the entropy rate cannot be directly calculated using Eq. (46) [24]. However, a value of hµ ≈ 0.677867 bits per symbol has been obtained in [22].

FIG. 11: Model comparison for Markov chains of order k = 1 to 4 for data from the simple nondeterministic process. The top panel shows the model comparison with a uniform prior over the possible orders k. The bottom panel demonstrates model comparison with a penalty for the number of model parameters. Note the scale on the horizontal axis: it takes much more data for the model comparison to pick out higher orders for this process compared to the previous examples.
Figure 12 shows the results of entropy-rate estimation using Markov chains of order k = 1 to 6. These results demonstrate that the entropy rate can be effectively estimated with low-order k and relatively small data samples. This is an interesting result, as we might expect estimation of the entropy rate to be most difficult in this example. Instead we find that the even process was a more difficult test case.

VI. DISCUSSION

The examples presented above provide several interesting lessons in inference, model comparison, and estimating randomness. The combination of these three ideas applied to a data source provides information and intuition about the structure of the underlying system, even when modeling out-of-class processes.
FIG. 12: The convergence of E_post[D[Q‖P] + hµ[Q]] to the true entropy rate hµ ≈ 0.677867 bits per symbol for the simple nondeterministic source. The true value is indicated by the gray horizontal line.

In the examples of M1 estimates for each of the sources we see that the Bayesian methods provide a powerful and consistent description of Markov chain model parameters. The marginal density accurately describes the uncertainty associated with these estimates, reflecting asymmetries which point estimation with error bars cannot capture. In addition, methods described in App. A1 can be used to generate regions of confidence of any type. Although the estimates obtained for the Markov chain model parameters were consistent with the data source for words up to length k + 1, they did not capture the true nature of the system under consideration. This demonstrates that estimation of model parameters without some kind of model comparison can be very misleading. Only with the comparison of different orders did some indication of the true properties of the data source become clear. Without this step, misguided interpretations are easily obtained.
For the golden mean process, a k = 1 Markov chain, the results of model comparison were predictably uninteresting. This is a good indication that the correct model class is being employed. However, with the even process a much more complicated model comparison was found. In this case, a selection of even k over odd hinted at the distinguishing properties of the source. In a similar way, the results of model comparison for the simple nondeterministic source selected increasing order with larger N. In both out-of-class modeling examples, the increase in selected order without end is a good indication that the data source is not in the Markov chain class. (A parallel technique is found in hierarchical ε-machine reconstruction [22].) Alternatively, there is an indication that very high-order dependencies are important in the description of the process. Either way, this information is important since it gives an indication to the modeler that a more complicated dynamic is at work and all results must be treated with caution.
Finally, we considered the estimation of entropy rates for the example data sources. In two of the cases, the golden mean process and the simple nondeterministic source, short data streams were adequate. This is not unexpected for the golden mean, but for the simple nondeterministic source this might be considered surprising. For the even process, the estimation of the entropy rate was markedly more difficult. For this data source, the countably infinite number of forbidden words makes the support of the word distribution at a given length important. As a result, a larger amount of data and a higher-order Markov chain are needed to find a decent estimate of randomness from that data source. In this way, each of the steps in Bayesian inference allows one to separate structure from randomness.
VII. CONCLUSION

We considered Bayesian inference of k-th order Markov chain models. This included estimating model parameters for a given k, model comparison between orders, and estimation of randomness in the form of entropy rates. In most approaches to inference, these three aspects are treated as separate, but related endeavors. However, we find them to be intimately related. An estimate of model parameters without a sense of whether the correct model is being used is misguided at best. Model comparison provides a window into this problem by comparing various orders k within the model class. Finally, estimating randomness in the form of an entropy rate provides more information about the trade-off between structure and randomness. To do this we developed a connection to the statistical mechanical partition function, from which averages and variances were directly calculable. For the even process, structure was perceived as randomness and for the simple nondeterministic source randomness was easily estimated and structure was more difficult to find. These insights, despite the out-of-class data, demonstrate the power of combining these three methods into one effective tool for investigating structure and randomness in finite strings of discrete data.

Acknowledgments

This work was partially supported at the Center for Computational Science and Engineering at the University of California at Davis by Intel Corporation. Work at the Santa Fe Institute was supported under its Computation, Dynamics, and Inference Program core grants from the National Science and MacArthur Foundations. C.S. and A.H. acknowledge support by the National Science Foundation Grant DMS 03-25939 ITR.
APPENDIX A

1. Dirichlet Distribution
We supply a brief overview of the Dirichlet distribution for completeness. For more information, a reference such as [25] should be consulted. In simple terms, the Dirichlet distribution is the multinomial generalization of the Beta distribution. The probability density function for q elements is given by

Dir({p_i}) = [ Γ(α) / ∏_{i=1}^{q} Γ(α_i) ] ∏_{i=1}^{q} p_i^{α_i − 1} ,

where α = Σ_{i=1}^{q} α_i. The mean and variance of each element are

E[p_j] = α_j / α ,  Var[p_j] = α_j (α − α_j) / [ α^2 (1 + α) ] .   (A2)

2. Marginal distributions

An important part of understanding uncertainty in the inference process is the ability to find regions of confidence from a marginal density. The marginal is obtained from the posterior by integrating out the dependence on all parameters except for the parameter of interest. For a Dirichlet distribution, the marginal density is known to be a Beta distribution [25],

Beta(p_i) = [ Γ(α) / ( Γ(α_i) Γ(α − α_i) ) ] p_i^{α_i − 1} (1 − p_i)^{α − α_i − 1} .   (A4)
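A hedged sketch of how a region of confidence can be read off this Beta marginal, using the inverse CDF from scipy (the choice of a central interval, and the helper name, are ours and not prescribed by the appendix):

from scipy.stats import beta

def central_credible_interval(a_i, a_rest, level=0.95):
    """Central credible interval for p_i whose marginal is Beta(a_i, a_rest)."""
    lo = beta.ppf((1 - level) / 2, a_i, a_rest)
    hi = beta.ppf(1 - (1 - level) / 2, a_i, a_rest)
    return lo, hi

# Example: the marginal for p(1|0) given n(01) = 60, n(00) = 0 and alpha = 1 per word
# is Beta(61, 1); the resulting interval is strongly asymmetric, pressed against 1.
# central_credible_interval(61, 1)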