Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling

Christopher C. Strelioff,1,2,∗ James P. Crutchfield,1,† and Alfred W. Hübler2,‡

1 Center for Computational Science & Engineering and Physics Department, University of California at Davis, One Shields Avenue, Davis, CA 95616
2 Center for Complex Systems Research and Physics Department, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801
arXiv:math/0703715v1 [math.ST] 24 Mar 2007

Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer k-th order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a novel method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
PACS numbers: 02.50.Tt, 02.50.Ga, 05.10.Gg

I. INTRODUCTION
Statistical inference of models from small data samples is a vital tool in the understanding of natural systems. In many problems of interest data consists of a sequence of letters from a finite alphabet. Examples include analysis of sequence information in biopolymers [1, 2], investigation of one-dimensional spin systems [3], models of natural languages [4], and coarse-grained models of chaotic dynamics [5, 6]. This diversity of potential application has resulted in the development of a variety of representations for describing discrete-valued data series.
We consider the k-th order Markov chain model class which uses the previous k letters in a sequence to predict the next letter. Inference of Markov chains from data has a long history in mathematical statistics. Early work focused on maximum likelihood methods for estimating the parameters of the Markov chain [7, 8, 9]. This work often assumed a given fixed model order. That is, no model comparison across orders is done. This work also typically relied on the assumed asymptotic normality of the likelihood when estimating regions of confidence and when implementing model comparison. As a result, the realm of application has been limited to data sources where these conditions are met. One consequence of these assumptions has been that data sources which exhibit forbidden words, symbol sequences which are not allowed, cannot be analyzed with these methods. This type of data violates the assumed normality of the likelihood function.
More recently, model comparison in the maximum likelihood approach has been extended using various information criteria ...
... understanding of inference not available in the classical statistics approach to these problems. As we demonstrate, each of these steps is vital and implementation of one step without the others does not provide a complete analysis of the data-model connection.

Moreover, the combination of inference of model parameters, comparison of performance across model orders, and estimation of entropy rates provides a powerful tool for understanding Markov chain models themselves. Remarkably, this is true even when the generating data source is outside of the Markov chain model class. Model comparison provides a sense of the structure of the data source, whereas estimates of the entropy rate provide a description of the inherent randomness. Bayesian inference, information theory, and tools from statistical mechanics presented here touch on all of these issues within a unified framework.

We develop this as follows, assuming a passing familiarity with Bayesian methods and statistical mechanics. First, we discuss estimation of Markov chain parameters using Bayesian methods, emphasizing the use of the complete marginal posterior density for each parameter, rather than point estimates with error bars. Second, we consider selection of the appropriate memory k given a particular data set, demonstrating that a mixture of orders may often be more appropriate than selecting a single order. This is certainly a more genuinely Bayesian approach. In these first two parts we exploit different forms of Bayes' theorem to connect data and model class. Third, we consider the mathematical structure of the evidence (or marginal likelihood) and draw connections to statistical mechanics. In this discussion we present a method for estimating entropy rates by taking derivatives of a partition function formed from elements of each step of the inference procedure. Last, we apply these tools to three example information sources of increasing complexity. The first example belongs to the Markov chain model class, but the other two are examples of hidden Markov models (HMMs) that fall outside of that class. We show that the methods developed here provide a powerful tool for understanding data from these sources, even when they do not belong to the model class being assumed.
II. INFERRING MODEL PARAMETERS

In the first level of Bayesian inference we develop a systematic relation between the data D, the chosen model class M, and the vector of model parameters θ. The object of interest in the inference of model parameters is the posterior probability density P(θ|D, M). This is the probability of the model parameters given the observed data and chosen model. To find the posterior we first consider the joint distribution P(θ, D|M) over the data and model parameters given that one has chosen to model the source with a representation in a certain class M. This can be factored in two ways: P(θ|D, M) P(D|M) or P(D|θ, M) P(θ|M). Setting these equal and solving for the posterior we obtain Bayes' theorem:

P(θ|D, M) = P(D|θ, M) P(θ|M) / P(D|M) .   (1)
B. Likelihood

Given a sample of data D = s_0 s_1 ... s_{N−1}, the likelihood can be written down using the Markov property of Eq. (3) and the stationarity of Eq. (4). This results in the form

P(D|θ_k, M_k) = ∏_{←s^k ∈ A^k} ∏_{s ∈ A} p(s|←s^k)^{n(←s^k s)} ,   (6)

where n(←s^k s) is the number of times the word ←s^k s occurs in the sample D. For future use we also introduce notation for the observed number of times a word ←s^k has been seen: n(←s^k) = Σ_{s ∈ A} n(←s^k s). We note that Eq. (6) is conditioned on the start sequence ←s^k = s_0 s_1 ... s_{k−1}.
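As an illustration of this counting notation, the short Python sketch below tabulates n(←s^k s) and n(←s^k) from a symbol sequence. The helper names and the sample string are ours, not the paper's; any finite-alphabet sequence can be substituted.

from collections import Counter

def word_counts(data, k):
    """Tabulate n(history, s): how often each length-k history is followed by symbol s."""
    counts = Counter()
    for i in range(k, len(data)):
        history = data[i - k:i]          # the k letters preceding position i
        counts[(history, data[i])] += 1
    return counts

def history_counts(counts):
    """Marginal counts n(history) = sum over s of n(history, s)."""
    totals = Counter()
    for (history, _s), n in counts.items():
        totals[history] += n
    return totals

# Example: a short binary sequence and k = 1
data = "1101110111"
n_ws = word_counts(data, k=1)            # e.g. n('1','0'), n('1','1'), n('0','1')
n_w = history_counts(n_ws)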
C. Prior

The prior P(θ|M) is used to specify assumptions about the model to be inferred before the data is considered. Here we use conjugate priors for which the posterior distribution has the same functional form as the prior. Our choice allows us to derive exact expressions for many quantities of interest in inference. This provides a powerful tool for understanding what information is gained during inference and, especially, model comparison.

The exact form of the prior is determined by our assignment of hyperparameters α(←s^k s), which balance the strength of the modeling assumptions encoded in the prior against the weight of the data. For a k-th order Markov chain, there is one hyperparameter for each word ←s^k s, given the alphabet under consideration. A useful way to think about the assignment of values to the hyperparameters is to relate them to fake counts ñ(←s^k s), such that α(←s^k s) = ñ(←s^k s) + 1. In this way, the α(←s^k s) can be set to reflect knowledge of the data source, and the strength of these prior assumptions can be properly weighted in relation to the actual data counts n(←s^k s).
The conjugate prior for Markov chain inference is a product of Dirichlet distributions, one for each word ←s^k. It restates the finite-memory assumption from the model definition:

P(θ_k|M_k) = ∏_{←s^k ∈ A^k} [ Γ(α(←s^k)) / ∏_{s ∈ A} Γ(α(←s^k s)) ] ∏_{s ∈ A} p(s|←s^k)^{α(←s^k s) − 1} ,   (7)

where α(←s^k) = Σ_{s ∈ A} α(←s^k s). Standard properties of the Dirichlet distribution then give the prior mean and variance of each parameter:

E_prior[p(s|←s^k)] = α(←s^k s) / α(←s^k) ,   (8)

Var_prior[p(s|←s^k)] = α(←s^k s) (α(←s^k) − α(←s^k s)) / [ α(←s^k)^2 (α(←s^k) + 1) ] .

D. Evidence

The evidence, or marginal likelihood, of the data given the model order is obtained by integrating the product of prior and likelihood over the model parameters. For the conjugate prior above this integral can be carried out exactly, yielding a ratio of Gamma functions:

P(D|M_k) = ∏_{←s^k ∈ A^k} [ Γ(α(←s^k)) / Γ(n(←s^k) + α(←s^k)) ] × ∏_{s ∈ A} [ Γ(n(←s^k s) + α(←s^k s)) / Γ(α(←s^k s)) ] .   (12)
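For concreteness, the (natural) log of the evidence in Eq. (12) can be evaluated directly from the word counts. The sketch below is a minimal illustration under a symmetric choice α(←s^k s) = α for every word; the function name and this symmetric-prior assumption are ours, and histories that never occur contribute nothing to the sum.

from math import lgamma

def log_evidence(counts, alphabet, alpha=1.0):
    """log P(D|M_k) for a symmetric Dirichlet prior with alpha per (history, symbol)."""
    histories = {h for (h, _s) in counts}
    a_h = alpha * len(alphabet)              # alpha(history) = sum_s alpha(history, s)
    log_p = 0.0
    for h in histories:
        n_h = sum(counts.get((h, s), 0) for s in alphabet)
        log_p += lgamma(a_h) - lgamma(n_h + a_h)
        for s in alphabet:
            n_hs = counts.get((h, s), 0)
            log_p += lgamma(n_hs + alpha) - lgamma(alpha)
    return log_p

# Example, continuing the k = 1 counts built earlier:
# log_evidence(n_ws, alphabet="01", alpha=1.0)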
E. Posterior

Using Bayes' theorem Eq. (1), the results of the three previous sections can be combined to obtain the posterior distribution over the parameters of the k-th order Markov chain. One finds:
P(θ_k|D, M_k) = ∏_{←s^k ∈ A^k} [ Γ(n(←s^k) + α(←s^k)) / ∏_{s ∈ A} Γ(n(←s^k s) + α(←s^k s)) ] ∏_{s ∈ A} p(s|←s^k)^{n(←s^k s) + α(←s^k s) − 1} .

This is again a product of Dirichlet distributions, now with the hyperparameters augmented by the observed counts. The corresponding mean and variance of each parameter are

E_post[p(s|←s^k)] = [ n(←s^k s) + α(←s^k s) ] / [ n(←s^k) + α(←s^k) ] ,   (14)

Var_post[p(s|←s^k)] = [ n(←s^k s) + α(←s^k s) ] [ n(←s^k) + α(←s^k) − n(←s^k s) − α(←s^k s) ] / { [ n(←s^k) + α(←s^k) ]^2 [ n(←s^k) + α(←s^k) + 1 ] } .

Eq. (14) is the posterior mean estimate (PME) of the model parameters.
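A minimal numerical sketch of Eq. (14) and the posterior variance, assuming the counts built earlier and an arbitrary hyperparameter assignment (the function name is illustrative, not the authors' code):

def posterior_mean_var(n_hs, n_h, a_hs, a_h):
    """Posterior mean and variance of p(s|history) from counts and hyperparameters."""
    mean = (n_hs + a_hs) / (n_h + a_h)
    var = mean * (1.0 - mean) / (n_h + a_h + 1.0)
    return mean, var

# Example: p(0|1) for the k = 1 counts above, with alpha(history, s) = 1 and alpha(history) = 2:
# mean, var = posterior_mean_var(n_ws[("1", "0")], n_w["1"], 1.0, 2.0)

Note that the variance expression mean (1 − mean) / (n(←s^k) + α(←s^k) + 1) is algebraically the same as the formula above.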
A deeper understanding of Eq. (14) is obtained through a simple factoring:

E_post[p(s|←s^k)] = [ 1 / ( n(←s^k) + α(←s^k) ) ] [ n(←s^k) · n(←s^k s)/n(←s^k) + α(←s^k) · α(←s^k s)/α(←s^k) ] ,   (16)

which expresses the PME as a mixture of the maximum likelihood estimate n(←s^k s)/n(←s^k) and the prior mean α(←s^k s)/α(←s^k), weighted by the data counts and by the hyperparameters, respectively.
The same Gamma-function structure also gives the probability of observing additional counts m(←s^k s) in a continuation of the data, given the counts n(←s^k s) already seen:

P({m(←s^k s)}|D, M_k) = ∏_{←s^k ∈ A^k} [ Γ(n(←s^k) + α(←s^k)) / Γ(n(←s^k) + m(←s^k) + α(←s^k)) ] × ∏_{s ∈ A} [ Γ(n(←s^k s) + m(←s^k s) + α(←s^k s)) / Γ(n(←s^k s) + α(←s^k s)) ] ,   (19)

where m(←s^k) = Σ_{s ∈ A} m(←s^k s).
III. MODEL COMPARISON

With the ability to infer a Markov chain of a given order k, a commonsense question to ask is: how do we choose the correct order given a particular data set? Bayesian methods have a systematic way to address this through the use of model comparison.
In many ways, this process is analogous to inferring model parameters themselves, which we just laid out. We start by enumerating the set of model orders to be compared, M = {M_k : k_min ≤ k ≤ k_max}, where k_min and k_max correspond to the minimum and maximum order to be inferred, respectively. Although we will not consider an independent, identically distributed (IID) model (k = 0) here, we do note that this could be included using the same techniques described below.
We start with the joint probability P(M_k, D|M) of a particular model M_k ∈ M and data sample D, factoring it in two ways following Bayes' theorem. Solving for the probability of a particular model class we obtain

P(M_k|D, M) = P(D|M_k, M) P(M_k|M) / Σ_{M_k′ ∈ M} P(D|M_k′, M) P(M_k′|M) .   (22)

The first prior over orders we consider is uniform on M. In the second case, a common penalty for the number of model parameters is

P(M_k|M) = exp(−|M_k|) / Σ_{M_k′ ∈ M} exp(−|M_k′|) ,

where |M_k| is the number of free parameters of M_k, so that

P(M_k|D, M) = P(D|M_k, M) exp(−|M_k|) / Σ_{M_k′ ∈ M} P(D|M_k′, M) exp(−|M_k′|) .   (25)
We note that the normalization sum in Eq. (23) cancels because it appears in both the numerator and denominator.

Bayesian model comparison has a natural Occam's razor in the model comparison process [18]. This means there is a natural preference for smaller models even when a uniform prior over model orders is applied. In this light, a penalty for the number of model parameters can be seen as a very cautious form of model comparison. Both of these priors, Eq. (22) and Eq. (25), will be considered in the examples to follow.

A note is in order on computational implementation. In general, the resulting probabilities can be extremely small, easily resulting in numerical underflow if the equations are not implemented with care. As mentioned in [16], computation with extended logarithms can be used to alleviate these concerns.
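One hedged way to implement Eq. (22) or Eq. (25) without underflow is to work entirely with log-evidences and normalize with a log-sum-exp, in the spirit of the extended-logarithm remark above. The helper below is our sketch, not the authors' code; the parameter-count penalty in the comment assumes |M_k| = |A|^k (|A| − 1).

from math import exp, log

def order_posterior(log_evidences, penalties=None):
    """P(M_k|D,M) from per-order log-evidences; optional penalty |M_k| per order."""
    ks = sorted(log_evidences)
    scores = {k: log_evidences[k] - (penalties[k] if penalties else 0.0) for k in ks}
    m = max(scores.values())                           # log-sum-exp trick avoids underflow
    norm = m + log(sum(exp(v - m) for v in scores.values()))
    return {k: exp(scores[k] - norm) for k in ks}

# Example with a binary alphabet:
# penalties = {k: (2 ** k) * (2 - 1) for k in log_evs}   # |M_k| = |A|^k (|A| - 1)
# order_posterior(log_evs)                # uniform prior over orders
# order_posterior(log_evs, penalties)     # penalty prior of Eq. (25)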
IV. INFORMATION THEORY, STATISTICAL MECHANICS, AND ENTROPY RATES

An important property of an information source is its entropy rate hµ, which indicates the degree of intrinsic randomness and controls the achievable compression. A first attempt at estimating a source's entropy rate might consist of plugging a Markov chain's estimated model parameters into the known expression [17]. However, this does not accurately reflect the posterior distribution derived above. This observation leaves two realistic alternatives. The first option is to sample model parameters from the posterior distribution. These samples can then be used to calculate a set of entropy rate estimates that reflect the underlying posterior distribution. A second option, which we take here, is to adapt methods from type theory and statistical mechanics previously developed for IID models [19] to Markov chains. To the best of our knowledge this is the first time these ideas have been extended to inferring Markov chains; although cf. [20].

In simple terms, type theory shows that the probability of an observed sequence can be written in terms of the Kullback-Leibler (KL) distance and the entropy rate. When applied to the Markov chain inference problem the resulting form suggests a connection to statistical mechanics. For example, we will show that averages of the KL-distance and entropy rate with respect to the posterior are found by taking simple derivatives of a partition function.
The connection between inference and information theory starts by considering the product of the prior Eq. (7) and likelihood Eq. (6):

P(θ_k|M_k) P(D|θ_k, M_k) = P(D, θ_k|M_k) .   (26)

This forms a joint distribution over the observed data D and model parameters θ_k given the model order M_k. Denoting the normalization constant from the prior as Z to save space, this joint distribution is

P(D, θ_k|M_k) = Z ∏_{←s^k, s} p(s|←s^k)^{n(←s^k s) + α(←s^k s) − 1} .   (27)
This form can be written, without approximation, in terms of conditional relative entropies D[·‖·] and entropy rates hµ[·]:

P(D, θ_k|M_k) = Z 2^{−β_k ( D[Q‖P] + hµ[Q] )} × 2^{+|A|^{k+1} ( D[U‖P] + hµ[U] )} ,   (28)

where β_k = Σ_{←s^k, s} [ n(←s^k s) + α(←s^k s) ] and the distribution of true parameters is P = {p(←s^k), p(s|←s^k)}. The distributions Q and U are given by

Q: q(←s^k) = [ n(←s^k) + α(←s^k) ] / β_k ,  q(s|←s^k) = [ n(←s^k s) + α(←s^k s) ] / [ n(←s^k) + α(←s^k) ] ,
U: u(←s^k) = 1/|A|^k ,  u(s|←s^k) = 1/|A| ,   (30)

where Q is the distribution defined by the posterior mean and U is a uniform distribution. The information-theoretic quantities used above are given by

D[Q‖P] = Σ_{s, ←s^k} q(←s^k) q(s|←s^k) log2 [ q(s|←s^k) / p(s|←s^k) ] ,

hµ[Q] = − Σ_{s, ←s^k} q(←s^k) q(s|←s^k) log2 q(s|←s^k) ,

with the analogous expressions for D[U‖P] and hµ[U].
Averages and variances of E(Q, P) = D[Q‖P] + hµ[Q] with respect to the posterior then follow from first and second derivatives of the logarithm of this partition function with respect to β_k. The resulting expressions involve the digamma and trigamma functions ψ^(0) and ψ^(1) evaluated at β_k q(←s^k) and β_k q(←s^k) q(s|←s^k). Expanding for large β_k gives

E_post[ D[Q‖P] + hµ[Q] ] = H[ q(←s^k) q(s|←s^k) ] − H[ q(←s^k) ] + |A|^k (|A| − 1) / ( 2 β_k ln 2 ) + O(1/β_k^2) .   (41)
From this we see that the first two terms make up the entropy rate hµ[Q] = H[q(←s^k) q(s|←s^k)] − H[q(←s^k)] and the last term is associated with the conditional relative entropy between the posterior mean distribution Q and the true distribution P.
In summary, we have found the average of conditional relative entropy and entropy rate with respect to the posterior density. This was accomplished by making connections to statistical mechanics through type theory. Unlike sampling from the posterior to estimate the entropy rate, this method results in an analytic form which approaches hµ[P] as the inverse of the data size. This method for approximating hµ also provides a computational benefit. No eigenstates have to be found from the Markov transition matrix, allowing for the storage of values in sparse data structures. This provides a distinct computational advantage when large orders or alphabets are considered. Finally, it might seem awkward to use the expectation of Eq. (33) for estimation of the entropy rate. This method was chosen because it is the form that naturally appears in writing down the likelihood-prior combination in Eq. (28). As a result of using this method, most of the results obtained above are without approximation. We were also able to show that this expectation converges to the desired value in a well behaved manner.
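As a sketch of the computational point just made, the posterior expectation E_post[D[Q‖P] + hµ[Q]] can be accumulated directly from the counts with digamma functions, with no transition-matrix eigenstates required. This is our own illustration under the Dirichlet posterior described above, with a symmetric hyperparameter α per word; for small alphabets and modest k the loop over all histories is cheap.

from itertools import product
from math import log
from scipy.special import digamma

def expected_cross_entropy(counts, alphabet, k, alpha=1.0):
    """Posterior mean of D[Q||P] + h_mu[Q] in bits per symbol."""
    total = 0.0
    beta_k = 0.0
    for hist in product(alphabet, repeat=k):
        h = "".join(hist)
        a = [counts.get((h, s), 0) + alpha for s in alphabet]   # n(h,s) + alpha(h,s)
        a0 = sum(a)                                             # n(h) + alpha(h)
        beta_k += a0
        total += sum(a_s * (digamma(a_s) - digamma(a0)) for a_s in a)
    return -total / (beta_k * log(2))

This uses the Dirichlet identity E[ln p(s|←s^k)] = ψ^(0)(n(←s^k s) + α(←s^k s)) − ψ^(0)(n(←s^k) + α(←s^k)) applied term by term to the cross entropy; the value converges to hµ[P] as 1/N, consistent with Eq. (41).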
V. EXAMPLES

To explore how the above produces a robust inference procedure, let's now consider the statistical inference of a series of increasingly complex data sources. The first, called the golden mean process, is a first-order Markov chain. The second data source is called the even process and cannot be represented by a Markov chain with finite order. However, this source is a deterministic HMM, meaning that the current state and next output symbol uniquely determine the next state. Finally, we consider the simple nondeterministic source, so named since its smallest representation is as a nondeterministic HMM. (Nondeterminism here refers to the HMM structure: the current state and next output symbol do not uniquely determine the next state. This source is represented by an infinite-state deterministic HMM [22, 23].)

The golden mean, even, and simple nondeterministic processes can all be written down as models with two internal states, call them A and B. However, the complexity of the data generated from each source is of markedly different character. Our goal in this section is to consider the three main steps in inference to analyze them. First, we consider inference of a first-order Markov chain to demonstrate the estimation of model parameters with uncertainty. Second, we consider model comparison for a range of orders k. This allows us to discover structure in the data source even though the true model class cannot be captured in all cases. Finally, we consider estimation of entropy rates from these data sources, investigating how randomness is expressed in them.
While investigating these processes we consider average data counts, rather than sample counts from specific realizations, as we want to focus specifically on the average performance of Bayesian inference. To do this we take advantage of the known form of the sources. Each is described by a transition matrix T, which gives transitions between states A and B:

T = [ p(A|A)  p(B|A) ]
    [ p(A|B)  p(B|B) ] .   (42)

Although two of our data sources are not finite Markov chains, the transition matrix between internal states is Markov. This means the matrix is stochastic (all rows sum to one) and we are guaranteed an eigenstate π with eigenvalue one: πT = π. This eigenstate describes the asymptotic distribution over internal states: π = [p(A), p(B)].
The transition matrix can be divided into labeled matrices T^(s) which contain those elements of T that output symbol s. For our binary data sources one has

T = T^(0) + T^(1) .   (43)

Using these matrices, the average probability of words can be estimated for each process of interest. For example, the probability of the word 01 can be found using

p(01) = π T^(0) T^(1) η ,   (44)

where η is a column vector with all 1's. In this way, for any data size N, we estimate the average count for a word as

n(←s^k s) = (N − k) p(←s^k s) .   (45)

Average counts, obtained this way, will be the basis for all of the examples to follow.
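The word-probability calculation of Eqs. (42)-(45) is straightforward to carry out numerically. The following sketch (illustrative helper names, ours) computes π as the left eigenvector of T and then p(w) = π T^(w_1) ... T^(w_L) η, using the golden mean process of Sec. V A, Eq. (47), as the running example:

import numpy as np

def stationary(T):
    """Left eigenvector of T with eigenvalue 1, normalized to a probability vector."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def word_probability(word, labeled_T, pi):
    """p(word) = pi T^(w1) ... T^(wL) eta, with eta a column vector of ones."""
    v = pi.copy()
    for s in word:
        v = v @ labeled_T[s]
    return v.sum()

# Golden mean process, Eq. (47):
T0 = np.array([[0.0, 0.5], [0.0, 0.0]])
T1 = np.array([[0.5, 0.0], [1.0, 0.0]])
pi = stationary(T0 + T1)                                 # [2/3, 1/3]
p01 = word_probability("01", {"0": T0, "1": T1}, pi)     # 1/3
n01 = (1000 - 1) * p01                                   # average count of '01' for N = 1000, Eq. (45)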
In the estimation of the true entropy rate for the examples we use the formula

hµ = − Σ_{v ∈ {A,B}} p(v) Σ_{s ∈ A} p(s|v) log2 p(s|v)   (46)

for the golden mean and even processes, where p(s|v) = Σ_{v′} T^(s)_{v v′} is the probability of a letter s given the state v and p(v) is the asymptotic probability of the state v, which can be found as noted above. For the simple nondeterministic source this closed-form expression cannot be applied and the entropy rate must be found using more involved methods; see [22] for further details.
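Under the same assumptions, Eq. (46) is a one-liner once π and the labeled matrices are available (this sketch continues the code above; the helper name is ours):

def entropy_rate(labeled_T, pi):
    """h_mu = - sum_v p(v) sum_s p(s|v) log2 p(s|v), with p(s|v) the row sum of T^(s)."""
    h = 0.0
    for v, p_v in enumerate(pi):
        for T_s in labeled_T.values():
            p_sv = T_s[v].sum()
            if p_sv > 0:
                h -= p_v * p_sv * np.log2(p_sv)
    return h

# Golden mean: entropy_rate({"0": T0, "1": T1}, pi) gives 2/3 bits per symbol.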
A. Golden mean process: In-class modeling
The golden mean process can be represented by a simple 1st-order Markov chain over a binary alphabet characterized by a single (shortest) forbidden word 00. The defining labeled transition matrices for this data source are given by

T^(0) = [ 0  1/2 ]     T^(1) = [ 1/2  0 ]
        [ 0   0  ] ,           [  1   0 ] .   (47)

Figure 1 provides a graphical representation of the corresponding hidden Markov chain. Inspection reveals a simple relation between the internal states A and B and the output symbols 0 and 1. An observation of 0 indicates a transition to internal state B and a 1 corresponds to state A, making this process a Markov chain over 0s and 1s.

For the golden mean the eigenstate is π = [p(A), p(B)] = (2/3, 1/3). With this vector and the labeled transition matrices any desired word count can be found as discussed above.

FIG. 1: A deterministic hidden Markov chain for the golden mean process. Edges are labeled with the output symbol and the transition probability: symbol | probability.
1. Estimation of M1 Parameters

To demonstrate the effective inference of the Markov chain parameters for the golden mean process we consider average counts for a variety of data sizes N. For each size, the marginal posterior for the parameters p(0|1) and p(1|0) is plotted in Fig. 2. The results demonstrate that the shape of the posterior effectively describes the distribution of possible model parameters at each N and converges to the correct values of p(0|1) = 1/2 and p(1|0) = 1 with increasing data.

Point estimates with a variance can be provided for each of the parameters, but these numbers by themselves can be misleading. However, the estimate obtained by using the mean and variance of the posterior is a more effective description of the inference process than a maximum likelihood estimate with estimated error given by a Gaussian approximation of the likelihood alone. As Fig. 2 demonstrates, in fact, a Gaussian approximation of uncertainty is an ineffective description of our knowledge when the Markov chain parameters are near their upper or lower limits at 0 and 1. Probably the most effective set of numbers to provide consists of the mean of the posterior and a region of confidence. These would most accurately describe asymmetries in the uncertainty of model parameters. Although we will not do that here, a brief description of finding regions of confidence is provided in App. A1.
2. Selecting the Model Order k

Now consider the selection of the appropriate order k from golden mean realizations. As discussed above, the golden mean process is a first order Markov chain with k = 1. As a result, we would expect model comparison to select this order from the available possibilities. To demonstrate this, we consider orders k = 1 to 4 and perform model comparison with a uniform prior over orders (Eq. (22)) and with a penalty for the number of model parameters (Eq. (25)).
The results of the model comparisons are given in Fig. 3. The top panel shows the probability for each order k as a function of the sample size, using a uniform prior. For this prior over orders, M1 is selected with any reasonable amount of data. However, there does seem to be a possibility to over-fit for small data size N ≤ 100. The bottom panel shows the model probability with a penalty prior over model order k. This removes the over-fitting at small data sizes and produces an offset which must be overcome by the data before higher k is selected. This example is not meant to argue for the penalty prior over model orders. In fact, Bayesian model comparison with a uniform prior does an effective job using a relatively small sample size.

FIG. 2: A plot of the inference of M1 model parameters for the golden mean process. For each data sample size N, the marginal posterior is plotted for the parameters of interest: p(0|1) in the top panel and p(1|0) in the lower panel. The true values of the parameters are p(0|1) = 1/2 and p(1|0) = 1.
3. Estimation of Entropy Rate

We can also demonstrate the convergence of the average for E(Q, P) = D[Q‖P] + hµ[Q] given in Eq. (38) to the correct entropy rate for the golden mean process. We choose to show this convergence for all orders k = 1 to 4 discussed in the previous section. This exercise demonstrates that all orders greater than or equal to k = 1 effectively capture the entropy rate. However, the convergence to the correct values for higher-order k takes more data because of a larger initial value of D[Q‖P]. This larger value is simply due to the larger number of parameters for higher-order Markov chains.
FIG. 3: Model comparison for Markov chains of order k = 1 to 4 using average counts from the golden mean process. Sample sizes from N = 100 to N = 1,000 in steps of ΔN = 5 are used to generate these plots. The top panel displays the model probabilities using a uniform prior over orders k. The bottom panel displays the effect of a penalty for model size.

In evaluating the value of D[Q‖P] + hµ[Q] for different sample lengths, we expect that the PME-estimated Q will converge to the true distribution P. As a result, the conditional relative entropy should go to zero with increasing N. For the golden mean process, the known value of the entropy rate is hµ = 2/3 bits per symbol. Inspection of Fig. 4 demonstrates the expected convergence of the average from Eq. (38) to the true entropy rate.
The result of our model comparison from the previous section could also be used in the estimation of the entropy rate. As we saw in Fig. 3, there are ranges of sample length N where the probabilities of orders k = 1, 2 are both nonzero. In principle, an estimate of hµ should be made by weighting the values obtained for each k by the corresponding order probability P(M_k|D, M). As we can see from Fig. 4, the estimates of the entropy rate for k = 1, 2 are also very similar in this range of N. As a result, this additional step would not have a large effect for entropy rate estimation.

B. Even process: Out-of-class modeling
FIG. 4: The convergence of E_post[E(Q, P)] to the true entropy rate hµ = 2/3 bits per symbol (indicated by the gray horizontal line) for the golden mean process. As demonstrated in Eq. (41), the conditional relative entropy D[Q‖P] → 0 as 1/N. This results in the convergence of hµ[Q] to the true entropy rate.

We now consider a more difficult data source called the even process. The defining labeled transition matrices are given by

T^(0) = [ 1/2  0 ]     T^(1) = [ 0  1/2 ]
        [  0   0 ] ,           [ 1   0  ] .   (48)
As can be seen in Fig. 5, the node-edge structure is identical to the golden mean process but the output symbols on the edges have been changed slightly. As a result of this shuffle, the states A and B can no longer be associated with a simple sequence of 0's and 1's. Whereas the golden mean has the irreducible set of forbidden words F = {00}, the even process has a countably infinite set F = {01^{2n+1}0 : n = 0, 1, 2, ...} [22].

FIG. 5: Deterministic hidden Markov chain representation of the even process. This process cannot be represented as a finite-order (nonhidden) Markov chain over the output symbols 0s and 1s. The set of irreducible forbidden words F = {01^{2n+1}0 : n = 0, 1, 2, ...} reflects the fact that the process generates blocks of 1's, bounded by 0s, that are even in length, at any length.
In simple terms, the even process produces blocks of 1's which are even in length. This is a much more complicated type of memory than we saw in the golden mean process. For the Markov chain model class, where a word of length k is used to predict the next letter, this would require an infinite order k. It would be necessary to keep track of all even and odd strings of 1's, irrespective of the length. As a result, the properties of the even process mean that a finite Markov chain cannot represent this data source.

This example is then a demonstration of what can be learned in a case of out-of-class modeling. We are interested, therefore, in how well Markov chains approximate the even process. We expect that model comparison will select larger k as the size of the data sample increases. Does the model selection tell us anything about the underlying data source despite the inability to exactly capture its properties? As we will see, we do obtain intriguing hints of the true nature of the even process from model comparison. Finally, can we estimate the entropy rate of the process with a Markov chain? As we will see, a high k is needed to do this effectively.
1. Estimation of M1 Parameters

In this section we consider an M1 approximation of the even process. We expect the resulting model to accurately capture length-2 word probabilities as N increases. In this example, we consider the true model to be the best approximation possible by a k = 1 Markov chain. From the labeled transition matrices given above we can calculate the appropriate values for p(0|1) and p(1|0) using the methods described above. Starting from the asymptotic distribution π = [p(A), p(B)] = [2/3, 1/3] we obtain p(0|1) = p(10)/p(1) = 1/4 and p(1|0) = p(01)/p(0) = 1/2.
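These expected values can be checked with the word-probability helpers sketched in Sec. V (stationary and word_probability; illustrative names, ours):

# Even process, Eq. (48):
T0 = np.array([[0.5, 0.0], [0.0, 0.0]])
T1 = np.array([[0.0, 0.5], [1.0, 0.0]])
pi = stationary(T0 + T1)                                   # [2/3, 1/3]
labeled = {"0": T0, "1": T1}
p0_given_1 = word_probability("10", labeled, pi) / word_probability("1", labeled, pi)   # 1/4
p1_given_0 = word_probability("01", labeled, pi) / word_probability("0", labeled, pi)   # 1/2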
As we can see from Fig. 6, a first-order Markov chain can be inferred without difficulty. The values obtained are exactly as expected. However, these values do not tell us much about the nature of the data source by themselves. This points to the important role of model comparison and entropy rate estimation in understanding this data.

2. Selecting the Model Order k

Now consider the selection of Markov chain order k = 1 to 4 for a range of data sizes N. Recall that the even process cannot be represented by a finite-order Markov chain over the output symbols 0 and 1. As a consequence, we expect higher k to be selected with increasing data N, as more data statistically justifies more complex models. This is what happens, in fact, but the way in which orders are selected as we increase N provides structural information we could not obtain from the inference of a Markov chain of fixed order.
If we consider Fig. 7, an interesting pattern becomes apparent. Orders with even k are preferred over odd. In this way model selection is hinting at the underlying structure of the source. The Markov chain model class cannot represent the even process in a compact way, but inference and model comparison combined provide useful information about the hidden structure of the source.

In this example we also have regions where multiple orders k are nearly equally probable. The sample size at which this occurs depends on the prior over orders which is employed. When this happens, properties estimated from the Markov chain model class should use a weighted sum of the various orders. As we will see in the estimation of entropy rates, this is not as critical. At sample sizes where the order probabilities are similar, the estimated entropy rates are also similar.

FIG. 6: A plot of the inference of M1 model parameters for the even process. For a variety of sample sizes N, the marginal posterior for p(0|1) (top panel) and p(1|0) (bottom panel) are shown. The true values of the parameters are p(0|1) = 1/4 and p(1|0) = 1/2.
3. Estimation of Entropy Rate

Entropy rate estimation for the even process turns out to be a more difficult task than one might expect. In Fig. 8 we see that Markov chains of orders 1 to 6 are unable to effectively capture the true entropy rate. In fact, experience shows that an order k = 10 Markov chain or higher is needed to get close to the true value of hµ = 2/3 bits per symbol. Note also the factor of 20 longer realizations that are required compared, say, to the golden mean example.

FIG. 7: Model comparison for Markov chains of order k = 1 to 4 for average data from the even process. The top panel shows the model comparison with a uniform prior over the possible orders k. The bottom panel demonstrates model comparison with a penalty for the number of model parameters. In both cases the k = 4 model is chosen over lower orders as the amount of data available increases.
As discussed above, a weighted sum of E_post[D[Q‖P] + hµ[Q]] could be employed in this example. For the estimate this is not critical because the different orders provide roughly the same value at these points. In fact, these points correspond to where the estimates of E(Q, P) cross in Fig. 8. They are sample sizes where apparent randomness can be explained by structure and increased order k.
C. Simple Nondeterministic Source: Out-of-class modeling

The simple nondeterministic source adds another level of challenge to inference. As its name suggests, it is described by a nondeterministic HMM. Considering Fig. 9 we can see that a 1 is produced on every transition except for the B → A edge. This means there are many paths through the internal states that produce the same observable sequence of 0s and 1s. The defining labeled transition matrices for this process are given by

T^(0) = [  0   0 ]     T^(1) = [ 1/2  1/2 ]
        [ 1/2  0 ] ,           [  0   1/2 ] .   (49)
FIG. 8: The convergence of E_post[D[Q‖P] + hµ[Q]] to the true entropy rate hµ = 2/3 bits per symbol for the even process. The true value is indicated by the horizontal gray line. Experience shows that a k = 10 Markov chain is needed to effectively approximate the true value of hµ.

Using the state-to-state transition matrix T = T^(0) + T^(1), we find the asymptotic distribution for the hidden states to be π = [p(A), p(B)] = [1/2, 1/2]. Each of the hidden states is equally likely; however, a 1 is always produced from state A, while there is an equal chance of obtaining a 0 or 1 from state B.

FIG. 9: A hidden Markov chain representation of the simple nondeterministic process. This example also cannot be represented as a finite-order Markov chain over outputs 0 and 1. It, however, is more complicated than the two previous examples: Only the observation of a 0 provides the observer with information regarding the internal state of the underlying process; observing a 1 leaves the internal state ambiguous.
1. Estimation of M1 Parameters

Using the asymptotic distribution derived above, the parameters of an inferred first-order Markov chain should approach p(0|1) = p(10)/p(1) = 1/3 and p(1|0) = p(01)/p(0) = 1. As we can see from Fig. 10, the inference process captures these values very effectively despite the out-of-class data source.
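The same check as for the even process applies here, swapping in the matrices of Eq. (49) (again using the illustrative helpers from Sec. V):

# Simple nondeterministic source, Eq. (49):
T0 = np.array([[0.0, 0.0], [0.5, 0.0]])
T1 = np.array([[0.5, 0.5], [0.0, 0.5]])
pi = stationary(T0 + T1)                                   # [1/2, 1/2]
labeled = {"0": T0, "1": T1}
p0_given_1 = word_probability("10", labeled, pi) / word_probability("1", labeled, pi)   # 1/3
p1_given_0 = word_probability("01", labeled, pi) / word_probability("0", labeled, pi)   # 1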
FIG. 10: Marginal density for M1 model parameters for the simple nondeterministic process: The curves for each data size N demonstrate a well behaved convergence to the correct values: p(0|1) = 1/3 and p(1|0) = 1.

2. Selecting the Model Order k

Here we consider the comparison of Markov chain models of orders k = 1 to 4 when applied to data from the simple nondeterministic source. As with the even process, we expect increasing order to be selected as the amount of available data increases. In Fig. 11 we see that this is exactly what happens.

Unlike the even process, there is no preference for even orders. Instead, we observe a systematic increase in order with larger data sets. We do note that the amount of data needed to select a higher order does seem to be larger than for the even process. Here the distribution over words is more important and more subtle than the support of the distribution (those words with positive probability).
3. Estimation of Entropy Rate

Estimation of the entropy rate for the simple nondeterministic source provides an interesting contrast to the previous examples. As discussed when introducing the examples, this data source is a nondeterministic HMM and the entropy rate cannot be directly calculated using Eq. (46) [24]. However, a value of hµ ≈ 0.677867 bits per symbol has been obtained in [22].

FIG. 11: Model comparison for Markov chains of order k = 1 to 4 for data from the simple nondeterministic process. The top panel shows the model comparison with a uniform prior over the possible orders k. The bottom panel demonstrates model comparison with a penalty for the number of model parameters. Note the scale on the horizontal axis: it takes much more data for the model comparison to pick out higher orders for this process compared to the previous examples.
Figure 12 shows the results of entropy-rate estimation using Markov chains of order k = 1 to 6. These results demonstrate that the entropy rate can be effectively estimated with low-order k and relatively small data samples. This is an interesting result, as we might expect estimation of the entropy rate to be most difficult in this example. Instead we find that the even process was a more difficult test case.

VI. DISCUSSION

The examples presented above provide several interesting lessons in inference, model comparison, and estimating randomness. The combination of these three ideas applied to a data source provides information and intuition about the structure of the underlying system, even when modeling out-of-class processes.
FIG. 12: The convergence of E_post[D[Q‖P] + hµ[Q]] to the true entropy rate hµ ≈ 0.677867 bits per symbol for the simple nondeterministic source. The true value is indicated by the gray horizontal line.

In the examples of M1 estimates for each of the sources we see that the Bayesian methods provide a powerful and consistent description of Markov chain model parameters. The marginal density accurately describes the uncertainty associated with these estimates, reflecting asymmetries which point estimation with error bars cannot capture. In addition, methods described in App. A1 can be used to generate regions of confidence of any type. Although the estimates obtained for the Markov chain model parameters were consistent with the data source for words up to length k + 1, they did not capture the true nature of the system under consideration. This demonstrates that estimation of model parameters without some kind of model comparison can be very misleading. Only with the comparison of different orders did some indication of the true properties of the data source become clear. Without this step, misguided interpretations are easily obtained.
For the golden mean process, a k = 1 Markov chain, the results of model comparison were predictably uninteresting. This is a good indication that the correct model class is being employed. However, with the even process a much more complicated model comparison was found. In this case, a selection of even k over odd hinted at the distinguishing properties of the source. In a similar way, the results of model comparison for the simple nondeterministic source selected increasing order with larger N. In both out-of-class modeling examples, the increase in selected order without end is a good indication that the data source is not in the Markov chain class. (A parallel technique is found in hierarchical ε-machine reconstruction [22].) Alternatively, there is an indication that very high-order dependencies are important in the description of the process. Either way, this information is important since it gives an indication to the modeler that a more complicated dynamic is at work and all results must be treated with caution.
Finally, we considered the estimation of entropy rates for the example data sources. In two of the cases, the golden mean process and the simple nondeterministic source, short data streams were adequate. This is not unexpected for the golden mean, but for the simple nondeterministic source this might be considered surprising. For the even process, the estimation of the entropy rate was markedly more difficult. For this data source, the countably infinite number of forbidden words makes the support of the word distribution at a given length important. As a result, a larger amount of data and a higher-order Markov chain are needed to find a decent estimate of randomness from that data source. In this way, each of the steps in Bayesian inference allows one to separate structure from randomness.
VII. CONCLUSION

We considered Bayesian inference of k-th order Markov chain models. This included estimating model parameters for a given k, model comparison between orders, and estimation of randomness in the form of entropy rates. In most approaches to inference, these three aspects are treated as separate, but related endeavors. However, we find them to be intimately related. An estimate of model parameters without a sense of whether the correct model is being used is misguided at best. Model comparison provides a window into this problem by comparing various orders k within the model class. Finally, estimating randomness in the form of an entropy rate provides more information about the trade-off between structure and randomness. To do this we developed a connection to the statistical mechanical partition function, from which averages and variances were directly calculable. For the even process, structure was perceived as randomness and for the simple nondeterministic source randomness was easily estimated and structure was more difficult to find. These insights, despite the out-of-class data, demonstrate the power of combining these three methods into one effective tool for investigating structure and randomness in finite strings of discrete data.

Acknowledgments

This work was partially supported at the Center for Computational Science and Engineering at the University of California at Davis by Intel Corporation. Work at the Santa Fe Institute was supported under its Computation, Dynamics, and Inference Program core grants from the National Science and MacArthur Foundations. C.S. and A.H. acknowledge support by the National Science Foundation Grant DMS 03-25939 ITR.
APPENDIX A

1. Dirichlet Distribution
We supply a brief overview of the Dirichlet distribution for completeness. For more information, a reference such as [25] should be consulted. In simple terms, the Dirichlet distribution is the multinomial generalization of the Beta distribution. The probability density function for q elements is given by

Dir({p_i}) = [ Γ(α) / ∏_{i=1}^{q} Γ(α_i) ] ∏_{i=1}^{q} p_i^{α_i − 1} ,

where α = Σ_{i=1}^{q} α_i. The mean and variance of each element are

E[p_j] = α_j / α ,  Var[p_j] = α_j (α − α_j) / [ α^2 (1 + α) ] .   (A2)

2. Marginal distributions

An important part of understanding uncertainty in the inference process is the ability to find regions of confidence from a marginal density. The marginal is obtained from the posterior by integrating out the dependence on all parameters except for the parameter of interest. For a Dirichlet distribution, the marginal density is known to be a Beta distribution [25],

Beta(p_i) = [ Γ(α) / ( Γ(α_i) Γ(α − α_i) ) ] p_i^{α_i − 1} (1 − p_i)^{α − α_i − 1} .   (A4)
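A hedged sketch of how a region of confidence can be read off this Beta marginal, using the inverse CDF from scipy (the choice of a central interval, and the helper name, are ours and not prescribed by the appendix):

from scipy.stats import beta

def central_credible_interval(a_i, a_rest, level=0.95):
    """Central credible interval for p_i whose marginal is Beta(a_i, a_rest)."""
    lo = beta.ppf((1 - level) / 2, a_i, a_rest)
    hi = beta.ppf(1 - (1 - level) / 2, a_i, a_rest)
    return lo, hi

# Example: the marginal for p(1|0) given n(01) = 60, n(00) = 0 and alpha = 1 per word
# is Beta(61, 1); the resulting interval is strongly asymmetric, pressed against 1.
# central_credible_interval(61, 1)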