~~SLIDESHOW~~

====== XML-TX : type-based XML Validation ======

===== Classic XML validation =====

  * document is well-formed XML
    * format validation
  * document follows a DTD
    * elements and attributes are valid
    * grammar is valid
  * document follows a specified Schema
    * values of some elements or attributes have a specific type

=== but... ===

===== Example: what is wrong? =====

<code>
http://natura.di.uminho.pt
http://naturaaaa.di.uminho.pt
www.di.uminho.pt
aaron gato cat 33 Aaron, Aarão (nome próprio)
aback gato cat 33 z.
abaft adv. à popa, à ré
</code>

===== Example (continuation) =====

=== What is 'to be valid'? ===

  * **day**
    * 1..31
  * element **pos**
    * value in an enumerated set (in file POS)
  * element **url**
    * is a URL
    * that URL exists
  * element **translation**
    * is a Portuguese text
    * spellchecked
  * element **domain**
    * text in the language given by //"xml:lang"//

=== Operational Semantics... ===

===== Types =====

  * **url** is an aliveurl
    * aliveurl
  * **translation** is a Portuguese text
    * text(PT)
  * **domain** is a //"xml:lang"// text
    * text(@xml:lang)
  * **day** is in 1..31
    * [1..31]
  * **pos** is an enumeration (from file POS)

== Elements have types ==

  * types are not sets: they have functions
    * is-valid
    * fix-it
    * ...
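The idea that a type is a bundle of functions rather than a set can be sketched in plain Perl. This is a minimal illustration, not the XML::TX implementation: the ''%TYPES'' registry and the ''is_valid'' / ''fix_it'' helpers are hypothetical names, and the repair rule for **day** (trim whitespace, drop leading zeros) is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical registry: type name => { is_valid => coderef, fix_it => coderef }.
# It only mirrors the "types have functions" idea; it is not XML::TX itself.
my %TYPES = (
    day => {
        # valid when the value is an integer in 1..31
        is_valid => sub {
            my ($v) = @_;
            return $v =~ /^\d+$/ && $v >= 1 && $v <= 31 ? 1 : 0;
        },
        # assumed repair: strip surrounding whitespace and leading zeros
        fix_it => sub {
            my ($v) = @_;
            (my $clean = $v) =~ s/^\s+|\s+$//g;
            $clean =~ s/^0+(?=\d)//;
            return $clean;
        },
    },
);

# Ask a type whether a value belongs to it.
sub is_valid {
    my ($value, $type) = @_;
    my $t = $TYPES{$type} or die "unknown type: $type";
    return $t->{is_valid}->($value);
}

# Ask a type to repair a value; undef when the repair does not help.
sub fix_it {
    my ($value, $type) = @_;
    my $t = $TYPES{$type} or die "unknown type: $type";
    my $fixed = $t->{fix_it}->($value);
    return $t->{is_valid}->($fixed) ? $fixed : undef;
}
```

With this sketch, the value 33 from the example is rejected by ''is_valid("33", "day")'', and ''fix_it'' returns ''undef'' for it because no repair rule applies.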
===== Design goals =====

  * pragmatics
    * help in marking errors
    * help in fixing errors
  * syntax
    * as simple as possible
    * as powerful as possible
  * semantics
    * type based
    * dynamic types
    * function is-valid
    * function fix-it
    * builtin types
    * user defined types

===== Design goals (2) =====

  * validity process can see the world
    * the type of an element may depend on the value of an attribute
    * function "is-valid" may depend on everything necessary
  * partial
    * partial validation, typing, visiting

===== Module XML::TX =====

<code perl>
use XML::TX;
use LWP::Simple;

# element name => type (a type name, a type constructor, or a sub)
my $types = {
    sentencePt => text("pt"),
    sentenceEn => text("en"),
    # type chosen at run time from the element's xml:lang attribute
    domain     => sub { text($v{'xml:lang'} || "pt") },
    url        => "urlActive",
};

addType(
    urlActive => {
        # mark the content as an error when the URL does not answer
        markit => sub {
            $c = markAsErr($c) unless (LWP::Simple::head($c));
            toxml();
        },
    },
);

markit($filename, $types);
</code>

===== ... and also =====

<code perl>
fixit( $filename, $types );
isvalid( $value, $type );
</code>

===== tx DSL (try to hide details...) =====

== tx tx-file x.xml ==

<code>
url2         url
href         urlActive
pos          enumFromFile("POS")
orth         text("en")
translation  text("pt")
domain       text(@xml:lang)
fig@url      urlActive
%%
use LWP::Simple;

addType(
    urlActive => {
        markit => sub {
            $c = markAsErr($c) unless (LWP::Simple::head($c));
            toxml();
        },
    },
);
</code>

===== Available types =====

  * email
  * date
  * text(L1)
  * enumFromFile(t,F)
  * enum(day ,[1..31])
  * fromRegExp(type, regexp )

<code perl>
addType(
    typename => {
        markit => sub {...},
        fixit  => ....
    },
)
</code>

===== User defined types =====

<code perl>
addType(
    url => {
        # mark anything that is not an http:// or file:// URL
        markit => sub {
            $c = markAsErr($c) unless $c =~ m{^(http|file)://};
            toxml();
        },
        # first try the obvious repair, then mark what is still wrong
        fixit => sub {
            $c = "http://" . $c if $c =~ /^www\./;
            $c = markAsErr($c) unless $c =~ m{^(http|file)://};
            toxml();
        },
    },
);
</code>

===== Type date =====

<code perl>
use Date::Manip;

addType(
    date => {
        markit => sub {
            $c = markAsErr($c) unless ....
            toxml();
        },
        fixit => sub {
            my $aux = ParseDate($c);
            if ($aux) { $c = pp($aux); }
            else      { $c = markAsErr($c); }
            toxml();
        },
    },
);
</code>

===== Fixit =====

<code>
tx -correct x.tx y.xml > output
</code>

  * simple corrections for specific situations
    * color ==> colour
  * correct common mistakes
  * interactive corrections
  * validator EPR

===== Validator EPR =====

  * extract + process + rebuild
    * Correcção=ext_proc_rec(CorrectorInteractivo,....) (correction = extract, process, rebuild with an interactive corrector)
  * XML-DT based validators
  * final post-processor
  * facet-oriented processor

===== Micro Demo =====
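A minimal, self-contained sketch of the extract + process + rebuild idea from the Validator EPR slide, reusing the rule of the ''url'' type's fixit. It does not use XML::TX: the ''epr'' and ''fix_url'' functions are hypothetical names, the ''<err>'' wrapper is only a stand-in for whatever ''markAsErr'' produces, the sample document is made up, and the regex-based extraction assumes simple, non-nested elements without attributes.

```perl
use strict;
use warnings;

# Rule taken from the url type's fixit; <err> is a hypothetical
# stand-in for the output of markAsErr().
sub fix_url {
    my ($u) = @_;
    $u = "http://$u" if $u =~ /^www\./;
    return $u =~ m{^(http|file)://} ? $u : "<err>$u</err>";
}

# extract + process + rebuild over one element name
sub epr {
    my ($xml, $elem, $process) = @_;
    # 1. extract: collect the text of every <elem>...</elem>
    my @values = $xml =~ m{<\Q$elem\E>(.*?)</\Q$elem\E>}gs;
    # 2. process: handle the whole batch (this step could be interactive)
    my @fixed = map { $process->($_) } @values;
    # 3. rebuild: put the processed values back in document order
    my $i = 0;
    $xml =~ s{(<\Q$elem\E>).*?(</\Q$elem\E>)}{$1 . $fixed[$i++] . $2}ges;
    return $xml;
}

# made-up sample input
my $doc = "<e><url>www.di.uminho.pt</url><url>ftp.example.org</url></e>";
print epr($doc, "url", \&fix_url), "\n";
```

Separating the three steps means the processing function sees all extracted values together, which is what would allow an interactive corrector to be plugged in as the middle step.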