package XML::DT::Sequence;

use XML::DT;

use 5.006;
use strict;
use warnings;

use base 'Exporter';
our @EXPORT = qw($u $c %v $q &father &gfather &ggfather &root);

=encoding UTF-8

=head1 NAME

XML::DT::Sequence - Down Translator (XML::DT) for sequence XMLs

=head1 SYNOPSIS

Many XML files nowadays are just catalogues: simple sequences of
small chunks that repeat over and over. These files can get enormous,
which makes DOM processing hard. SAX processing is interesting, but not
always the best approach.

This module splits the XML file into a header, a sequence of the
repeating blocks, and a footer. Each of these chunks can then be
processed as a DOM, using L<XML::DT> technology.

    use XML::DT::Sequence;

    my $dt = XML::DT::Sequence->new();

    $dt->process("file.xml",
                 -tag => 'item',
                 -head => sub {
                     my ($self, $xml) = @_;
                     # do something with $xml
                 },
                 -body => {
                     item => sub {
                         # XML::DT like handler
                     }
                 },
                 -foot => sub {
                     my ($self, $xml) = @_;
                     # do something with $xml
                 },
                );

=head1 EXPLANATION

There are four options, only two of which are mandatory: C<-tag> and
C<-body>. C<-tag> is the name of the element that repeats in the XML
file, and that you want to process one at a time. C<-body> is the
handler that processes each one of these elements.

C<-head> is the handler that processes the XML that appears before the
first instance of the repeating element, and C<-foot> is the handler
that processes the XML that appears after the last instance of the
repeating element.

Each one of these handlers can be a code reference that receives the
C<XML::DT::Sequence> object and the XML string, or a hash reference
with L<XML::DT> handlers to process each XML snippet.

Note that the header and footer snippets are not complete XML
documents, so the parser has to recover from the missing markup and
can do so in unexpected ways.

The C<process> method returns a hash reference with three keys:
C<-head> is the return value of the C<-head> handler, and C<-foot> is
the return value of the C<-foot> handler. C<-body> is the number of
elements of the sequence that were processed.

=head1 METHODS

=head2 new

Constructor.

=head2 process

Processor.
See the explanation above.

=head2 break

Forces the process to finish. Useful when you have already processed
enough elements. Note that if you break the process, the C<-foot> code
will not be run.

If you are using a code reference as a handler, call C<break> on the
first argument (the reference to the object). If you are using an
C<XML::DT> handler, C<< $u >> holds the object, so just call C<break>
on it.

=cut

sub new {
    my ($class) = @_;
    return bless { } => $class;
}

sub break {
    my $self = shift;
    $self->{BREAK} = 1;
}

sub process {
    my ($self, $file, %ops) = @_;

    die "Option -tag is mandatory." unless exists $ops{-tag};

    # Use the closing tag as the input record separator, so that each
    # read returns the text up to the end of one repeating element.
    local $/ = "</$ops{-tag}>";

    # XXX - fixme... utf8?
    open my $fh, "<:utf8", $file
      or die "Can't open file $file for reading [$!]";
    my $firstChunk = <$fh>;

    die "No $/ tag found. Bailing out." unless $firstChunk =~ $/;

    # The header is everything before the first repeating element.
    my $head = $firstChunk;
    $head =~ s/<$ops{-tag}.*//s;

    ## Process header if there is such a handler
    my $headReturn = undef;
    if (exists($ops{-head})) {
        my $headCode = $ops{-head};
        if (ref($headCode) eq "CODE") {
            $headReturn = $headCode->($self, $head);
        }
        elsif (ref($headCode) eq "HASH") {
            $headReturn = dtstring($head, -recover => 1,
                                   -userdata => $self, %$headCode);
        }
        else {
            die "No idea what to do with -head of type ".ref($ops{-head});
        }
    }

    ## process the sequence
    my $chunk = $firstChunk;
    my $totalElems = 0;
    my $bodyCode = $ops{-body} || undef;
    my $code;

    if (!$bodyCode) {
        $code = sub { };
    }
    elsif (ref($bodyCode) eq "CODE") {
        $code = sub { $bodyCode->($self, $_[0]) };
    }
    elsif (ref($bodyCode) eq "HASH") {
        $code = sub { dtstring($_[0], -userdata => $self, %$bodyCode) }
    }
    else {
        die "No idea what to do with -body of type ".ref($ops{-body});
    }

    do {
        ++$totalElems;
        # Drop whatever precedes the opening tag of the element.
        $chunk =~ s/^.*(?=<$ops{-tag})//s;
        $code->($chunk);
        $chunk = <$fh>;
    } while ($chunk =~ m{</$ops{-tag}>} and !$self->{BREAK});

    ## Process footer, unless the sequence was interrupted with break
    my $footReturn;
    if (!$self->{BREAK}) {
        if (exists($ops{-foot})) {
            my $footCode = $ops{-foot};
            if (ref($footCode) eq "CODE") {
                $footReturn = $footCode->($self, $chunk);
            }
            elsif (ref($footCode) eq "HASH") {
                $chunk =~ s{^\s*}{}g;
                $footReturn = dtstring($chunk,
                                       -userdata => $self,
                                       -recover => 1, %$footCode);
            }
            else {
                die "No idea what to do with -foot of type ".ref($ops{-foot});
            }
        }
    }

    close $fh;

    return {
            -head => $headReturn,
            -body => $totalElems,
            -foot => $footReturn,
           };
}

=head1 AUTHOR

Alberto Simões

=head1 BUGS

Please report any bugs or feature requests through RT, CPAN's request
tracker, either by email or through its web interface. I will be
notified, and then you'll automatically be notified of progress on
your bug as I make changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc XML::DT::Sequence

You can also look for information at:

=over 4

=item * RT: CPAN's request tracker (report bugs here)

=item * AnnoCPAN: Annotated CPAN documentation

=item * CPAN Ratings

=item * Search CPAN

=back

=head1 KNOWN BUGS AND LIMITATIONS

=over 4

=item * Spaced tags

It is unusual, but XML allows whitespace inside element tags, for
instance between the element name and the closing C<< > >>. This is
B<not> supported.

=item * Multiple usage tags

If the same tag is used at different levels of the XML hierarchy, the
implemented algorithm is likely to fail.

=back

=head1 ACKNOWLEDGEMENTS

=head1 LICENSE AND COPYRIGHT

Copyright 2012 Alberto Simões.

This program is free software; you can redistribute it and/or modify it
under the terms of either: the GNU General Public License as published
by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.

=cut

1; # End of XML::DT::Sequence