====== Lingua::PT::ProperNames ======

Lingua::PT::ProperNames - Simple module to extract proper names from Portuguese text

===== Synopsis =====

This module contains simple Perl-based functions to detect and extract proper names from Portuguese text.

  use Lingua::PT::ProperNames;

  printPN(@options);
  printPNstring({ %options... }, $textstring);
  printPNstring([ @options... ], $textstring);

  forPN( sub{my ($pn, $context)=@_;... } );
  forPN( {t=>"double"}, sub{my ($pn, $context)=@_;... }, sub{...} );

  $outstr = forPN($instr, sub{my ($pn, $context)=@_;... }, ... );

  forPNstring(sub{my ($pn, $context)=@_;... }, $textstring, regsep);

  my $pndict = Lingua::PT::ProperNames->new;

===== ProperNames dictionary =====

==== new ====

Creates a new ProperNames dictionary.

==== is_name ====

This method checks whether a name exists in the Names dictionary.

==== is_surname ====

This method checks whether a name exists in the Names dictionary as a surname.

===== Exports the following functions =====

==== forPN ====

Replaces every proper name read from STDIN with the result of calling a function f($propername, $context), and writes the output to STDOUT.

Usage:

  forPN({options...}, sub{ propername processor...})

Optionally, you can define input or output files:

  forPN({in=> "inputfile", out => "outputfile" }, sub{...})

Optionally, you can use the type option {t => "double"} to treat proper names that follow punctuation (".", etc.) separately. With this option you must provide two functions: one for normal proper names and one for names after punctuation.

  forPN({t=>"double"}, sub{...}, sub{...})

You can also define the record (paragraph) separator:

  forPN({sep=>"\n", t=>"normal"}, sub{...})  ## each line is a paragraph
  forPN({sep=>""}, sub{...})                 ## paragraphs are separated by empty lines

==== forPNstring ====

  forPNstring( $funref, "textstring" [, regSeparator] )

Replaces every proper name in the text string with funref(propername).

==== printPNstring ====

  printPNstring("oco")

==== getPN ====

==== printPN ====

  printPN("oco")

  printPN - extracts the proper names from a text.
  -comp   merges certain names: Fermat + Pierre de Fermat = (Pierre de) Fermat
  -prof
  -e      treats "e" as part of the proper name (PN), as in "Sebastiao e Silva"
  -em     treats "em" as part of the proper name (PN), as in "em Famalicão"
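
The callback contract described for forPNstring (each proper name is replaced by the value the supplied function returns) can be illustrated with a minimal, self-contained sketch. It does not use Lingua::PT::ProperNames and is not the module's detection algorithm: the helper name ''replace_capitalised_runs'' is hypothetical, and the heuristic (runs of capitalised words, optionally linked by "de", "da", "do", "dos", "das") is an assumption chosen only to keep the example short.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper, NOT part of Lingua::PT::ProperNames.
# Assumption: a "proper name" is a run of capitalised words that may be
# joined by the connectors de/da/do/dos/das. The real module is more careful.
sub replace_capitalised_runs {
    my ($callback, $text) = @_;

    my $word = qr/\b\p{Lu}\p{Ll}+\b/;       # one capitalised word
    my $link = qr/(?:de|da|do|dos|das)/;    # connectors allowed inside a name

    # Replace each run with whatever the callback returns for it.
    $text =~ s/($word(?:\s+(?:$link\s+)?$word)*)/$callback->($1)/ge;
    return $text;
}

my $out = replace_capitalised_runs(
    sub { my ($pn) = @_; return "<pn>$pn</pn>" },
    "o teorema de Pierre de Fermat foi estudado em Lisboa",
);
print "$out\n";
# prints: o teorema de <pn>Pierre de Fermat</pn> foi estudado em <pn>Lisboa</pn>
```

With the module installed, the documented counterpart of this call would be forPNstring(sub{...}, $textstring), as listed in the synopsis. Note that this sketch also tags any capitalised word at the start of a sentence, which is the situation the {t => "double"} option of forPN is described as handling separately.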