Lingua::PT::ProperNames

Lingua::PT::ProperNames - Simple module to extract proper names from Portuguese text

Synopsis

This module contains simple Perl-based functions to detect and extract proper names from Portuguese text.

 use Lingua::PT::ProperNames;

 printPN(@options);
 printPNstring({ %options... }, $textstring);
 printPNstring([ @options... ], $textstring);

 forPN( sub { my ($pn, $context) = @_; ... } );
 forPN( { t => "double" },
        sub { my ($pn, $context) = @_; ... }, sub { ... } );
 $outstr = forPN($instr, sub { my ($pn, $context) = @_; ... }, ... );

 forPNstring( sub { my ($pn, $context) = @_; ... },
              $textstring, $regsep );

 my $pndict = Lingua::PT::ProperNames->new;

ProperNames dictionary

new

Creates a new ProperNames dictionary.

is_name

This method checks if a name exists in the Names dictionary.

is_surname

This method checks if a name exists in the Names dictionary as a Surname.

Exports the following functions

forPN

Replaces every proper name read from STDIN with the result of calling a function f($propername, $context), and sends the output to STDOUT.

Usage:

 forPN({options...}, sub{ propername processor...})

Optionally you can define input or output files:

 forPN({in=> "inputfile", out => "outputfile" }, sub{...})

Optionally, you can set the type option {t => "double"} to give special treatment to names that appear after punctuation (".", etc.). With this option you must provide two functions: one for normal proper names and one for names after punctuation.

 forPN({t=>"double"}, sub{...}, sub{...})
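The two-callback idea can be sketched in plain Perl without the module. This is only an illustration under stated assumptions: `for_names_double` is a hypothetical helper, and its "proper name" pattern (a run of capitalized words, optionally joined by de/da/do/das/dos) is a naive stand-in, not the detection logic that Lingua::PT::ProperNames actually uses.

```perl
use strict;
use warnings;

# Naive stand-in for proper-name detection (an assumption for this sketch):
# one capitalized word, optionally followed by more capitalized words that
# may be joined by a connector such as "de" or "dos".
my $word = qr/\p{Lu}\p{Ll}+/;
my $name = qr/$word(?:\s+(?:(?:de|da|do|das|dos)\s+)?$word)*/;

# Hypothetical helper: calls $normal for names in the middle of a sentence
# and $after_punct for names at the start of the text or right after . ! ?
sub for_names_double {
    my ($text, $normal, $after_punct) = @_;
    $text =~ s{((?:^|[.!?])\s*)?($name)}{
        my ($ctx, $pn) = ($1, $2);   # copy captures before calling callbacks
        defined $ctx ? $ctx . $after_punct->($pn) : $normal->($pn);
    }ge;
    return $text;
}

print for_names_double(
    "Ontem o Pedro viu a Ana. Maria de Sousa saiu.",
    sub { my ($pn) = @_; "<pn>$pn</pn>" },        # ordinary proper names
    sub { my ($pn) = @_; "<start>$pn</start>" },  # capitalized after punctuation
), "\n";
```

In this sample, "Ontem" is capitalized only because it opens the sentence, which is why a separate function for that position can be useful: it can apply a stricter check before accepting the word as a name.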

You can also define the record (paragraph) separator:

 forPN({sep=>"\n", t=>"normal"}, sub{...}) ## each line is a paragraph
 forPN({sep=>""}, sub{...})                ## paragraphs are separated by empty lines
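The two separator values behave like Perl's input record separator `$/`, where "\n" reads line by line and the empty string selects paragraph mode. Whether forPN sets `$/` internally is an assumption here; the sketch below only shows how the same text splits under each value, using a hypothetical `read_records` helper.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records the way Perl's readline
# does for a given input record separator.
sub read_records {
    my ($text, $sep) = @_;
    local $/ = $sep;   # "\n" = one record per line, "" = paragraph mode
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    chomp @records;    # chomp honours $/, so it strips the separator used
    return @records;
}

my $text = "O Pedro chegou.\nA Ana saiu.\n\nMaria de Sousa ficou.\n";

my @lines = read_records($text, "\n");  # each line is a record
my @pars  = read_records($text, "");    # blank lines separate records

printf "%d line records, %d paragraph records\n", scalar @lines, scalar @pars;
```

With sep set to "\n" the empty line counts as a record of its own, while in paragraph mode the first two lines stay together as one record.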

forPNstring

 forPNstring( $funref, "textstring" [, regSeparator] )

Replaces every proper name in the text string with the result of funref(propername).

printPNstring

 printPNstring("oco")

getPN

printPN

 printPN("oco")

printPN - extracts the proper names from a text.

 -comp    joins certain names: Fermat + Pierre de Fermat = (Pierre de) Fermat
 -prof
 -e       "Sebastiao e Silva": "e" is treated as part of the PN
 -em      "em Famalicão" is treated as part of the PN

ferramentas/lingua-pt-propernames.txt · Last modified: 2008/09/08 14:32 by ambs