kvec - Perl extension for extraction of bilingual terminology
use kvec;
$file1=shift; $file2=shift;
$a = new kvec($file1, $file2);
$a->set_statistic("log likehihood");
$a->p2n3();
$result = $a->calcStat();
print $result;
new
Creates a new kvec object. It can be called in 3 different ways:
With 2 parameters (the files you want to open):
$a = new kvec($file1, $file2);
With 1 parameter, to open a file holding the result of a previous call to kvec:
$a = new kvec($file);
With 3 parameters, to open 2 files with extra parameters:
$a = new kvec({pieces => 45, lcutOff => 5, ucutOff => 1000, align => 1}, $file1, $file2);
pieces
- the number of pieces into which the texts are divided.
lcutOff
- words with a frequency below this value are not used.
ucutOff
- words with a frequency above this value are not used.
align
- set this value if the texts you are using are already aligned in some way. It does not make sense to use this value and pieces at the same time.
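To make the effect of pieces, lcutOff and ucutOff more concrete, here is a minimal sketch in plain Perl of the general idea described by Fung & Church: a text is cut into a fixed number of pieces and each word gets a binary vector marking the pieces it occurs in. It does not use kvec and is not taken from its implementation. The helper name occurrence_vectors and the reading of the cut-offs as inclusive bounds on raw word frequency are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of kvec: split a token list into $pieces
# roughly equal segments and record, for every word, which segments contain it.
sub occurrence_vectors {
    my ($tokens, $pieces, $lcut, $ucut) = @_;
    my $n = scalar @$tokens;
    my (%freq, %vec);
    for my $i (0 .. $n - 1) {
        my $word  = $tokens->[$i];
        my $piece = int($i * $pieces / $n);    # segment index, 0 .. $pieces - 1
        $freq{$word}++;
        $vec{$word} ||= [ (0) x $pieces ];
        $vec{$word}[$piece] = 1;
    }
    # Assumed cut-off semantics: keep words with $lcut <= frequency <= $ucut.
    for my $word (keys %freq) {
        delete $vec{$word} if $freq{$word} < $lcut || $freq{$word} > $ucut;
    }
    return \%vec;
}

my @text = qw(the cat sat on the mat the dog sat on the log);
my $vec  = occurrence_vectors(\@text, 4, 2, 3);
for my $word (sort keys %$vec) {
    print "$word: @{ $vec->{$word} }\n";
}
```

In this toy run, "the" occurs 4 times and is dropped by the upper cut-off, while words seen only once are dropped by the lower one. Building such vectors for both texts with the same number of pieces gives the input for comparing word pairs across languages.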
printToFile
Prints the result to a file
$a->printToFile($outfile);
printToString
Returns the result as a string
$result = $a->printToString();
grep
Returns a new object containing only the results for the given list of words
$b = $a->grep($word1, $word2, ..., $wordn);
sum
With this method you can ``sum'' the results of 2 kvec objects
$a = new kvec($file1, $file2); $b = new kvec($file3, $file4); $a->sum($b);
set_statistic
Sets the statistical test you want to use. If you don't call this method, the test used by default is the Right Fisher. The other available tests are Log Likelihood, T-score, Dice and Odds ratio.
calcStat
This method calculates the dependence between words using the chosen statistical test and returns the result as a string.
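As an illustration of what a dependence score between two words can look like, the sketch below computes two of the tests named above, Dice and log-likelihood, from a pair of binary occurrence vectors. It uses the standard textbook formulas on a 2x2 contingency table. Whether kvec computes its scores in exactly this way is not confirmed here, and the helper names contingency, dice and log_likelihood are hypothetical, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of kvec.
# Build the 2x2 contingency table for two binary occurrence vectors:
#   n11 = pieces with both words, n10 = only the first,
#   n01 = only the second,        n00 = neither.
sub contingency {
    my ($x, $y) = @_;
    my ($n11, $n10, $n01, $n00) = (0, 0, 0, 0);
    for my $i (0 .. $#$x) {
        if    ($x->[$i] && $y->[$i]) { $n11++ }
        elsif ($x->[$i])             { $n10++ }
        elsif ($y->[$i])             { $n01++ }
        else                         { $n00++ }
    }
    return ($n11, $n10, $n01, $n00);
}

# Dice coefficient: 2 * n11 / ((n11 + n10) + (n11 + n01))
sub dice {
    my ($n11, $n10, $n01) = @_;
    my $den = 2 * $n11 + $n10 + $n01;
    return $den ? 2 * $n11 / $den : 0;
}

# Log-likelihood ratio: G2 = 2 * sum(observed * ln(observed / expected))
sub log_likelihood {
    my @obs = @_;                                  # (n11, n10, n01, n00)
    my $n = $obs[0] + $obs[1] + $obs[2] + $obs[3];
    return 0 unless $n;
    my @row = ($obs[0] + $obs[1], $obs[2] + $obs[3]);
    my @col = ($obs[0] + $obs[2], $obs[1] + $obs[3]);
    my $g2 = 0;
    for my $r (0, 1) {
        for my $c (0, 1) {
            my $o = $obs[2 * $r + $c];
            next unless $o;                        # 0 * ln(0) is taken as 0
            my $e = $row[$r] * $col[$c] / $n;      # expected count under independence
            $g2 += $o * log($o / $e);
        }
    }
    return 2 * $g2;
}

my @x = (1, 0, 1, 0, 1, 0);
my @y = (1, 0, 1, 0, 0, 0);
my @table = contingency(\@x, \@y);
printf "dice = %.3f, G2 = %.3f\n", dice(@table), log_likelihood(@table);
```

Both scores grow as the two words co-occur in the same pieces more often than chance would predict. A table in which the words are independent gives a log-likelihood of 0.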
Created by h2xs with options -A -C -X -c -n kvec
Ted Pedersen, <tpederse@d.umn.edu>
Nitin Varma, <varm0003@d.umn.edu>
Bruno Martins, <bsm@natura.di.uminho.pt>
perl.
Fung, P. & K. W. Church. (1994) ``K-vec: A New Approach for Aligning Parallel Texts.'' Proceedings of the 15th International Conference on Computational Linguistics, Kyoto.
Bigram Statistics Package by Ted Pedersen & Satanjeev Banerjee. http://www.d.umn.edu/~tpederse/code.html