NAME

kvec - Perl extension for extraction of bilingual terminology


ABSTRACT


SYNOPSIS

  use kvec;

  $file1=shift;
  $file2=shift;

  $a=new kvec($file1,$file2);
  $a->set_statistic("log likehihood");
  $a->p2n3();
  $result=$a->calcStat();
  print $result;


DESCRIPTION

METHODS

new

Creates a new kvec object. You can call it in 3 different ways:

With 2 parameters (the files you want to open):

        $a = new kvec($file1, $file2);

With 1 parameter, if you want to open a file holding the result of a previous kvec run:

        $a = new kvec($file);

With 3 parameters, if you want to open 2 files with extra options (a hash reference followed by the two files):

        $a = new kvec({pieces  => 45,
                       lcutOff => 5,
                       ucutOff => 1000,
                       align   => 1}, $file1, $file2);

pieces - the number of pieces into which the texts are divided.

lcutOff - words with a frequency below this value are not used.

ucutOff - words with a frequency above this value are not used.

align - you should use this value if the texts you are using are aligned in some sort of way. It does not make sense to use this value and pieces at the same time.
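The effect of pieces, lcutOff and ucutOff can be illustrated with a small self-contained sketch. It follows the general K-vec idea from Fung & Church (split a text into K pieces and record, for each word, the pieces in which it occurs); it is not taken from the kvec source, and the function name occurrence_vectors and its argument order are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of kvec: split a token list into $pieces
# contiguous segments and return, for each word, a binary vector marking
# the pieces in which the word occurs. Words whose total frequency is
# below $lcut or above $ucut are dropped.
sub occurrence_vectors {
    my ($tokens, $pieces, $lcut, $ucut) = @_;
    my $n = @$tokens;
    my (%freq, %vec);
    for my $i (0 .. $n - 1) {
        my $word  = $tokens->[$i];
        my $piece = int($i * $pieces / $n);   # index of the piece holding token $i
        $freq{$word}++;
        $vec{$word} //= [ (0) x $pieces ];
        $vec{$word}[$piece] = 1;
    }
    for my $word (keys %freq) {
        delete $vec{$word} if $freq{$word} < $lcut || $freq{$word} > $ucut;
    }
    return \%vec;
}

my @text = qw(the cat sat on the mat the dog sat on the log);
my $vec  = occurrence_vectors(\@text, 4, 2, 3);
print "$_: @{ $vec->{$_} }\n" for sort keys %$vec;
```

With 4 pieces, a lower cutoff of 2 and an upper cutoff of 3, only "on" (vector 0 1 0 1) and "sat" (vector 1 0 1 0) survive: "the" occurs 4 times and exceeds the upper cutoff, and the remaining words occur only once.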

printToFile

Prints the result to a file:

        $a->printToFile($outfile);

printToString

Returns the result as a string:

        $result = $a->printToString();

grep

Returns a new object containing only the results for the given list of words:

        $b = $a->grep($word1,$word2,...,$wordn);

sum

With this method you can ``sum'' the results of 2 kvec objects:

        $a = new kvec($file1,$file2);
        $b = new kvec($file3,$file4);
        $a->sum($b);

set_statistic

Sets the statistical test you want to use. If you don't call this method, the test used by default is the Right Fisher. The other available tests are Log Likelihood, T-score, Dice and Odds ratio.
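For orientation, three of these measures can be computed from a 2x2 contingency table of piece counts for a word pair. The sketch below uses the textbook definitions of the Dice coefficient, the odds ratio and the log-likelihood ratio (G-squared); the function names and the table layout are assumptions made for illustration, and kvec's own implementation may differ in detail.

```perl
use strict;
use warnings;

# Assumed 2x2 table layout for a word pair (x, y), counted over text pieces:
#   $n11 = pieces containing both x and y    $n12 = pieces with x but not y
#   $n21 = pieces with y but not x           $n22 = pieces with neither

sub dice {
    my ($n11, $n12, $n21, $n22) = @_;
    return 2 * $n11 / (($n11 + $n12) + ($n11 + $n21));
}

sub odds_ratio {
    my ($n11, $n12, $n21, $n22) = @_;
    # Perl dies with a division-by-zero error when $n12 or $n21 is 0.
    return ($n11 * $n22) / ($n12 * $n21);
}

sub log_likelihood {
    my ($n11, $n12, $n21, $n22) = @_;
    my $n    = $n11 + $n12 + $n21 + $n22;
    my @rows = ($n11 + $n12, $n21 + $n22);
    my @cols = ($n11 + $n21, $n12 + $n22);
    my @obs  = ([$n11, $n12], [$n21, $n22]);
    my $sum  = 0;
    for my $i (0, 1) {
        for my $j (0, 1) {
            my $o = $obs[$i][$j];
            next if $o == 0;                      # 0 * log(0) is taken as 0
            my $e = $rows[$i] * $cols[$j] / $n;   # expected count under independence
            $sum += $o * log($o / $e);
        }
    }
    return 2 * $sum;
}

my @table = (8, 2, 2, 8);
printf "dice=%.3f odds=%.3f ll=%.3f\n",
    dice(@table), odds_ratio(@table), log_likelihood(@table);
```

For the sample table the Dice coefficient is 0.8 and the odds ratio is 16. For a table whose cells match the expected counts under independence, such as (5, 5, 5, 5), the log-likelihood value is 0.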

calcStat

This method calculates the dependence between words using the chosen statistical test and returns the result as a string.


HISTORY

  1.01
    Original version; created by h2xs 1.21 with options
      -A
      -C
      -X
      -c
      -n
      kvec


AUTHOR

Ted Pedersen, <tpederse@d.umn.edu>

Nitin Varma, <varm0003@d.umn.edu>

Bruno Martins, <bsm@natura.di.uminho.pt>


SEE ALSO

perl.

Fung, P. & K. W. Church. (1994) ``K-vec: A New Approach for Aligning Parallel Texts.'' Proceedings of the 15th International Conference on Computational Linguistics, Kyoto.

Bigram Statistics Package by Ted Pedersen & Satanjeev Banerjee. http://www.d.umn.edu/~tpederse/code.html