NATools
WARNING
This information is outdated and should be updated soon.
NATools is a workbench for processing parallel corpora. It includes a sentence aligner, a Probabilistic Translation Dictionary extractor, a word aligner, and a set of other tools for studying aligned parallel corpora.
- Home Page: http://natools.sf.net
- Using NATools with some PT corpora: http://linguateca.di.uminho.pt/nat
History
This set of tools is heavily based on Twente-Aligner, by Djoerd Hiemstra (hiemstra - cs.utwente.nl) from the University of Twente, The Netherlands.
Twente-Aligner is licensed under the GPL, and current development keeps the same open-source license.
Current development takes place at the Computer Science Department of the University of Minho, Portugal. Main development is done by Alberto Simões (ambs - di.uminho.pt) and José João Almeida (jj - di.uminho.pt).
Download
Official releases are made from time to time. They are not always stable, but they are mostly installable. They can be downloaded from http://natura.di.uminho.pt/download/sources/NATools. Development is active; to get the most recent version of the software, check out the SVN tree:
svn co https://natura.di.uminho.pt/svn/main/NATools
Installation
If you are installing on Unix, Linux or MacOS X, you should have all the required software installed or available on the system installation disks. If you are using Windows, I strongly advise installing cygwin.
Assuming you are on a Unix-like system or have installed cygwin, the installation process for NATools is described below. If you have any problem installing, or find a bug in this document, please contact me.
Dependencies
NATools has many software dependencies. It is quite hard to keep this list up to date, so it is almost certainly incomplete.
NATools uses a set of different Perl modules. You should install them before trying to compile NATools. Fortunately, Perl has an installation shell which helps you with this task. It should be available on your system as the command line tool cpan or, if not, by running perl -MCPAN -e shell. Run it as superuser (root).
If you are running this command for the first time, you should be prompted for configuration details. You will then get a prompt where you can issue installation commands:
cpan> install Lingua::PT::PLNbase
where the module to be installed (in this case, Lingua::PT::PLNbase) follows the install command.
Proceed with the installation for (at least) the following modules:
- Lingua::PT::PLNbase
- Lingua::Identify
- XML::DT
- XML::TMX
- Compress::Zlib
- Storable
- MLDBM
- Time::HiRes
- Term::ReadLine
- URI::Escape
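Before opening the cpan shell, it can be useful to see which of these modules are already present. The following is a minimal sketch, assuming a POSIX shell and a perl executable on the PATH; the function name missing_modules is hypothetical and not part of NATools. It simply asks perl to load each module and prints the names of those that fail to load.

```shell
# missing_modules MODULE...
# Hypothetical helper (not shipped with NATools): prints the name of
# every module that the local perl cannot load.
missing_modules() {
    for module in "$@"; do
        # "perl -MName -e 1" exits with a non-zero status when Name cannot be loaded
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Check the modules listed above; no output means all of them were found.
missing_modules Lingua::PT::PLNbase Lingua::Identify XML::DT XML::TMX \
    Compress::Zlib Storable MLDBM Time::HiRes Term::ReadLine URI::Escape
```

Any module name printed by this check still has to be installed from the cpan prompt, as described above.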
Configuring the package
Uncompress the package. If you are under Windows, DO NOT use WinZip or WinRar, as they change file contents. Use gzip and tar as on Unix systems:
tar zxvf NATools-x.x.x.tar.gz
cd NATools-x.x.x
Then configure the package by running the configure shell script:
./configure
Pay attention to the output it produces: the script checks your system configuration as well as the module dependencies. If you have problems installing NATools, always send me the configure output.
Compiling
NATools is a hybrid application, written in C for efficiency and in Perl for flexibility. So, the C portions of the package must be compiled before it is installed.
To compile, just issue the make command. If compilation fails and you can't figure out what the problem is, contact me. Do not forget to send the configure output as well as the output of the make command.
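Since a problem report needs both the configure and the make output, it may help to save them to files while building. The following is a minimal sketch, assuming a POSIX shell; the helper name run_logged and the log file names are hypothetical choices, not something NATools provides. The helper stores the standard output and standard error of a command in a file, prints them once the command has finished, and returns the command's own exit status.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not shipped with NATools): runs COMMAND, saves its
# standard output and standard error in LOGFILE, prints the saved output
# afterwards, and returns the exit status of COMMAND.
run_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1
    status=$?
    cat "$log"
    return "$status"
}

# Usage, inside the unpacked NATools directory:
#   run_logged configure.log ./configure && run_logged make.log make
```

If either step fails, configure.log and make.log hold the output to attach to the report.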
Testing the package
NATools includes a set of tests so you can verify that the compilation went well. They align some toy corpora and perform some checks on the result. Just enter the make check command.
Installing the package
If everything goes well, you can now install the NATools package. You just need to run the make install command as superuser.