2011-01-20  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-dumpDicts (dump_dictionary): add UTF-8 support.
	* src/pre.c: handle -u for UTF-8 aware word truncation.

2011-01-08  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD.pm (downtr): added verbose option.
	(add): added verbose option. Made faster on the SQLite backend.
	(verbose): added verbose method.
	* NAT/NAT/PTD/SQLite.pm: removed downtr method (unified with parent
	class).

2011-01-06  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-ptd (lowercase): added lowercase method.

2011-01-02  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD.pm (lowercase): added lowercase function. Needs to be
	called by the nat-ptd lowercase option.
	* NAT/NAT/PTD/SQLite.pm: same as below.
	* NAT/NAT/PTD.pm (toentry/downtr): now -filter can change the word
	that is being processed.

2010-12-28  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in: support -csize to specify the chunk size. The
	default value is 70000.

=== NATools 0.5.10 ===

2010-12-27  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-codify: support the -utf8 option.
	* NAT/NAT.pm.in: support the -utf8 option.
	* NAT/scripts/nat-create: support the -utf8 option.
	* configure.ac: bumped version number (0.5.10).
	* NAT/NAT.pm.in: fixed issues with UTF-8.

2010-12-26  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD.pm (transHash): added transHash method.

2010-12-23  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-ptd (_get_output_filename): added support for xz.
	* NAT/NAT/PTD/XzDmp.pm: added xz support for PTDs.

2010-09-20  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-codify: support the -i option.
	* NAT/NAT.pm.in: support the -i option.
	* NAT/scripts/nat-create: support the -i option.
	* src/pre.c (main): added support for the -i option to ignore case.
	Now, by default, nat-pre doesn't perform lowercasing.
	* NAT/scripts/nat-ptd (toDmpBz): added method to convert PTD to
	bzipped dumper;

2010-09-18  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD/SQLite.pm (_save): add transaction to the save method;
	* NAT/scripts/nat-ptd (toDmp): added method to convert PTD to dumper;
	(toSQLite): added method to convert PTD to SQLite format;

=== NATools 0.5.9 ===

2010-09-17  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD/*: modularized PTD support (dmp/bz2/sqlite).
	* NAT/t/02_ptd.t: rewrote tests to cover all kinds of backends.

2010-06-26  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD.pm: added support for loading .bz2 PTD files.

2010-06-22  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-ptd: added nat-ptd to integrate PTD-related
	commands (like nat-PTDfilter, which are now deprecated).

2010-02-10  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-dumpDicts: removed the -detailed option, which
	wasn't really used or useful; removed the MLDBM output format, which
	wasn't really used or useful and will be replaced by a SQLite
	database; updated documentation.

2010-02-09  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-makeCWB: added script to export aligned corpus to
	CWB.

2009-04-01  Alberto Manuel Brandão Simões

	* NAT/NAT/Matrix.pm: calculate translation probability for found
	patterns.
	* NAT/scripts/nat-exampleExtractor: show translation probabilities.

2009-03-30  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in: fix some issues with ngrams called by nat-create;
	* NAT/scripts/nat-exampleExtractor: compute attraction metrics;

2008-11-28  Alberto Manuel Brandão Simões

	* NAT/NAT/PTD.pm: added NAT::PTD module.
	* Added support for regexps on Patterns conditions.

2008-10-29  Alberto Manuel Brandão Simões

	* src/srvshared.[ch]: ngrams databases are opened only when needed;
	* NAT/scripts/*: recoded to UTF-8;

2007-04-11  Alberto Manuel Brandão Simões

	* src/ngrams.c: support for different databases per n-gram, and for
	the -i and -j flags;
	* src/ngramidx.c: support for different databases per n-gram;
	* NAT/NAT.pm.in: support for different databases per n-gram;

2007-02-11  Alberto Manuel Brandão Simões

	* configure.ac: version 0.5.5, just to release often.

2007-02-07  Alberto Manuel Brandão Simões

	* NAT/t/14_scripts.t: added a set of tests for documentation on
	methods being installed.
	* src/mat2dic.c: corrected bug that was forgetting the last word
	from the source language corpus.
	* NAT/scripts/nat-create: added a bunch of options to choose the EM
	algorithm mode (-noEM, -ipfp, -samplea and -sampleb).

2007-02-01  Alberto Manuel Brandão Simões

	* NAT/NAT/PatternRules.yp: support "|" in patterns.

2007-01-31  Alberto Manuel Brandão Simões

	* NAT/NAT/PatternRules.yp: changed grammar of patterns to support
	predicates over pattern fields, and to support a section at the end
	of the file with Perl code.

2007-01-27  Alberto Manuel Brandão Simões

	* src/ngramidx.c: added pragma page_size; use stat instead of the
	buggy fopen as the file verification method;

2007-01-25  Alberto Manuel Brandão Simões

	* src/srvshared.c: protect ngrams query if ngrams are not calculated.

2007-01-09  Alberto Manuel Brandão Simões

	* configure.ac: install Perl scripts together with the C ones. Perl
	libraries go to the default dir (normally /usr/lib/perl). Use
	--with-perl-prefix to force another directory.

2007-01-02  Alberto Manuel Brandão Simões

	* src/ngrams.c (main): fixed bug added... last year.

2006-12-29  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-mkMakefile: create ngram databases for different
	values of n. Needs reconciliation later;
	* src/ngrams.c (main): support option for just bigrams, trigrams or
	tetragrams;
	* src/ngramidx.c: support databases with just bigrams, trigrams or
	tetragrams;

=== NATools 0.5.4 ===

2006-11-22  Alberto Manuel Brandão Simões

	* NAT/MANIFEST: added missing files.
	* src/pre.c (AddSentence): removed code for ngrams.
	* configure.ac: bumped version to 0.5.4.

2006-11-20  Alberto Manuel Brandão Simões

	* NAT/Makefile.PL: removed nat-this and nat-these from distribution;
	added nat-codify to distribution;

2006-11-17  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-create: add -ngrams flag to force ngrams.
	* NAT/NAT.pm.in (index_ngrams): added method to calculate ngrams.
	* src/ngrams.c (main): some formatting issues. Update verbose result
	at the end.
	* ChangeLog: reformatted Rúben's entry.
	* NAT/NAT/CGI.pm, NAT/cgis/nat-search.cgi, NAT/cgis/nat-dict.cgi:
	changed some design issues.

2006-11-15  Ruben Fonseca

	* src/ngram.[ch]: implemented the dump of bigrams, trigrams and
	tetragrams from a CRP to a SQLite db. It uses a hash table to hold a
	cache and optimize the process. This functionality can now be
	removed from the nat-create pipeline.
	* src/Makefile.am: added the creation of a nat-ngram binary.
	* src/{...}.{ch}: changed all "sqlite3 *" to the "SQLite" struct,
	which holds the handle to the database and the caches used to
	optimize the process.

=== NATools 0.5.3 ===

2006-11-14  Alberto Manuel Brandão Simões

	* configure.ac: bumped version number to 0.5.3. Unstable release!
	* NAT/NAT/Matrix.pm: bumped version number to 0.04.
	(findDiagonal): fixed bug connecting patterns;
	(grep_blocks): fixed the way patterns and growing blocks are
	recognized;
	(setProbabilitiesForEqualStrings): mark equal strings only if they
	are longer than two characters;

2006-10-19  Alberto Manuel Brandão Simões

	* src/dictionary.c (dictionary_add): use the new formula with
	different weights based on corpus sizes;
	* configure.ac: check for log10 in libmath;
	* NAT/examples/ambs-1: example to add dictionaries. Uses two
	dictionaries in Data::Dumper format and adds them. While this is not
	a really useful tool, it is handy to test add formulas.

2006-10-18  Alberto Manuel Brandão Simões

	* configure.ac: new detection system for sqlite3 (it has a .pc file,
	so use it).
	* src/dictionary.c (dictionary_add): corrected the formula used to
	sum up dictionaries.

2006-10-17  Ruben Fonseca

	* src/ngramidx.[ch]: implemented a cache based on a hash table, to
	reduce the number of selects/updates needed to put all the index
	data into the sqlite database. The optimum value of the cache size
	"CACHE_SIZE" is still unknown (time-space problem).

2006-10-13  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-addDict: working. May need reformulation.

2006-10-12  Alberto Manuel Brandão Simões

	* src/words.[ch]: store the number of tokens in the corpus;

2006-10-11  Alberto Manuel Brandão Simões

	* src/dictionary.c: factored out a function to enlarge dictionary
	buffers, so it can be reused by the sum tool.
	* Makefile.am: support make test as well as make check.
	* NAT/scripts/nat-PTDfilter (filter): filter translations with
	numbers or symbols.
	* NAT/scripts/nat-addDict: preparing tool to add a dictionary to a
	NATools corpus;
	* NAT/scripts/nat-substDict: substitute a dictionary on a NATools
	corpus;

2006-09-21  Alberto Manuel Brandão Simões

	* NAT/NAT.xs (corpus_info_ngrams_by_str): reverse the result list;
	* src/srvshared.c: corrected bug when constructing the result linked
	list;

2006-09-20  Alberto Manuel Brandão Simões

	* NAT/NAT.xs (corpus_info_ngrams_by_str): added Perl interface for
	ngrams directly from the NATools library;
	* src/srvshared.c: added callback to create list of results;
	* NAT/NAT/Client.pm (ngrams): added interface for ngrams; removed
	functions that were moved to the CGI module;
	* src/srvshared.c: corrected SQL statement construction;

2006-09-19  Alberto Manuel Brandão Simões

	* src/srvshared.c: add function to query ngrams. SQL statement
	constructed. Needs to be tested later against the database;
	* src/server/server.c (parse): add parse instructions for ngrams
	queries;
	* configure.ac: do not check for Fortran and C++ during Libtool
	configuration.

2006-09-18  Alberto Manuel Brandão Simões

	* src/ngramidx.[ch]: renamed bigramidx to ngramidx; moved to SQLite;
	added trigrams code;
	* src/pre.c: create bigram and trigram occurrence counts.

2006-09-15  Alberto Manuel Brandão Simões

	* src/pre.c (AddSentence): keep track of last word id for later use
	on bigrams; create bigram index databases, and add occurrence
	information;

2006-09-14  Alberto Manuel Brandão Simões

	* NAT/NAT/CGI.pm: created (some months ago, but not added to the
	ChangeLog) a CGI module for some specific HTML headers and footers.
	* NAT/cgis/nat-dict.cgi: fixed the 'compact to compact' links. Moved
	documentation text to the __DATA__ section.
	* NAT/cgis/nat-search.cgi: highlight possible translations for simple
	query. Moved documentation text to the __DATA__ section. Enlarged
	result set to 500 concordances.
	* src/server/server.c: corrected bug concerning checking if an
	unsigned integer is greater than zero.
	* src/bigramidx.[ch]: added files to store a bigram index.
	* src/Makefile.am: add bigramidx module dependency.
	* configure.ac: check for dbopen.

2006-07-09  Alberto Manuel Brandão Simões

	* configure.ac: check correctly for perl, pod2man and yapp.
	* NAT/scripts/nat-create: added -tokenize flag to force corpora
	tokenization.

2006-07-04  Alberto Manuel Brandão Simões

	* configure.ac: 0.5.2
	* NAT/NAT/Client.pm: added code to search for concordances on local
	corpora.
	* src/srvshared.h: factored out code.

2006-05-13  Alberto Manuel Brandão Simões

	* configure.ac: added some section headings to configure output.

2006-05-12  Alberto Manuel Brandão Simões

	* THANKS: added THANKS file.
	* configure.ac: added macro to detect Perl modules.
	* acinclude.m4: added macro to detect Perl modules.

2005-11-02  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in: added language identification;

2005-11-01  Alberto Manuel Brandão Simões

	* removed dependencies on Term::ANSIColor;
	* NAT/scripts/nat-these: ask for corpus name when aligning;
	* src/server/server.c: added multi-corpus support;
	* src/server/corpusinfo.c: structure to manage corpus information;
	* NAT/NAT/Client.pm: added support for multi-corpus server;
	* NAT/cgis/*.cgi: added support for multi-corpus server;
	* src/pre.c: corrected problem with my_lowercase function on Darwin;

2005-10-27  Alberto Manuel Brandão Simões

	* NAT/NAT/Config.pm: removed dependency on Config::Simple.
	* NAT/cgis/nat-about.cgi: added CGI to show basic corpus information.
	* NAT/cgis/*.cgi: added connection to nat-about.cgi.
	* src/server/server.c: added DEBUG cpp variable for LOG size
	reduction.

2005-09-13  Alberto Manuel Brandão Simões

	* NAT/cgis/nat-dict.cgi: old navigation interface works again.

2005-07-29  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in: changed configuration format from http to ini.
	* src/server/server.c: added code to send responses for
	configuration variables;
	* src/server/parseini.[ch]: added code to parse Windows-style .ini
	files which will be created by Config::Simple. Still has problems
	with comments.
	* NAT/NAT/Client.pm (query_variable): added function to query the
	server for configuration variables;
	* NAT/cgis/nat-search.cgi: added documentation and language pair
	information;
	* NAT/cgis/nat-dict.cgi: added documentation and language pair
	information;
	* NAT/cgis/nat-matrix.cgi: added documentation and language pair
	information;

2005-07-25  Alberto Manuel Brandão Simões

	* src/server/server.c: added quality information output, and cache.
	* NAT/cgis/nat-search.cgi: added quality information on search CGI.
	* NAT/scripts/nat-rank: adapted to work with the new filename
	convention.
	* NAT/NAT.pm.in (rank): adapted rank file to work with the new
	configuration file.
	* NAT/NAT.xs: croak in case the maximum number of objects opened at
	the same time is reached.

2005-07-17  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-shell: added stupid POD. Code reuse complete;
	* NAT/NAT.pm.in: code reuse complete. Added documentation as well.

2005-06-13  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-these: started changes to reuse code;
	* NAT/NAT.pm (count_sentences): added progress indicator;
	* src/pre.c (main): be slightly less verbose.
	* src/postbin.c (saveDicts): corrected bug regarding accesses to
	uninitialized memory areas;

2005-06-08  Alberto Manuel Brandão Simões

	* configure.ac: changed version to put pressure on making this
	package minimally usable again.

2005-05-18  Alberto Manuel Brandão Simões

	* src/server/server.c: save two offset caches.

2005-05-17  Alberto Manuel Brandão Simões

	* src/server/server.c: reuse server socket (helps for faster
	debugging).
	(parse): added bi-search for words and sequences.

2005-05-16  Alberto Manuel Brandão Simões

	* src/server/server.c: added server for parallel corpora. Supports
	search by word or by sequence on both languages.

2005-05-06  Alberto Manuel Brandão Simões

	* src/isolatin.c: removed some unused code.

2005-05-05  Alberto Manuel Brandão Simões

	* src/invindex.c: use bucket to write file, instead of writing 32
	bits at a time; added linked list of buffers instead of linked list
	of integers. Tried different sizes. Best seems to be about 50.
	Joining time for Europarl is about 1/6 of the original one.
	* src/bucket.c:
	* src/bucket.h: created module for buffered writing of integers to
	files.
	* src/grep.c: created code to grep words in the corpus. Added some
	intersection code;

2005-04-13  Alberto Manuel Brandão Simões

	* src/invindexjoin.c:
	* src/invindex.c:
	* src/invindex.h: code to manage inverted indexes.

2005-04-06  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in (run_post, run_mat2dic, run_generic_EM): remove
	temporary matrix files and dic file.

2005-03-24  Alberto Manuel Brandão Simões

	* configure.ac: require libtool (to create library).
	* src/Makefile.am (nat_pre_SOURCES): create a library for words and
	corpus modules.

=== NATools 0.4.11 ===

2005-03-16  Alberto Manuel Brandão Simões

	* configure.ac: changed version.

2005-03-14  Alberto Manuel Brandão Simões

	* src/initmat.c (InitialEstimate): changed a lot of (*s1).word to the
	s1->word syntax (easier to read and maintain).

2005-01-27  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-dumpDicts: dump sorted by descending probability.

2005-01-21  Alberto Manuel Brandão Simões

	* NAT/NAT/Corpus.pm: added method to free the corpus;
	* NAT/NAT.xs: added an interface function to free loaded corpora;
	* NAT/scripts/nat-word-chunk: changed to support multiple corpus
	files and reduced the amount of memory needed.

2005-01-17  Alberto Manuel Brandão Simões

	* NAT/NAT/Corpus.pm: created a package to create an iterator
	(NAT::Corpus::Iterator) to simplify corpus usage.

2005-01-16  Alberto Manuel Brandão Simões

	* NAT/t/11_corpus.t: added a test file to test the Corpus.pm module.
	Almost empty at the present moment.
	* NAT/NAT.xs: added XS code to interact with the .crp files.
	* NAT/NAT/Corpus.pm: new module to interact with .crp files; maybe
	later we can rewrite PCorpus.pm based on this one, so we don't need
	to use open2.

2005-01-05  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in (nat_quick_open): corrected open function for
	aligned corpora given the new align method usage;

2004-12-12  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in: added some documentation about the new align method;

=== NATools 0.4.10 ===

2004-11-29  Alberto Manuel Brandão Simões

	* configure.ac: bumped version.
	* t/nat-these.t: added tests for words.
	* t/nat-pre.t: added tests for nat-pre / words databases.

2004-06-11  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-this: moved some functions to main module;

2004-05-26  Alberto Manuel Brandão Simões

	* configure.ac: added gtk-doc stuff.
	* src/natlexicon.c: added some gtk-doc-like documentation.
	* src/natdict.c: added some more gtk-doc-like documentation.

2004-05-24  Alberto Manuel Brandão Simões

	* src/natdict.c: added some gtk-doc-like documentation.

2004-05-23  Alberto Manuel Brandão Simões

	* INSTALL: changed basic GNU installation documentation to specific
	NATools instructions;
	* NAT/NAT.pm.in, *.pm: added copyright information; bumped version
	number to 0.05.
	* src/*.[ch]: added copyright information.
	* configure.ac: trying to use gtk-doc.
	* src/natlexicon.h: separated lexicon and natdictionary code into two
	files;
	* src/mkdict.c: creates one dictionary with lexicon and maps all
	together. Created one month ago, and iterated.

2003-11-24  Alberto Manuel Brandão Simões

	* NAT/NAT.pm.in: increment version number;
	* NAT/NAT/PCorpus.pm: use $NAT::PREFIX for binary prefixing;
	increment version number for easier future comparisons.

2003-11-21  Alberto Manuel Brandão Simões

	* src/search_sentence2.c: second version of the original
	search_sentence.c. This tries to be aware of the possibility of
	having more than one chunk.
	* NAT/NAT.pm.in: use an autoconf/automake variable to define the
	commands' full paths;

2003-11-14  Alberto Manuel Brandão Simões

	* configure.ac: preparing patch version (no bugs found, yet).

2003-11-13  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-these: rank and dump dicts files after alignment;
	* NAT/Makefile.PL: require XML::TMX 0.04 (xml:lang support);
	* configure.ac: preparing release 0.4.0;
	* NAT/scripts/nat-this (tokenize): trying to add a tokenizer;

2003-10-17  Alberto Manuel Brandão Simões

	* NAT/cgis/nat-search.cgi: added possibility to consult translation
	qualities directly on the query CGI;

2003-09-22  Alberto Manuel Brandão Simões

	* src/dictionary.c: corrected bug on dictionary sum (integer
	overflow).

2003-09-21  Alberto Manuel Brandão Simões

	* NAT/NAT.pm (nat_open): corrected bug with rank files: there should
	be as many rank files as corpus files;
	* NAT/NAT/Translator.pm (new): corrected bug with rank files: there
	should be as many rank files as corpus files;

2003-09-20  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-rank: work with more than one chunk.
	* NAT/NAT.pm (nat_open): create a better object with more
	information: number of chunks.
	* t/nat-these.t: corrected test so that it works with binary
	dictionaries.

2003-08-03  Alberto Manuel Brandão Simões

	* NAT/NAT.pm (nat_quick_open): added a quick open function, which
	loads neither the lexicon nor the dictionary files;
	(nat_open): use nat_quick_open to bootstrap, and then open lexicon
	and dictionary files;
	* src/dictionary.c (dictionary_sentence_similarity): removed a (yuck)
	goto!

2003-08-02  Alberto Manuel Brandão Simões

	* NAT/cgis/nat-main.cgi: preparing a new interface;
	* NAT/cgis/nat-browse.cgi: nicer interface;
	* NAT/cgis/nat-search.cgi: nicer interface;

2003-07-31  Alberto Manuel Brandão Simões

	* NAT/cgis/nat-search.cgi: use new filename specifiers; added option
	to choose the corpus;
	* NAT/NAT/PCorpus.pm: work with ids optionally;

2003-07-28  Alberto Manuel Brandão Simões

	* NAT/cgis/nat-browse.cgi: converted to new dictionaries;

2003-07-21  Alberto Manuel Brandão Simões

	* NAT/NAT/Lexicon.pm (id_with_word): get id from a word;
	(sentence_to_ids): convert a sentence directly to ids;
	(ids_to_sentence): convert a set of ids to a sentence;
	* NAT/NAT.xs (wlgetbyword): get id from a string;
	* NAT/NAT.pm (sentence_similarity): first version of this function
	using only integers;
	(check_sentence_similarity): corrected stupid bug, but this version
	should soon be deprecated;

2003-07-20  Alberto Manuel Brandão Simões

	* NAT/t/14_scripts.t: added test to validate scripts compilation;
	* NAT/scripts/nat-dumpDicts: added file to dump dictionaries in Perl
	format;
	* NAT/MANIFEST: removed files;
	* NAT/scripts/nat-dumpDB: removed file;
	* NAT/scripts/nat-createDB: removed file;
	* NAT/scripts/nat-db2storable: removed file;
	* NAT/scripts/nat-createStorable: removed file;
	* NAT/NAT/Lexicon.pm: added file with interface to C code to deal
	with Lexicon files;
	* NAT/NAT/Dict.pm: rewrote code to use C interface with new dict
	format;

2003-07-19  Alberto Manuel Brandão Simões

	* NAT/NAT.xs: corrected stupid bug;
	* NAT/scripts/nat-dict: use new dictionary format;
	* src/dictionary.c: new function to add dictionaries in binary
	format;
	* NAT/NAT.pm: added interface for dictionary xs functions;
	* NAT/NAT.xs: added interface for adding dictionaries;

2003-07-18  Alberto Manuel Brandão Simões

	* configure.ac: LIBS and CFLAGS are detected on configure and passed
	to Makefile.PL.
	* NAT/Makefile.PL: LIBS and CFLAGS are detected on configure and
	passed to Makefile.PL.
	* NAT/NAT.xs: added functions to open and close a bin dict; added
	functions to open and close the lexicon binary tree; added function
	to return a list of (word,val,word,val..).
	* src/dictionary.c: added function to get the number of occurrences;

2003-07-17  Alberto Manuel Brandão Simões

	* src/dictionary.c: new module to handle dictionaries directly in
	binary format instead of DB file;
	* src/dictionary.h: new module to handle dictionaries directly in
	binary format instead of DB file;

2003-07-16  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-translate-shell: some awful changes.
	* NAT/scripts/nat-db2storable: quick hack to convert db files into
	storable ones;
	* NAT/scripts/nat-translate-shell: first try to handle more than one
	corpus.

2003-07-13  Alberto Manuel Brandão Simões

	* pods/nat-initmat.pod: completed documentation;
	* NEWS: added news about documentation;
	* pods/nat-post.pod: completed documentation;
	* pods/nat-css.pod: completed documentation;
	* pods/nat-mat2dic.pod: completed documentation;

2003-07-06  Alberto Manuel Brandão Simões

	* pods/nat-ipfp.pod: completed documentation;
	* pods/nat-samplea.pod: completed documentation;
	* pods/nat-sampleb.pod: completed documentation;
	* pods/Makefile.am: better handling of pod files;
	* pods/nat-initmat.pod: added file;

2003-06-21  Alberto Manuel Brandão Simões

	* src/corpus.c (corpus_add_word): store flags giving information
	about the word case;
	* src/search_sentence.c: change word case according to the flags;
	* t/nat-these.t: back to the original form.
	* src/words.c: store words in original form.

2003-06-20  Alberto Manuel Brandão Simões

	* src/corpus.c: add functions to deal with capitalization flags;
	* src/words.c: store lowercase words;
	* t/nat-these.t: use lowercase for tests (temporary);

2003-06-19  Alberto Manuel Brandão Simões

	* NEWS: add news for version 0.3.
	* configure.ac: bump version to 0.3.
	* NAT/MANIFEST: add t/15_cgis.t.
	* NAT/t/15_cgis.t: test CGIs for correct syntax;
	* NAT/scripts/nat-these: add support for auto-split;

2003-06-17  Alberto Manuel Brandão Simões

	* NAT/NAT.pm: add support for dbs on OO functions;
	* NAT/cgis/nat-browse.cgi: use configuration file instead of direct
	parameters;

2003-05-13  Alberto Manuel Brandão Simões

	* NAT/NAT/Dict.pm (add): look at the percentage before printing it,
	to remove terminal flickering;

2003-05-12  Alberto Manuel Brandão Simões

	* NAT/NAT/Dict.pm (add): corrected stupid bug from MLDBM!

2003-04-20  Alberto Manuel Brandão Simões

	* NAT/NAT/Dict.pm (add): two hours, and found why the add function
	didn't work. SOLVED!!! :-/

2003-04-18  Alberto Manuel Brandão Simões

	* NAT/cgis/nat-main.cgi: some more CSS, to make it nicer;
	* NAT/Makefile.PL: Term::ANSIColor is required!
	* NAT/NAT/Dict.pm (add): function to add dictionaries;
	* t/nat-these.t: added test file for nat-these process. At the
	moment, it tests the created files more than their content.

2003-04-15  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-these: create ranking automatically; create
	configuration file to be read by other tools;
	* NAT/scripts/nat-translate-shell: shell now supports commands; some
	colors to make the day nicer;

2003-04-12  Alberto Manuel Brandão Simões

	* configure.ac: version bumped to 0.2.1;

2003-04-11  Alberto Manuel Brandão Simões

	* src/search_sentence.c: corrected bug: first line of the corpus was
	not printed with the 'all' option; -q works in non-interactive mode,
	for the 'all' option;
	* NAT/NAT.pm (new): constructor of NAT objects;
	* NAT/scripts/nat-pair2tmx: convert nat corpora to tmx, including
	quality rank.

2003-04-10  Alberto Manuel Brandão Simões

	* NAT/NAT/Translator.pm (translate): more black magic to handle the
	smallest keyword on the translate function.

2003-04-08  Alberto Manuel Brandão Simões

	* NAT/NAT/Translator.pm (translate): functional programming black
	magic to handle threshold and best keywords on the translate
	function. Not tested!
	* NAT/NAT/PCorpus.pm: handle quality value returned by nat-css;
	* NAT/NAT/Translator.pm (translate): function receives named
	parameters at the end. One of them (sample_size) is used to define
	how many samples the translate function will receive.

2003-04-03  Alberto Manuel Brandão Simões

	* NAT/NAT.pm (check_bidirectional_sentence_similarity): changed rule
	to compute the bidirectional sentence similarity value;
	(rank): added ranking function;
	* NAT/scripts/nat-these: create db and storable files.
	* src/search_sentence.c: added option to dump the whole corpus.
	* NAT/NAT/PCorpus.pm (size): function to check the size of the corpus
	by looking at the offset file;
	* src/search_sentence.c: support querying a specific sentence, given
	the correct identifier;

2003-03-31  Alberto Manuel Brandão Simões

	* NAT/NAT.pm (check_sentence_similarity): receive the sentence as an
	array.
	(check2): prepare sentences by splitting them;
	(check_bidirectional_sentence_similarity): changed name from check2;
	added documentation. VERSION bumped to 0.02.

2003-03-18  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-translate-shell: time translations and dictionary
	load;
	* NAT/NAT/Dict.pm (translations): if the word is not in the
	dictionary, do not crash: try to continue.

2003-03-14  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-browse (entry): corrected bugs;
	* NAT/NAT/Translator.pm (new): support for DB files or Storable;
	* NAT/scripts/nat-browse: added support for DB files or Storable
	files;
	* NAT/scripts/nat-createStorable: added file to create a storable
	instead of a DB file;
	* NAT/Makefile.PL: added installation for nat-createStorable;
	* NAT/NAT.pm (merge_dict_lex): added support for storable;

2003-03-13  Alberto Manuel Brandão Simões

	* NAT/scripts/nat-browse: navigation in more than one corpus;
	* src/sent_align.c (main): put delimiter in the correct place;

2003-03-12  Alberto Manuel Brandão Simões

	* NAT/NAT/Translator/method2.pm (create_window_str): use reference
	instead of value.
	* NAT/NAT/Translator/method1.pm (get_best_window): use reference
	instead of array. Count the size of the string only once.
	* NAT/NAT/Translator.pm (translate): return the number of samples
	found in the corpora.

2003-03-06  Alberto Manuel Brandão Simões

	* src/sent_align.c (main): support for one or two output files;
	* NAT/NAT.pm (check_sentence_similarity, check_pair_sizes): add
	support for stop word list;

2003-02-26  Alberto Manuel Brandão Simões

	* src/pre.c (AnalyseCorpus): cosmetic changes on output;
	(main): support -v option to be verbose and -V to print version;
	* COPYING: added information about the vanilla aligner;
	* AUTHORS: added information about the vanilla aligner;
	* configure.ac: added check for two more math library functions;
	* src/Makefile.am: added vanilla aligner (authorization still
	pending).

2003-02-25  Alberto Manuel Brandão Simões

	* NAT/NAT/Translator/method2.pm: method using sliding window and best
	translation mean;
	* NAT/NAT/Translator/method1.pm: method using sliding window and
	translation indication of translation similarity;
	* NAT/NAT/Translator.pm (translate): created method to incorporate
	translation methods;
	* configure.ac: cosmetic changes;
	* NAT/NAT/Dict.pm (translations): the quick index is redundant.
	Removed it and use the NAT::search method;

2003-02-23  Alberto Manuel Brandão Simões

	* NAT/NAT/PCorpus.pm (ready): removed a stupid bug which was removing
	a translation from the answer for the search;
	* NAT/NAT/Dict.pm: create a quick index to look up words in lowercase
	format. It is consuming a lot of time! Changed lookup functions
	accordingly.

2003-02-20  Alberto Manuel Brandão Simões

	* NAT/NAT.pm (mean0): added function to compute mean (with zero check
	for empty sets...).
	* NAT/t/00_basic.t: added tests (which proved useful) to test the
	mean0 function;

2003-02-14  Alberto Manuel Brandão Simões

	* NAT/NAT/Dict.pm (translations): corrected scary bug >:->
	* src/post.c (main): print to stderr detailed information about what
	we are doing;
	* src/search_sentence.c: exit the program if it does not find the
	offset indexes for the corpus files;

2003-02-11  Alberto Manuel Brandão Simões

	* pods/nat-pre.pod: some more documentation;
	* NAT/MANIFEST: added Dict.pm file;
	* NAT/NAT/Dict.pm: added file to encapsulate NATools dictionaries;

2003-01-25  Alberto Manuel Brandão Simões

	* configure.ac: added detection of pod2man command;
	* pods/nat-pre.pod: added stub man page;
	* pods/nat-initmat.pod: added stub man page;
	* pods/Makefile.am: added creation of man pages.
	* src/words.c: changed 'get_id' function to 'word_list_get_id' to be
	consistent with the other word_list_* functions; changed include
	files as well.
	* src/sampleb.c: use guint32 instead of unsigned long;
	* src/samplea.c: use guint32 instead of unsigned long;
	* src/dict.c: use guint32 instead of unsigned long;
	* NAT/Makefile.PL: added nat-tmx2pair to install it; install
	nat-createDB too!
	* NAT/MANIFEST: added script and moved script files;
	* NAT/scripts/nat-tmx2pair: created script to split tmx into files.
	It is cleaner and quicker than the one in the CQP dir; support for
	many files;

2003-01-23  Alberto Manuel Brandão Simões

	* NAT/Makefile.PL: added dependency on XML::DT to use with TMXs;
	* src/matrix.c: changed floats and unsigned ints to better types;
	* src/ipfp.c: changed floats and unsigned ints to better types; added
	iteration counter for cases where the EM algorithm does not
	converge;

2003-01-22  Alberto Manuel Brandão Simões

	* NAT/NAT/PCorpus.pm (ready): corrected function to handle two
	languages on css output;
	(neod): 'ne' (not equal) is faster than pattern matching;
	* src/search_sentence.c: function to print sentences, and added
	sentence printing for original corpus;

2003-01-21  Alberto Manuel Brandão Simões

	* NAT/NAT/PCorpus.pm: added file to encapsulate a parallel corpus
	object. Includes constructor and search method (with some auxiliary
	ones);
	* NAT/NAT.pm (translate): start interface with new css program (CSS
	stands for Corpus Sentence Search).
	* NAT/Makefile.PL: prepare dependency on IPC::Open2.

2003-01-20  Alberto Manuel Brandão Simões

	* src/search_sentence.c: removed the function to get the offset and,
	instead, load the file into memory. It is not so big!! Read
	sentences from the standard input until EOF.

2003-01-19  Alberto Manuel Brandão Simões

	* src/corpus.c (corpus_new): corrected bug on index initialization;
	(corpus_add_word): corrected bug when saving the offset;
	* src/search_sentence.c: created file with basic functionality to
	search corpora words in one language and get the corresponding
	sentence in the other language.
	* NAT/NAT.pm (normalize): changed function name to maintain English
	coherence;

2003-01-16  Alberto Manuel Brandão Simões

	* NAT/t/00_basic.t: renamed from basic.t to 00_basic.t; added checks
	for the normalization function; added checks for the search
	function;
	* NAT/t/05_pairs.t: added tests for sentence pair size check;
	* NAT/NAT.pm: added support for bi-directional sentence similarity
	check; added function to check sentence pair sizes; added a
	normalization function;

2003-01-13  Alberto Manuel Brandão Simões

	* NAT/NAT.pm (check_sentence_similarity): check sentence
	similarity... better: check how much of the first sentence could be
	translated with the second;
	* src/corpus.c (corpus_new): alloc index buffer;
	(index_enlarge): enlarge index buffer, if needed;
	(corpus_add_word): add information regarding sentences to index
	buffer;
	* src/corpus.h: added index buffer;
	* NAT/NAT.pm (merge_dict_lex): added numbered keys to access using
	word identifiers;
	* NAT/scripts/nat-dumpDB: created script to dump DB files. Useful for
	debugging;
	* NAT/scripts/nat-createDB: created script to merge lexicon file with
	generated dictionaries;
	* NAT/scripts/nat-browse: renamed 'navega.pl' to 'nat-browse' and
	changed support for new database type;

2002-12-21  Alberto Manuel Brandão Simões

	* NAT/Makefile.PL (MY::postamble): added dist for the NAT module
	inside the package. This needs more testing, but seems to work.

2002-12-17  Alberto Manuel Brandão Simões

	* test/nat-these: added iteration number for any of the methods
	(still defaults to 5 for ipfp and 10 for the others); cleaned
	make-like time check on the script;
	* src/post.c (print_quoted): changed from the 'quote' function to a
	'print_quoted' one, with better performance and fewer bugs;
	* src/initmat.c (InitialEstimate): corrected stupid bug on stop-words
	support;

2002-12-16  Alberto Manuel Brandão Simões

	* test/nat-these: added option for stop words;
	* src/words2id.c (main): added option to save word list in binary
	format;
	* src/initmat.c (InitialEstimate): added option to save dots by
	defining only a macro;
	(main): support for stop-words;

2002-12-06  Alberto Manuel Brandão Simões

	* test/nat-these: create generated files in a directory of their
	own, and validate them against pre-built versions;
	* test/Makefile.am: include nat-these in the distribution file;
	* src/matrix.c (SearchItem): binary search;

2002-12-03  Alberto Manuel Brandão Simões

	* src/words2id.c: convert words to identifiers;
	* src/*.[ch]: replaced wc_int_t with guint32 everywhere;
	* configure.ac: replaced wc_int_t with guint32 everywhere;
	* t/*: added directory and tests;

2002-11-26  Alberto Manuel Brandão Simões

	* src/corpus.nw: created file, merging corpus.h and corpus.c, for use
	with the noweb literate programming tool; changed some of the API.
	* src/sampleb.c, src/samplea.c: re-indentation;

2002-11-25  Alberto Manuel Brandão Simões

	* src/words.nw: created file, merging words.h and words.c, for use
	with the noweb literate programming tool.

2002-11-19  Alberto Manuel Brandão Simões

	* src/mattest.c, src/samplea.c, src/sampleb.c, src/initmat.c,
	src/ipfp.c, src/mat2dic.c: changed badly named 'err' function to the
	correct report_error one;
	* src/corpus.c, src/corpus.h: changed unsigned longs and unsigned
	ints to more accurate data types from glib.
	* src/words.c, src/words.h: changed unsigned longs and unsigned ints
	to more accurate data types from glib. Using guint16 and guint32, it
	is easier to be portable and to know how many bits each of them
	uses.
	* configure.ac: now we depend on glib-2.0; added -Wall to default
	CFLAGS;

2002-11-17  Alberto Manuel Brandão Simões

	* src/matrix.c: changed functions according to the new data
	structure;
	* src/matrix.h: changed data structure;

2002-11-16  Alberto Manuel Brandão Simões

	* COPYING: added specific project copyright;

2002-11-13  Alberto Manuel Brandão Simões

	* configure.ac: detect perl. Can be useful.