=head1 NAME

jspell - Command line interface to the Jspell morphological analyzer

=head1 SYNOPSIS

    jspell [-dfile | -pfile | -wchars | -Wn | -t | -n | -x | -b | -S | -B | -C | -P | -m | -Lcontext | -M | -N | -Ttype | -V | -o format | -g | -y | -u] file ...
    jspell [-dfile | -pfile | -wchars | -Wn | -t | -n | -Ttype | -o format] -l
    jspell [-dfile | -pfile | -ffile | -Wn | -t | -n | -B | -C | -P | -m | -Ttype] {-a | -A}
    jspell [-dfile] [-wchars | -Wn] [-o format] -c
    jspell [-dfile] [-wchars] [-o format] -e[1-4]
    jspell [-dfile] [-wchars] -D
    jspell -v[v]

=head1 DESCRIPTION

B<jspell> is a morphological analyzer. It can be used in four different ways:

=over 4

=item *

as a standard C library;

=item *

as a non-buffered command line application;

=item *

as a command interpreter;

=item *

as an interactive program.

=back

=head2 Interactive Application

B<jspell> should be invoked with a text file name. The text is then checked as follows: each word that is not in the dictionary is shown in reverse video at the top of the screen, together with its context. The user can then choose one of the correction suggestions (if there are any). Suggestions are formed in two ways:

=over 4

=item *

detection of approximated words (words that are missing a letter, or that have some letters changed), which we normally call I<near misses>;

=item *

application of formation rules to a known root (even though the resulting derivation is not known to be correct, it is shown as well).

=back

One of the last rows of the screen shows a mini-menu with the following options:

=over 4

=item I<digit>

Replace the original word with the suggestion of that number.

=item Space

Accept the word this time only (nothing is changed).

=item R

Replace the word with text typed by the user.

=item E

Replace all occurrences of the word in the text.

=item A

Accept the word for the rest of the text.

=item I,U

Accept the word and update the personal dictionary: with B<I>, the word is added with the same case as in the original text; with B<U>, it is added in lowercase. Note that our dictionary stores more information about a word than the word itself, so the user is prompted for a classification, flags, and a short comment; alternatively, one of the suggestions that B<jspell> builds from the AFF file rules can be chosen.

=item L

Look up words in the system dictionary (this is controlled by the C<WORDS> compilation variable).

=item X

Write the rest of the file as it is, ignoring all misspelled words, and start checking the next file (if there is one).

=item Q

Exit immediately, leaving the file unchanged.

=item !

Shell escape.

=item ^L

Redraw the screen.

=item ^Z

Suspend B<jspell>.

=item ?

Show the help screen.

=back

=head2 Command line options

=over 4

=item -M

Activate the mini-menu at the bottom of the screen.

=item -N

Deactivate the mini-menu at the bottom of the screen.

=item -LI<context>

Set the number of lines of context to show. The number must be glued to the flag (for example, C<-L5>).

=item -V

Show characters that use more than 7 bits in C<cat -v> style. This option can be useful on older terminals that cannot display some characters.

=item -t

The input file is in TeX or LaTeX format.

This mode is activated automatically if the file extension is C<.tex>.

=item -n

The input file is in nroff/troff format.

=item -b

Create a backup file (with the extension C<.bak>).

=item -x

Do not create a backup file.

=item -B

Report two words concatenated without a space between them as an error.

=item -C

Consider the concatenation of two correct words to be a correct word too. This option can be useful for languages such as German, where some words are formed by concatenation.

=item -P

Do not suggest root/affix combinations to be added to the personal dictionary.

=item -m

Make possible root/affix combinations that are not in the dictionary.

=item -S

Sort the suggestion list by probable correctness instead of alphabetically.

=item -d I<file>

Use an alternative dictionary.

=item -p I<file>

Use an alternative personal dictionary. If I<file> does not start with a slash, the C<$HOME> prefix is assumed. If you use one of the library's default dictionaries and a file C<.jspell_hashfile> exists, it is used as the personal dictionary. If none of these conditions holds, the C<.jspell_words> file is used. Without this option, B<jspell> searches for personal dictionaries in the current directory and in the home directory; if both exist, both are loaded.

=item -w I<chars>

Specify additional characters that can appear inside words. For example, with C<-w "&">, "AT&T" becomes a valid word.

=item -W I<n>

Words of I<n> characters or fewer are always accepted as correct. To check all words, regardless of their length, use C<-W 0>.

=item -T I<type>

Assume the given formatting type for all files. I<type> can be one of the unique names defined in the affix file (for example, C<tex>) or a file suffix including the dot (for example, C<.tex>).

=item -l

Produce a list of the misspelled words read from the standard input.

=item -a

This option is intended to be used through pipes: B<jspell> works as a command line interpreter.

If a word is found in the dictionary, either directly or through the affix rules, information about its root and about the features of the root and of the affixes is printed, in a format that can be defined by the user.

If the word is not in the dictionary, the output line starts with an ampersand (C<&>), followed by a space, the original word, a space, the number of characters between the start of the line and the word, a colon, and a list of approximated words, each given as the word, an equal sign, and its classification in the specified format. If the word can be formed by an illegal addition of affixes to a known root, a list of such suggestions is shown too. If there is no approximated word, but only formations using invalid affixes, the line has a similar format but starts with a question mark (C<?>) instead of an ampersand.

In summary, if the word exists in the dictionary, the output is:

    * : , , ...

If it is NOT in the dictionary:

    & : , , ..., , ...

where each entry has the following format:

    = format(,,, ,)

This I<format> is defined by the user; the default is:

    lex(, [], [], [], [])

The separators C<,>, C<=>, and C<:> are defined with C<#define> clauses, so they can be changed at compile time.

With the C<-a> flag, a set of commands is also accepted. Each command starts with one of these characters: C<*>, C<@>, C<&>, C<+>, C<->, C<~>, C<#>, C<!>, C<%>, C<$>, or C<^>.

=over 4

=item C<*>

Add the word to the personal dictionary. The class, flags, and comments can be added using the dictionary separator.

=item C<@>

Accept the word, but do not add it to the dictionary.

=item C<&>

Add the word, converted to lowercase, to the personal dictionary.

=item C<#>

Save the current personal dictionary.

=item C<~>

Set the parameters based on the file.

=item C<+>

Enter TeX mode.

=item C<->

Exit TeX mode.

=item C<!>

Enter I<terse> mode.

=item C<%>

Exit I<terse> mode.

=item C<$> I<modes>

Change the operating modes, like the C<init_modes> function (see the library section).

=item C<^>

Check the rest of the line.

=back

Note that in I<terse> mode the information about correct words is not shown. This can make programs that use B<jspell> faster.

=item -A

Works like the C<-a> option, except that if a line starts with the string C<&Include_File&>, the rest of the line is taken as the name of a file from which words are read.

=item -s

If used, B<jspell> stops itself with a C<SIGTSTP> signal after reading each line of input, and resumes with the next line when it receives a C<SIGCONT> signal. This is only valid together with the C<-a> or C<-A> option, and only on BSD-derived systems.

=item -f I<file>

Write the results to I<file> instead of the standard output. Only valid together with the C<-a> or C<-A> option.

=item -v

Print the current version of B<jspell>. If the option is doubled (C<-vv>), the compilation options are printed too.

=item -c

Read words from the standard input and, for each of them, write a list of possible roots, each with its classification and with the classification of the original word derived through the flags used. Note that the generated roots may not be in the dictionary. For example, the Portuguese input C<batatas> produces:

    batatas lex(batata, [CAT=adj_nc], [N=p], []), lex(batatar, [CAT=v], [CAT=v,P=2,N=s,T=p], [])

=item -z

Also print the flag that was used:

    lex(batata, [CAT=adj_nc], [N=p], [])/p

=item -e

The inverse of C<-c>: starting from a word and a flag, generate all the derived words that the flag rules allow. For example, C<batata/p> generates:

    batata batatas= lex(batata, [], [N=p], [])

=item -D

Write the affix tables of the dictionary to the standard output.

=item -o I<format>

Define the output format. I<format> must be a string containing five C<%s>, which are filled in with the word root, the classification of the root, and the classification associated with the flag.

The default is the C<lex(...)> format shown above.

=item -g

Show only solutions, not suggestions. This improves performance.

=item -y

Show only the suggestions built with flags that are not defined for the word. Near misses are not computed.

=item -u

Ignore punctuation. The set of punctuation marks is given by a C<#define> in the source code.

=back

The output of the C<-a>, C<-e>, and C<-c> options uses separators that are defined under the following names:

    SEP1  ","
    SEP2  ";"
    SEP3  "="
    SEP4  "\n"

C<SEP1> separates solution hypotheses. C<SEP2> is used when near misses are shown, to mark the end of that type of solution. C<SEP3> separates the original word from its information. C<SEP4> ends the word record.

=head1 Using the C library

Programs that use B<jspell> as a library should include C<jslib.h> and link with the B<jspell> library (the compilation example at the end of this section uses C<-ljspell>). They must initialize the library by calling C<init_jspell> before calling any other API function.

=head2 init_jspell(char *options)

Initializes B<jspell> with the flags in the C<options> string. The C<-a> option is normally always used. Example call:

    init_jspell("-d dic-pe -W 0 -a -cf");

=head2 word_info(char* word, sols_type solutions, sols_type near_misses)

Looks up C<word> in the dictionary. If it is found, the possible ways of forming it are stored in the C<solutions> array, where each element is a string containing the word root, its classification, and the classification that makes C<word> possible. If the word is not found, the C<near_misses> array contains the possible suggestions, in the format specified with the C<-o> flag. An empty string in C<solutions> or C<near_misses> marks the end of the solutions or suggestions, respectively.

=head2 void init_modes(char* modes)

Changes the way suggestions are produced. There are two types of suggestions: those obtained by small changes to the original word (called I<near misses>), and those built by adding affixes that are not allowed for that word.

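
The first kind of suggestion, near misses, can be illustrated outside the library. The following self-contained C sketch collects the known words that are one edit away from a misspelled word: deleting a letter, swapping two adjacent letters (the kind of change that the C<y> mode disables), replacing a letter, or inserting one. It is a generic ispell-style illustration, not B<jspell>'s actual code. The toy word list, the lowercase ASCII alphabet, and the names C<is_known>, C<keep>, and C<near_misses> are all hypothetical.

```c

    #include <assert.h>
    #include <string.h>

    #define MAX_WORD 64

    /* Hypothetical toy word list; a real checker would use its dictionary. */
    static const char *toy_dict[] = { "batata", "batatas", "bata", NULL };
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";

    static int is_known(const char *w)
    {
        for (int i = 0; toy_dict[i] != NULL; i++)
            if (strcmp(toy_dict[i], w) == 0)
                return 1;
        return 0;
    }

    /* Append cand to out[] if it is a known word not collected yet. */
    static void keep(const char *cand, char out[][MAX_WORD], int *n, int max)
    {
        if (*n >= max || !is_known(cand))
            return;
        for (int i = 0; i < *n; i++)
            if (strcmp(out[i], cand) == 0)
                return;
        strcpy(out[(*n)++], cand);
    }

    /* Collect known words one edit away from word (deletion, adjacent
       transposition, substitution, insertion). Returns how many were found. */
    int near_misses(const char *word, char out[][MAX_WORD], int max)
    {
        char cand[MAX_WORD];
        size_t len = strlen(word);
        int n = 0;

        assert(len + 2 <= MAX_WORD);  /* room for an insertion plus '\0' */

        for (size_t i = 0; i < len; i++) {            /* delete word[i] */
            memcpy(cand, word, i);
            strcpy(cand + i, word + i + 1);
            keep(cand, out, &n, max);
        }
        for (size_t i = 0; i + 1 < len; i++) {        /* swap word[i], word[i+1] */
            if (word[i] == word[i + 1])
                continue;
            strcpy(cand, word);
            cand[i] = word[i + 1];
            cand[i + 1] = word[i];
            keep(cand, out, &n, max);
        }
        for (size_t i = 0; i < len; i++) {            /* replace word[i] */
            for (const char *c = alphabet; *c; c++) {
                if (*c == word[i])
                    continue;
                strcpy(cand, word);
                cand[i] = *c;
                keep(cand, out, &n, max);
            }
        }
        for (size_t i = 0; i <= len; i++) {           /* insert before word[i] */
            for (const char *c = alphabet; *c; c++) {
                memcpy(cand, word, i);
                cand[i] = *c;
                strcpy(cand + i + 1, word + i);
                keep(cand, out, &n, max);
            }
        }
        return n;
    }

```

For example, C<near_misses("bataat", out, 10)> finds only C<batata>, by swapping the last two letters. A real checker would look words up in its hash file instead of scanning a list. It would also handle accented letters, which this ASCII-only sketch does not.
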
Available flags are:

=over 4

=item g

Do not give suggestions from other words (disables near misses).

=item G

The inverse of C<g>: enables near misses.

=item P

Do not give suggestions built by combining the word with affixes not allowed for it.

=item m

Turn off the C<P> option.

=item y

Do not give suggestions built by swapping characters of the original word.

=item Y

Turn off the C<y> option.

=item z

Show the flags used for each suggestion.

=item Z

Turn off the C<z> option.

=back

=head2 char* get_next_word(char *buf, char *next_word)

Given the buffer C<buf>, stores the next valid word found in C<next_word>. Returns a pointer to the position in C<buf> right after the end of that word, or C<NULL> if no word is found.

=head2 get_roots(char *word, sols_type solutions, char in_dic[MAXPOSSIBLE])

Given C<word>, searches for its possible roots, even if they are not in the dictionary. The possibilities are returned in the C<solutions> array; each position holds the root, its classification, and the classification related to the flag used, as a string in the usual output format. Each entry of the C<in_dic> array tells whether the corresponding root is in the dictionary. An empty string in C<solutions> marks the end of the solutions.

=head2 insert_word(char *word, char *class, char *flags, char *comm)

Inserts C<word>, with its classification (C<class>), flags (C<flags>), and comment (C<comm>), into the personal dictionary.

=head2 accept_word(char *word, char *class, char *flags, char *comm)

Accepts C<word>, with its classification (C<class>), flags (C<flags>), and comment (C<comm>), until the program stops using the library.

=head2 char* replace_word(char *start, char* word, char* curchar)

Replaces the word at position C<start> in the text with C<word>; C<curchar> indicates where the last word ended. Returns the position in the buffer where the new word ends.

=head2 save_pers_dic()

Saves the personal dictionary in its current state.

=head2 ID_TYPE word_it(char* word, char* feats, int* status)

Returns a unique identifier for C<word>.

=head2 char *word_f_id(ID_TYPE id)

Given an identifier, returns a pointer to the corresponding word.

=head2 char *class_f_id(ID_TYPE id)

Given an identifier, returns a pointer to the corresponding classification.

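
The identifier functions above form a two-way mapping between words and identifiers. The following self-contained C sketch shows that contract with a toy in-memory table. It is only an illustration under stated assumptions and does not claim to reflect how B<jspell> assigns its identifiers. C<toy_id> (an C<int> standing in for C<ID_TYPE>), C<toy_word_id>, C<toy_word_f_id>, C<toy_class_f_id>, and the fixed table size are all hypothetical.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define TOY_MAX 128

    typedef int toy_id;  /* hypothetical stand-in for ID_TYPE */

    /* Parallel arrays: position i holds the word and class with id i. */
    static const char *toy_words[TOY_MAX];
    static const char *toy_classes[TOY_MAX];
    static int toy_count = 0;

    /* Return the id of word, registering it with word_class on first use.
       Returns -1 when the table is full. Strings are stored by pointer,
       so they must outlive the table. */
    toy_id toy_word_id(const char *word, const char *word_class)
    {
        assert(word != NULL && word_class != NULL);
        for (int i = 0; i < toy_count; i++)
            if (strcmp(toy_words[i], word) == 0)
                return i;
        if (toy_count == TOY_MAX)
            return -1;
        toy_words[toy_count] = word;
        toy_classes[toy_count] = word_class;
        return toy_count++;
    }

    /* Map an id back to its word; NULL for an unknown id. */
    const char *toy_word_f_id(toy_id id)
    {
        return (id >= 0 && id < toy_count) ? toy_words[id] : NULL;
    }

    /* Map an id back to its classification; NULL for an unknown id. */
    const char *toy_class_f_id(toy_id id)
    {
        return (id >= 0 && id < toy_count) ? toy_classes[id] : NULL;
    }

```

Registering C<batata> twice returns the same identifier, and C<toy_word_f_id> maps that identifier back to the stored word. An unknown identifier yields C<NULL>.
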
=head2 char *flags_f_id(ID_TYPE id)

Given a word identifier, returns a string with its flags.

=head2 Example

    #include <stdio.h>
    #include <string.h>
    #include "jslib.h"

    int main(void)
    {
        int i;
        char X[BUFSIZ], w[100], *p;
        sols_type solutions, near_misses;

        init_jspell("-d dict -W 0 -a");

        /* fgets replaces the unsafe gets(); strip the trailing newline */
        while (fgets(X, sizeof X, stdin)) {
            X[strcspn(X, "\n")] = '\0';
            p = X;
            /* get_next_word returns NULL when no word is left in the line */
            while ((p = get_next_word(p, w)) != NULL) {
                word_info(w, solutions, near_misses);
                puts("solutions");
                for (i = 0; solutions[i][0]; i++)
                    puts(solutions[i]);
                puts("near misses");
                for (i = 0; near_misses[i][0]; i++)
                    puts(near_misses[i]);
            }
        }
        return 0;
    }

If you save this file as C<exp-lib.c>, you can compile it with:

    gcc -o exp-lib exp-lib.c -ljspell

=head1 THANKS

We thank Pace Willisson and Geoff Kuenning for releasing C<ispell> as an open source application, from which much of the code of this application was borrowed.

=head1 AUTHOR

Ulisses Pinto

J.Joao Almeida

=head1 SEE ALSO

See the following man pages: jspell(3), jspell-aff(1), perl(1), agrep(1).

=head1 BUGS

We look forward to hearing about them at any of the authors' e-mail addresses!

=cut