jspell - a morphological analyzer
jspell [-dfile | -pfile | -wchars | -Wn | -t | -n | -x | -b | -S | -B | -C | -P | -m | -Lcontext | -M | -N | -Ttype | -V | -o format | -g | -y | -u] file ...
jspell [-dfile | -pfile | -wchars | -Wn | -t | -n | -Ttype | -o format] -l
jspell [-dfile | -pfile | -ffile | -Wn | -t | -n | -B | -C | -P | -m | -Ttype] {-a | -A}
jspell [-dfile] [-wchars | -Wn] [-o format] -c
jspell [-dfile] [-wchars] [-o format] -e[1-4]
jspell [-dfile] [-wchars] -D
jspell -v [v]
jspell is a morphological analyzer. It can be used in four different ways:
jspell should be invoked with a text file name. The text is checked as follows: each word that is not in the dictionary is shown in reverse video at the top of the screen, together with its surrounding context. The user should then choose one of the suggested corrections (if any exist).
Suggestions can be formed in two ways:
One of the last rows on the screen shows a mini-menu with some options:
cat -v
style. This option can
be useful when working with older terminals that cannot show some characters;
.tex
;
.bak
);
$HOME
prefix is assumed. If you specify one of the default fich-hash
of the
library dictionary and there is a file .jspell_hashfile
, this will be used as
the personal dictionary. If none of these conditions is true, we use the
.jspell_words
file.
Without this option, jspell searches for personal dictionaries in the current directory and in the home directory. If both exist, both are loaded.
-w "&"
we
make ``AT&T'' a valid word;
-W 0
;
nroff
) or a
file suffix containing a dot (example .tex
);
If the word is found directly in the dictionary, or can be derived using any of the flags, information about the word's root and its root and affix/prefix features is printed. This information uses a format that can be defined by the user.
If the word isn't in the dictionary, the output line starts with an
ampersand (&
), a space, the original word, a space, the number of
characters between the line start and the word, a colon, and a list
of approximate words, each given as the word, an equals
sign and its classification in the specified format. If the word
can be formed by an illegal addition of affixes to a known root,
a suggestion list for that is presented too.
If there is no approximate word, but only a formation using invalid affixes, the line uses a similar format, with a question mark instead of the ampersand.
In summary:
If the word exists in the dictionary, the output will be:
* <original> <offset>: <solution>, <solution>, ...
If it is NOT in the dictionary:
& <original> <offset>: <err>, <err>, ..., <affix sugst>, ...
where err
and affix sugst
have the following format:
<word> = format(<root>,<root fea>,<preffix fea>, <suffix fea>,<suffix2 fea>)
This format is defined by the user; the default is:
lex(<root>, [<root fea>], [<preffix fea>], [<suffix fea>], [<suffix2 fea>])
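The leading fields of the output lines described above (marker character, original word, offset, colon) can be picked apart with a single sscanf call. The following is only a minimal sketch: the names demo_record and demo_parse_head are hypothetical and are not part of jspell, and the sample lines used to exercise it are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical holder for the leading fields of one output line. */
struct demo_record {
    char kind;       /* '*', '&' or '?' */
    char word[100];  /* the original word */
    int  offset;     /* characters between line start and the word */
};

/* Returns 1 when marker, word and offset were all parsed, 0 otherwise. */
int demo_parse_head(const char *line, struct demo_record *r)
{
    if (sscanf(line, "%c %99s %d:", &r->kind, r->word, &r->offset) != 3)
        return 0;
    return r->kind == '*' || r->kind == '&' || r->kind == '?';
}
```

The rest of the line (the list of solutions or suggestions) is left untouched by this sketch and has to be handled separately.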
The separators ,
, =
, and :
are defined using a #define
directive, so they can be changed at compile time.
Using the -a
flag, there is a set of commands, each starting with one of these
characters: *
, @
, &
, +
, -
, ~
, #
, !
, %
, $
or ^
.
init_modes
function (see the library section);
Note that in terse mode the information about correct words is hidden. This can be used to make some programs faster.
-a
option, except that if the line starts with a
string like &Include_File&
, the rest of the line is considered to
be the name of a file to be read words from;
SIGTSTP
after reading a
line of input, and continues reading the next line when it receives
the SIGCONT
signal.
This is only valid if -a
or -A
option is active too, and on
BSD
derived systems;
-a
or -A
option;
-vv
), the compilation options are printed too.
For example, the input 'batatas' (Portuguese) produces:
batatas lex(batata, [CAT=adj_nc], [N=p], []), lex(batatar, [CAT=v, [CAT=v,P=2,N=s,T=p], [])
lex(batata, [CAT=adj_nc], [N=p], {})/p
-c
. Starting with a word and a flag, generates all
hypothetical derived words using the flag rules:
example: batata/p generates
batata batatas= lex(batata, [], [N=p], [])
%s
to be filled by the word root, classification of the root,
and the classification associated to the flag. The default, as seen
before, is lex(%s, [%s], [%s], [%s], [%s])
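To make the role of the %s slots concrete, the sketch below fills the default pattern with snprintf. It illustrates the idea only; it is not how jspell itself is implemented. demo_format_entry and DEMO_DEFAULT_FORMAT are hypothetical names, and the mapping of the five slots to root, root features, prefix features, suffix features and second-suffix features is taken from the default format shown earlier.

```c
#include <stdio.h>

/* The default pattern quoted in the text: five %s slots. */
#define DEMO_DEFAULT_FORMAT "lex(%s, [%s], [%s], [%s], [%s])"

/*
 * Fills a five-slot pattern. Returns 1 on success and 0 when the
 * result did not fit into out (or snprintf reported an error).
 * The caller is trusted to pass a pattern with exactly five %s.
 */
int demo_format_entry(char *out, size_t size, const char *fmt,
                      const char *root, const char *root_fea,
                      const char *prefix_fea, const char *suffix_fea,
                      const char *suffix2_fea)
{
    int n = snprintf(out, size, fmt, root, root_fea,
                     prefix_fea, suffix_fea, suffix2_fea);
    return n >= 0 && (size_t)n < size;
}
```

Passing a different pattern as fmt mimics what the -o option is described as doing, as long as the pattern keeps the same number of %s slots.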
DEFAULT_SIGNS
containing all
punctuation marks.
Output in the options -a
, -e
and -c
uses some separators that
are defined with the following names:
SEP1 ","
SEP2 ";"
SEP3 "="
SEP4 "\n"
SEP1
is used to separate solution hypotheses. SEP2
is used when
we show near misses indicating the end of that type of
solutions. SEP3
is used to separate the original word from the
information. SEP4
ends the word record.
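Because the default SEP1 character also appears inside each lex(...) entry, a program reading this output cannot simply split on every comma. One possible approach, sketched below under that assumption, is to split only at separators that sit outside any parentheses or brackets. demo_split_solutions, DEMO_SEP1 and DEMO_MAX_LEN are hypothetical names, not part of jspell.

```c
#include <stddef.h>

#define DEMO_SEP1 ','      /* same character as the default SEP1 */
#define DEMO_MAX_LEN 256   /* assumed upper bound for one entry */

/*
 * Copies each top-level entry of s into out[] and returns how many
 * entries were stored (at most max). Leading blanks are skipped and
 * over-long entries are silently truncated.
 */
int demo_split_solutions(const char *s, char out[][DEMO_MAX_LEN], int max)
{
    int count = 0, depth = 0;
    size_t len = 0;

    for (;; s++) {
        if (*s == '(' || *s == '[')
            depth++;
        else if (*s == ')' || *s == ']')
            depth--;

        if (*s == '\0' || (*s == DEMO_SEP1 && depth == 0)) {
            if (count < max && len > 0) {
                out[count][len] = '\0';   /* close the current entry */
                count++;
            }
            len = 0;
            if (*s == '\0')
                break;
            continue;
        }
        if (count < max && !(len == 0 && *s == ' ') && len < DEMO_MAX_LEN - 1)
            out[count][len++] = *s;
    }
    return count;
}
```

If the separators are changed at compile time, DEMO_SEP1 has to be changed to match.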
Programs using jspell as a library should include jslib.h
and link
with jspell.a
or jspell.so
.
These programs should initialize the library by calling init_jspell("...")
and,
after that, they can call the other API functions.
Initializes jspell with the flags given in the options
string. Normally the
-a
option is always used. Example of calling jspell:
init_jspell("-d dic-pe -W 0 -a -cf")
This function gives information about the word
searched in the
dictionary. If it is found, the possible ways to form it are given in
the solutions
array, where each element is a string containing the
word root, its classification and the classification that makes the
word
possible.
If the word is not found in the dictionary, the near_misses
array
contains the possible solutions using the format specified with the
-o
flag.
If solutions[i]
or near_misses[i]
contains an empty string,
then there are no more solutions/suggestions, respectively.
Used to change the suggestion output format. There are two types of suggestions: those made by small changes to the original word (called near misses) and those built by adding affixes not provided for that word.
Available flags are:
g
: enable near misses;
P
option;
y
option;
z
option;
Given the buffer buf
, put in next_word
the next valid word
encountered. Returns a pointer to buf
position after the end of the
word found. Returns NULL
if none is found.
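The calling contract described above (copy the next word out, return a pointer just past it, return NULL at the end) can be illustrated with a stand-in that does not depend on the library. This is only a sketch: demo_next_word is a hypothetical function, it takes an extra size argument, and it treats a word as a run of alphabetic characters, whereas jspell decides word characters from the dictionary and the -w option.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copies the next run of letters from buf into next_word (at most
 * size - 1 characters, size must be at least 1) and returns a pointer
 * to the position right after that run, or NULL if buf holds no word.
 */
const char *demo_next_word(const char *buf, char *next_word, size_t size)
{
    size_t n = 0;

    while (*buf && !isalpha((unsigned char)*buf))
        buf++;                       /* skip separators */
    if (*buf == '\0')
        return NULL;                 /* nothing left to return */

    while (isalpha((unsigned char)*buf)) {
        if (n + 1 < size)
            next_word[n++] = *buf;   /* keep room for the terminator */
        buf++;
    }
    next_word[n] = '\0';
    return buf;
}
```

A caller loops by feeding the returned pointer back in as buf until NULL comes back.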
Given the word
, searches for its possible origins even if they aren't in
the dictionary. The various possibilities are returned in the
solutions
array, each position containing the root, its
classification and the classification related to the flag used. This
information is a string in the usual output format. Each entry in the
in_dic
array shows whether or not the root is in the dictionary.
If solutions[i]
is an empty string, then there are no more solutions.
Inserts the word
with its classification (class
), flags
and
comment (comm
) in the personal dictionary.
Accepts the word
with its classification (class
), flags
and
comment (comm
) until the program stops using the library.
Replaces the word in the text at the start
position with
the word
, with curchar
indicating where the last word ended.
Returns the position in the buffer where the new word ends.
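An in-place replacement of this kind comes down to shifting the tail of the buffer and copying the new word in. The sketch below shows one way to do that. It is not the library's code, and the function name, the index-based parameters and the capacity argument are all assumptions made for the illustration.

```c
#include <string.h>

/*
 * Replaces buf[start .. old_end) with word, where cap is the total
 * size of buf. Returns the index just past the new word, or -1 when
 * the arguments are inconsistent or the result would not fit.
 */
long demo_replace_word(char *buf, size_t cap, size_t start,
                       size_t old_end, const char *word)
{
    size_t len = strlen(buf);
    size_t wlen = strlen(word);

    if (start > old_end || old_end > len)
        return -1;

    size_t tail = len - old_end + 1;          /* tail plus terminator */
    if (start + wlen + tail > cap)
        return -1;

    memmove(buf + start + wlen, buf + old_end, tail);  /* shift tail */
    memcpy(buf + start, word, wlen);                   /* drop word in */
    return (long)(start + wlen);
}
```

memmove is used for the tail because the source and destination regions overlap whenever the two words differ in length.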
save_pers_dic()
Saves the personal dictionary in its current state.
For a word
returns a unique identifier.
Given an identifier, returns a pointer to the position of the respective word.
Given an identifier, returns a pointer to the position of the respective classification.
Given a word identifier, returns a string identifying its respective flags.
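#include <stdio.h>
#include "jslib.h"

int main(void)
{
    int i;
    char X[BUFSIZ], w[100], *p;
    sols_type solutions, near_misses;

    init_jspell("-d dict -W 0 -a");

    /* read standard input line by line (fgets replaces the unsafe gets) */
    while (fgets(X, sizeof X, stdin)) {
        p = X;
        /* analyze every word found in the line */
        while ((p = get_next_word(p, w)) != NULL) {
            word_info(w, solutions, near_misses);
            puts("solutions");
            for (i = 0; solutions[i][0]; i++)      /* empty string ends the list */
                puts(solutions[i]);
            puts("near misses");
            for (i = 0; near_misses[i][0]; i++)
                puts(near_misses[i]);
        }
    }
    return 0;
}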
If you save this file with the name exp-lib.c
, you can compile it with:
gcc -o exp-lib exp-lib.c -ljspell
We thank Pace Willisson and Geoff Kuenning for releasing
ispell
as an open source application, from which much of this
application's code was borrowed.
Ulisses Pinto J.Joao Almeida <jj@di.uminho.pt>
See the following man pages: jspell(3), jspell-aff(1), perl(1), agrep(1)
We wait for them at any of the authors' e-mail addresses!