ReadMe.txt

SGT
===

The files here contain a C++ class for implementing simple Good-Turing
re-estimation, as described by Geoff Sampson in the book Empirical Linguistics
(2001), and on the web at http://www.grsampson.net/RGoodTur.html. The code
here is a C++ adaptation of the published code by Sampson and Gale, with the
bug fix issued in 2000. It is encapsulated as a class to allow it to be
incorporated into other programs. An additional coding change is that the data
can be presented in any order, whereas the original code required the data to
be in ascending order.
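
As a rough illustration of the underlying idea (not the code in sgt.h), the
sketch below computes the basic, unsmoothed Turing estimate
r* = (r+1) * N(r+1) / N(r) and the unseen mass P0 = N(1) / N from a
frequency-of-frequencies table. The names FreqOfFreqs, tally, total,
unseen_mass and turing_adjusted are hypothetical and chosen for this example
only. Simple Good-Turing additionally smooths the N(r) values with a log-log
linear fit, which this sketch omits; falling back to r when a neighbouring
N(r) is missing is an assumption made here for brevity. Keeping the table in
a std::map is one way to accept the data in any order.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Frequency of frequencies: maps a count r to N(r), the number of distinct
// items observed exactly r times. std::map keeps its keys sorted, so the
// observations can be added in any order.
using FreqOfFreqs = std::map<long, long>;

// Hypothetical helper (not part of sgt.h): build the table from raw counts,
// one entry per distinct item.
inline FreqOfFreqs tally(const std::vector<long>& counts) {
    FreqOfFreqs n;
    for (long r : counts) {
        assert(r > 0);  // only observed items belong in the table
        ++n[r];
    }
    return n;
}

// Total number of observations, N = sum over r of r * N(r).
inline long total(const FreqOfFreqs& n) {
    long sum = 0;
    for (const auto& entry : n) {
        sum += entry.first * entry.second;
    }
    return sum;
}

// Probability mass reserved for unseen items, P0 = N(1) / N.
inline double unseen_mass(const FreqOfFreqs& n) {
    const long big_n = total(n);
    const auto it = n.find(1);
    const long n1 = (it == n.end()) ? 0 : it->second;
    return big_n == 0 ? 0.0 : static_cast<double>(n1) / big_n;
}

// Unsmoothed Turing estimate r* = (r+1) * N(r+1) / N(r).
// Assumption for this sketch: when N(r) or N(r+1) is missing, return r
// unchanged. Simple Good-Turing would use a smoothed N(r) here instead.
inline double turing_adjusted(const FreqOfFreqs& n, long r) {
    const auto cur = n.find(r);
    const auto next = n.find(r + 1);
    if (cur == n.end() || next == n.end()) {
        return static_cast<double>(r);
    }
    return static_cast<double>(r + 1) * next->second / cur->second;
}
```

For example, the unordered counts 3, 1, 2, 1, 2, 1 give N(1) = 3, N(2) = 2,
N(3) = 1 and N = 10, so P0 = 0.3 and the adjusted count for r = 2 is
3 * 1 / 2 = 1.5.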

Sampson's original code was issued with no restrictions on use. In keeping
with the spirit of this, the code here is issued under an open source licence
which allows essentially unrestricted use.

LICENCE
-------
Copyright (c) David Elworthy 2004.
All rights reserved.

Redistribution and use in source and binary forms for any purpose, with or
without modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions, and the following disclaimer.
 
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions, and the disclaimer that follows 
   these conditions in the documentation and/or other materials 
   provided with the distribution.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN
NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contact details
---------------
You may contact me at david@friendlymoose.com. I would be happy to hear of any
experiences you have with the code; please feel free to send me updated
versions. The reference site for the code is http://www.friendlymoose.com/.

Files and use
-------------
There are three files:
ReadMe.txt  This file
sgt.h       SGT header file
sgttest.cpp A test and example program

There is no separate source (.cpp) file for the class: SGT is a class template
over the observation type, typically either an int or a double, so its full
definition lives in the header.
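
The general pattern of a header-only class template might look like the
sketch below. FreqTable and its members are hypothetical names invented for
this illustration; they are not the SGT class, whose actual interface is
documented in sgt.h. Because the compiler needs the complete template
definition wherever it is instantiated, everything sits in the header and the
same code serves both int and double observations.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical header-only class, for illustration only.
// Obs is the observation type, for example int or double.
template <typename Obs>
class FreqTable {
public:
    // Record that `count` distinct items were each observed `r` times.
    void add(Obs r, long count = 1) {
        assert(count > 0);
        table_[r] += count;
    }

    // Number of distinct observation values recorded so far.
    std::size_t rows() const { return table_.size(); }

    // Stored frequency for observation r, or 0 if r was never added.
    long at(Obs r) const {
        const auto it = table_.find(r);
        return it == table_.end() ? 0 : it->second;
    }

private:
    std::map<Obs, long> table_;  // sorted by observation value
};
```

A program would then instantiate it directly, for example FreqTable<int> or
FreqTable<double>, with no separate object file to link.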

Information about using the class is included in the header file. The code has
been tested with g++ version 3.2 on Cygwin and Microsoft Visual Studio version
6 on Windows 2000. With g++, you can compile and link the test program using
the command
     g++ -o sgttest sgttest.cpp

For Visual Studio, from the command line, you can compile and link with
     cl -GX sgttest.cpp

Version history
---------------
Initial version released January 2004.
Updated to a better implementation April 2004.