You can learn how Bear works by reading or working through this simple tutorial.

You have two options:

- build and run Bear using this guide; or
- don’t build or run Bear: just download the output I provide, if you trust me.

All the commands I use below are listed here.

If you have built Bear and added it to your path, then execute this command:

`
$ memory_bear
`

You should see output that looks something like this (details of all screenshots may vary):

You can see from the “`Arguments:`”
section at the bottom of
this screenshot that `memory_bear` has two mandatory arguments:
`INPUT_FILENAME` and `TIME_BUDGET`.
Let’s just create an empty input file,

`
$ touch empty.csv
`

and run `memory_bear` on it, specifying a time budget of,
say, one minute:
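If you are following along, that run would look something like this (`1m` is the one-minute time-budget form used throughout this tutorial):

`
$ memory_bear empty.csv 1m
`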

Okay, so we learn that it’s mandatory to specify the label column(s) using one of these four options. Let’s just specify it as column 0:
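Using the short form `-l` (the long option names are listed on the help screen), that is something like:

`
$ memory_bear empty.csv 1m -l 0
`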

So we also need to specify a filename for saving the Bear model or writing an output file with predictions (or both). Let’s just specify a Bear model filename:
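For example, again in short form (the model filename `empty` here is just my choice):

`
$ memory_bear empty.csv 1m -l 0 -s empty
`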

Now we’re getting somewhere! Bear fired up with a welcome message, followed by feedback showing how it parsed what we asked of it. At the left side of each log line you will always see the local time (to the minute) and the time that has elapsed since the last log line. (The first log line tells you the local date when execution started.)

We can see that Bear did a first pass over the input file, but then it told us that an empty data file isn’t allowed!

So let’s create the simplest possible dataset: a single example (row),
with no features, and just a label value,
in `single.csv`:
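Since (as we will see below) the only value in this file is the label 42, one way to create it is simply:

`
$ echo 42 > single.csv
`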

We can now run `memory_bear` on this dataset successfully:
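For example, saving the model under the name `single` (again, my choice of name):

`
$ memory_bear single.csv 1m -l 0 -s single
`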

All of these steps will become clearer as we work through these tutorials,
but in the end we see that Bear built a model and saved it to
`single.bear.gz`.
If we take a look at the decompressed bytes in that file,

we can see that it consists of binary data within plain text tags; this is
the general way that Bear saves objects.
Bear then runs this through `gzip`, compressing these 179 bytes down to 73.
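For reference, one way to take that look at the decompressed bytes (which include raw binary, so a dump utility such as `od` is helpful) is something like:

`
$ gzcat single.bear.gz | od -c
`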

We can get an overview of what is in this model file using the supplied program
`bear_model_details`:
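For example:

`
$ bear_model_details single
`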

That’s not too enlightening in this case!
We can get a bit more detail using the `--verbose` option:
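That is, something like:

`
$ bear_model_details single --verbose
`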

We see that there is just an “empty” model, which is the model that Bear creates without using any features at all. This makes sense, because there were no features! This empty model records that the minimum and maximum allowed label values are 42, because this was the only label value in the input data (and Bear never extrapolates), and its single prediction for the label is likewise 42.

We can stream feature data through this model and get it to make predictions
using the supplied program `bear_predict`.
Its command-line options are similar to those of
`memory_bear`:

As the argument specifications at the bottom of this help screen show, we can
run it in “interactive” mode by specifying
`stdin` and `stdout` for the input and output files,
although we need to specify the filetypes for each:

At this point, the program is waiting for us to specify feature values for an
example.
In this case there are no features, so if we hit the `return` key, it
spits out its prediction:

We can do this as many times as we want:

If we’ve taken more than five seconds to do this, we'll even be given a “progress update” on the number of rows processed so far:

After doing this a fourth time, the fun has probably worn off, and we can
finish our input by pressing `control-D` and `return`:

Let’s make things slightly more interesting by having more than
one example in our dataset.
For example,
`10-labels.csv`:

If we run `memory_bear` on this dataset,
now using short option names,

`
$ memory_bear 10-labels.csv 1m -l 0 -s 10-labels
`

and look at the details of the model created,
`10-labels.bear.gz`,

we can see that the empty model now records a minimum allowed label of −370, a maximum allowed label of 120, and a constant prediction of −22.6. The first two of these are just the bounds of the 10 input labels; Bear never extrapolates beyond the data it is given. Likewise, the prediction of −22.6 is just the mean value of those 10 input labels, which minimizes the MSE loss (the default for Bear) if the empirical probabilities are taken as the best estimate of the true probability distribution.
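For reference, those details come from a command like:

`
$ bear_model_details 10-labels -v
`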

As before, we can run `bear_predict` in interactive mode
to stream example feature data (again, here we have no features)
through the model:

where this time our fun was expended after two hits of the
`return` key, after which I hit `control-D`
and `return` to end the input datastream.

I noted above that Bear’s default loss function is MSE, which is minimized if the predictions are the mean expectation values. Let’s change the loss function to MAE:

`
$ memory_bear 10-labels.csv 1m -l0 -nMAE -s10-labels-mae
`

and run `bear_predict` on it:

The prediction is now totally different: 3.5.
This is because MAE loss is minimized when the prediction is the
*median* expectation value, rather than the mean; if there is an even
number of data points, any value in the closed interval between the
two middle values minimizes it equally well.
If you work it through,
the two middle values for `10-labels.csv`
are 3 and 4, and Bear has followed the normal practice of resolving
the arbitrariness by taking
the mean of these two values.

However, if we now look at the
details of the model created,
`10-labels-mae.bear.gz`,

we see that things look a little strange. The “empty label estimate” is just our median 3.5, but there is also an “empty label prediction” of 0, and the minimum and maximum allowed label values don’t match our dataset. What’s going on here?

The answer is that to support the MAE loss function, Bear
performs a trick by
transforming the label values under the hood into a quantity linearly
related to their *cumulative frequencies*, which is what you need
to use to compute the median.
These are the numbers that look wrong above.
At the end of the process Bear transforms back to actual values,
calling the final prediction the “label estimate.”
You don’t need to worry about “how it makes the sausage”
unless you are inspecting the details of the model file.

Let’s now add *frequencies* (counts of the number
of examples having the given label value), to create
`frequencies.csv`:

This just means that we have 4 examples with a label value of −1.3, one example with 4.7, and so on. This is completely equivalent to having a data file with four rows with label value −1.3, etc.

We can specify that our input file has a frequency column by
using the `--has-frequency-column` and
`--frequency-column` options
(here in their short forms `-f` and `-c`):

`
$ memory_bear frequencies.csv 1m -l0 -f -c1 -sfrequencies
`

which Bear parses and includes in its feedback to us:

We now see that the model,
`frequencies.bear.gz`,
is similar to the previous one,

except that the prediction is now −33.321875.
This is the weighted mean of the input label values,
where each weight is the relative frequency;
e.g., for the first label value of −1.3 the weight is
4 / 32, since the total frequency is 32; and so on.
Note that my codebase automatically includes separators (here, a space
in the decimal value)
in its logging, but these are never added in output
files.
We can confirm this by running
`bear_predict` on the model:

Again, if we switch to the MAE loss function,

`
$ memory_bear frequencies.csv 1m -l0 -fc1 -nMAE -sfrequencies-mae
`

and inspect the model file,

`
$ bear_model_details frequencies-mae -v
`

then we find that the “empty label estimate” is now 3,
which is just the median of the values in `frequencies.csv`.

OK, enough of datasets with just labels and no features.
Let’s add a feature!
Here is a simple dataset
`linear-1.csv`
where the second (label) column is
obviously linearly dependent on the first (feature) column:

We know how to run Bear on this, where we now just have to specify that the label column is column 1 (i.e., the second column):

`
$ memory_bear linear-1.csv 1m -l1 -slinear-1
`

We see Bear doing a lot more than it did for the empty models.
Skipping these details, for the moment,
we excitedly run `bear_predict` to see the results of
our linear regression, now typing a feature value before hitting
`return`:

Well that was disappointing! No matter what feature value we entered, the model gave us a label prediction of 53. It even did this if we didn’t specify a feature value at all!

Why didn’t we get any linear regression?
We can again examine the model
`linear-1.bear.gz`,
to try to debug this:
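For reference, the command would be something like:

`
$ bear_model_details linear-1 -v
`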

This is just the empty model again! Its constant prediction of 53 is just the mean value of the input labels. But why did Bear just give us the empty model?

The answer is that Bear only gives us *statistically significant*
structure that it finds in the data.
In this case it decided that these 9 data points didn’t give it
any statistically significant signal of anything more than just the
empty model.
And that sounds fair enough: without any other information about what
sort of relationship we are expecting to find, in general it would be
difficult to draw any concrete conclusions from just 9 data points.

So let’s give Bear more of our linear data, so that it might have a chance of finding something statistically significant. One easy way to do that is to add a frequency column to our dataset, and set the frequency of each of our nine examples to, say, 20:
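One way to do that with a standard shell tool (assuming `linear-1.csv` has no header row) is:

`
$ sed 's/$/,20/' linear-1.csv > linear-2.csv
`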

If we run Bear on this,

`
$ memory_bear linear-2.csv 1m -l1 -fc2 -slinear-2
`

and then run `bear_predict` on the created model
`linear-2.bear.gz`,

we see that Bear has modeled the 9 data points exactly! Of course, that’s only because our dataset had no noise at all: every feature value mapped exactly to a single label value for every one of its 20 examples, and Bear decided that each of these mappings was statistically significant in itself. This perfect modeling is reflected in Bear declaring the “construction weight” (which I will describe later) to be 10,000,000,000, which is an arbitrary upper bound that I apply in the code. Real datasets will not generally be both noiseless and statistically significant.

Looking more closely at my play above with
`bear_predict`,
you can see that
if we specify a feature value between
two values in our original dataset—here 1.4 and 1.6—Bear
doesn’t linearly interpolate, like you might expect from
linear regression; rather it
gives us 13 (the prediction for feature value 1) for the former,
and 23 (the prediction for feature value 2) for the latter.
It seems to be using the *nearest* feature value in the original
dataset.
Moreover, if we specify the feature to be less than 1, it predicts the smallest
label, 13; if we specify the feature to be greater than 9, it predicts
the largest label, 93, so it doesn’t extrapolate either.
These are general features of Bear: its predictions are
*piecewise constant*, and do not exceed the bounds of the input label
data.
In this case there are 9 of these pieces, which surround each of the 9 feature
values in the input dataset, with the pieces on the ends continuing on
to negative and positive infinity.

To show this more explicitly, I have created a file
`linear-2-test-features.csv`
of feature values
spanning the interval from −2 to +12, stepping by 0.1 each time.
We can pass those into
`bear_predict`, and ask it to write its predictions out to
the file
`linear-2-test-out.csv`:
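If you want to recreate such a file of feature values yourself, something like GNU `seq` will do it, which should give the 141 feature values used below:

`
$ seq -2 0.1 12 > linear-2-test-features.csv
`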

You can graph the results using whatever program you like; for simplicity, I have just used Microsoft Excel:

This shows you visually that Bear has created its “perfect” model of this noiseless data as piecewise constant.

If you play around with `bear_predict` some more, you will find
that the prediction does indeed jump up to 23 at a feature value of 1.5,
or half-way between the two input feature values of 1 and 2.
But if you bisect even more, you might be surprised that it actually jumps up
at around 1.498046875.
What’s going on here?

The answer is that Bear internally uses a custom 16-bit floating point
representation, that I dubbed “`paw`,”
in the core engine that does the statistical modeling.
The `paw` format
is very similar to Google Brain’s `bfloat16` format,
except that `paw` has 7 bits of exponent and
8 bits of mantissa, whereas `bfloat16` has
8 bits of exponent and
7 bits of mantissa.
Google chose one less bit of precision than I did for Bear because
they
had competing design goals due to a legacy codebase
that made it advantageous
for `bfloat16` to have the same dynamic range as the standard
32-bit `float`.
I had no such constraints, and could
let `paw` have one extra bit of precision,
since the dynamic range of
`paw`
of around 10^{±19}
is more than sufficient for all practical purposes,
compared to around
10^{±38} for `bfloat16`.

The result is that feature values greater than 1.5 − 1 / 512 round up to 1.5 in this core modeling. (With 8 bits of mantissa, the `paw` values adjacent to 1.5 are 1 / 256 away, so anything within 1 / 512 of 1.5 rounds to it.)

Because Bear’s models are piecewise constant in feature space anyway, you would assume that this quantization of the thresholds between adjacent pieces would usually have no significant practical ramifications. But what if we were to shift all of these feature values to the right, by, say, 1,000,000?
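One way to create such a shifted dataset (the output filename here is just my choice) is:

`
$ awk -F, -v OFS=, '{ $1 += 1000000; print }' linear-2.csv > linear-2-shifted.csv
`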

You might guess that all of these feature values would be quantized to the same
`paw` value.
But if you run Bear on this dataset, you find that it produces identical results,
just shifted to the right by 1,000,000.
How did it manage this?

If you examine the model created, you will see that it now has a “Last part feature offsets” section.

FINISH

You might think that this is an artificially contrived example.
But actually it’s not: what if one of your features is a Unix time?
Those values will in most applications
be in the billions, but will likely only vary by
millions or less.
With `paw` precision no better than about 0.1%, in many cases it would
round every such Unix time to the same `paw` value, and you would lose
it as a potential feature.

So now that we have more than an empty model, let’s examine in more
detail what’s actually in
`linear-2.bear.gz`.
Let’s start with the *non*-verbose version of the program:

We still only have one model (labeled with the index 0), but
now it has a “weight” with that 10^{10}
upper-bound we saw above, and it has an
“assembly” which is “`e|0`”
rather than just the
“`e`” we had before.
I’ll describe these “weights” in more detail in later
tutorials, but for now just take it as the “goodness” of
a model.
The “assembly” `e|0` just tells us that
this model has used
the empty model, and then modeled its residuals with feature 0.
(This will become clearer when we have more complicated models.)

If you now run this command in verbose mode,

`
$ bear_model_details linear-2 -v
`

then you will see essentially all the internal details of this Bear model file. Without getting into the weeds of those details, if you read from the bottom you will see that Bear models the labels with the empty model, and then tries to model the residuals of that model (which are between −40 and +40) with the feature that we have supplied. In this case it succeeded in finding statistical significance in that residual modeling.

As a convenience, `memory_bear` lets you include prediction feature
values in the same input file as your training data, and it will make
predictions for those feature values after it finishes creating its model.
All you need to do is include those prediction feature rows in your input
file *with an empty label field*.
For example,

`
$ cat linear-2.csv linear-2-test-features.csv > linear-2-combined.csv
`

simply appends the prediction feature rows to the training rows. The label values in column 1 are implicitly missing for these rows (since there is no column 1), which marks them as prediction rows. Frequencies are never needed for prediction rows, so it doesn’t matter that column 2 is also missing for these rows.

We now have to specify an output filename for the predictions to be written out to. (In this mode, it is optional whether you want to save the Bear model to a file or not.) So the command

`
$ memory_bear linear-2-combined.csv 1m -l1 -fc2 -o linear-2-predictions.csv
`

trains the model and then makes predictions for our 141 prediction rows,
writing the results out to
`linear-2-predictions.csv`.
We can easily prove that the predictions are identical to those
obtained above:

`
$ cmp linear-2-predictions.csv linear-2-test-out.csv
`

Bear also allows you to specify that one or more columns in your input data
file should be simply passed through as plain text
to the corresponding row of the output
file, without playing any role in the actual modeling or predictions.
This can be useful if one of your columns is a primary key, or if multiple
columns together form a composite primary key, or even if some columns are
simply comments or other descriptive text.
For example, if we add an identifier column and a comment column to
`linear-2.csv`, and add a few prediction rows, to create
`ids.csv`:

and then specify to `memory_bear` that columns 0 and 4
are “ID” (passthrough) columns,

`
$ memory_bear ids.csv 1m --multi-id-columns='[0,4]' -l2 -fc3 -o ids-out.csv
`

then we can see that these two columns are ignored, but passed through for the prediction rows to the output:

If you specify one or more identifier columns in this way, you may not
need or want
to see the actual feature value(s) for those rows.
To suppress their output you can just specify
`--no-features-out`:

`
$ memory_bear ids.csv 1m -j'[0,4]' -l2 -fc3 -o ids-out-nf.csv --no-features-out
`

Now in the output you just see your ID columns and the corresponding predicted label:

Although you can specify to `memory_bear` and
`bear_predict` any arbitrary columns to be
labels or identifiers,
both programs write out predictions with all identifiers
first, followed by all features (unless specified otherwise),
followed by all labels,
in each case in the order that the columns appeared in the input data.
If you need an alternative permutation of the columns in the output file
you should use another utility to achieve that result.

We’ve played enough with our noiseless dataset
`linear-2.csv`, so let’s generate some data that at least
has some noise added to it.
You can do this yourself using whatever program you like, but
I’ll use the supplied program
`simple_bear_tutorial_data`
so that you have the same data:

`
$ simple_bear_tutorial_data linear-3.csv -r19680707
`

which creates the file
`linear-3.csv`
with 50 training rows and 250 prediction rows in it.
The final argument `-r19680707` simply ensures that you seed
the random number generator the same as I did, so that you get
exactly the same data.
If you graph the data you should see something like this:

If you now run `memory_bear` on this data,

`
$ memory_bear linear-3.csv 1m -l1 -o linear-3-predictions.csv
`

you should now see a “construction weight”
of around 8.25.
You don’t have anything to compare this with, yet,
but at least it doesn’t sound as silly as the
10^{10} we got for the perfect model.
Graphing
`linear-3-predictions.csv`
you should see something like

Again, it is piecewise constant, as Bear’s models always are. Indeed, Bear’s model here is like a decision tree on its single feature, where it has determined all the decision points at once. When we add more features the similarities with decision trees will remain evident, but so too will be the differences with how Bear’s algorithms determine the decision points for each feature.

It would be nice to be able to see Bear’s predictions on the same
axes as the input data.
The `memory_bear` program makes that easy, by using the
`--debug` flag:

`
$ memory_bear linear-3.csv 1m -l1 -d -o linear-3-debug.csv
`

Opening
`linear-3-debug.csv`,
you should see that the first 50 rows are just the original data,
with two extra
columns that I’ll return to shortly, followed by the 250
prediction rows.
If we graph just the first two columns, we get what we wanted:

We see that Bear has done a pretty good job of extracting out some piecewise constant dependencies, given the amount of data available and the amount of noise present.

But *is* this really the best that Bear could do under these
circumstances?
Apart from simply believing me that this is about as much as can be extracted
with statistical significance,
without any other *a priori* knowledge of the dependence of
the label on the feature,
we can also look at the *residuals* of this model.
This is where the two extra columns in debug mode are useful.
The third column just provides us Bear’s predictions for the
training examples:

and the fourth column provides the residuals of the training labels over these predictions:

Visually, this looks pretty convincing: there are no clear areas where a piecewise constant model would fit these residuals with any degree of statistical confidence.

We’ve seen that Bear has done a reasonable job of modeling noisy data with a linear dependence with 50 data points. But is that specific to the particular dataset that I created above? What if we change the random seed? For example,

`
$ simple_bear_tutorial_data linear-4.csv -r19660924
`

which creates the dataset

which actually looks a little “smoother”
than `linear-3.csv`.
(Of course, this is all just due to the random noise.)
Running Bear on this dataset,

`
$ memory_bear linear-4.csv 1m -dl1 -olinear-4-predictions.csv
`

we see in
`linear-4-predictions.csv`
that it now only decided to split the feature into *two* pieces:

In effect, Bear also “saw”
the “lumpiness” of the middle
portion of `linear-3.csv`, which wasn’t repeated
in `linear-4.csv`,
and deemed it sufficiently “lumpy” to create a piece there.
Bear doesn’t know if structure that it sees in the
input data is representative of the underlying relationship or just
random noise, just like we don’t (if we don’t look at
`simple_bear_tutorial_data` to learn how the
pseudorandom data was generated, of course!),
and forms its best guess
based on the statistical significance of what it does have.

But still, looking at the scatterplot above, we might wonder whether Bear could have squeezed out a third piece, since there is such an “obvious” linear variation in each of the two pieces it has. But if we look at the actual residuals of that model,

then it becomes less clear.
Certainly, there is not enough data to split these residuals into
a statistically significant piecewise model.
But that’s based on the two pieces that Bear actually found;
our question is whether it could have alternatively found *three*
pieces.
Even doing it by eye, it is difficult to see how Bear could have done this.
Moreover, note that Bear does *not* try every possible splitting
of the feature, not only because this would not be computationally tractable,
but also because the exponential explosion in the number of decisions would
hurt Bear’s ability to find statistical significance at all, since it
keeps track of the “multiple comparisons” problem.

We can also look at what happens when we add more training data. Let’s return to the original random seed, and specify that we want 1000 rows of training data rather than the default 50:

`
$ simple_bear_tutorial_data linear-5.csv -r19680707 -t1000
`

which creates

Running `memory_bear`,

`
$ memory_bear linear-5.csv 1m -dl1 -olinear-5-predictions.csv
`

now yields

where we now have a model with six pieces. The residuals again look reasonable:

Looking at them and the modeling above, you could *almost*
imagine breaking some of the pieces in half.
But that “almost” is the point: there is just not enough
statistical significance in the amount of data we have for each piece to
overcome the inherent noise in the data.

Of course, if you add more and more data, there is more opportunity for
extra structure to be resolved despite the noise.
Using 10K data points gives you 8 pieces;
using 100K gives you 12 pieces;
using 1M gives you 25 pieces;
and using 10M gives you 54 pieces.
(This is easiest to see if you save the Bear model and inspect it
using `bear_model_details` in verbose mode.)
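For example, the 10K case might look something like this (the filenames are just my choice; the seed is the original one from above):

`
$ simple_bear_tutorial_data linear-6.csv -r19680707 -t10000
$ memory_bear linear-6.csv 1m -l1 -slinear-6 -olinear-6-predictions.csv
$ bear_model_details linear-6 -v
`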

Incidentally, if you have looked at the help screen for
`simple_bear_tutorial_data` you will have seen that the default
underlying relationship between the label y and feature x is actually
y = 3 x + 10, which has been well modeled
by Bear.

In the real world you will often be missing data for some features for some examples. Bear handles missing feature values.

To see how this works, let’s create a dataset like
`linear-3.csv`, but with around half of the rows having a
missing feature value.
We can do this using the `--missing-percentage` option to
`simple_bear_tutorial_data`:

`
$ simple_bear_tutorial_data missing-1.csv -r19680707 -n50 -t100
`

where `-n50` sets this “missing percentage” to 50%.
I’ve also upped the total number of training rows to 100 so that
about 50 of them will still have feature data.
Indeed, if you inspect
`missing-1.csv`
you will see that there are label values for the first 100 rows,
but for 55 of them there is no feature value:

Note that the label values for examples with missing features are
clustered around 110.
This is because the `--missing-bias` default is 100, which is
an extra bias added to the label of all rows with a missing feature value,
in addition to the default `--bias` of 10, so that the expectation
value of the label for examples with a missing feature is 110.
(The default `--weight` of 3 does not come into play, because there
are no feature values to be correlated with for these examples.)

Usually, after the training examples we see the prediction examples. But in this file we see a row with no values at all:

This *is* actually a prediction row
(since its label is missing), but for the case when the feature
value is missing.
After this row are the standard 250 prediction rows that the program has
given us each time.

If you graph the 45 examples in `missing-1.csv` that do not have a missing
feature value, you will see that they follow the same general
pattern as `linear-3.csv` and `linear-4.csv`:

Running `memory_bear` on this data,
and saving the Bear model file,

`
$ memory_bear missing-1.csv -dl1 1m -omissing-1-predictions.csv -smissing-1
`

we see from
`missing-1-predictions.csv`,

that the prediction for a missing feature value is almost 110, and if we graph the predictions for the examples without missing feature values,

that Bear has modeled these similarly to the datasets above without missing feature values.

You might have noticed Bear quoting its “construction weight” as over 761! Again, we haven’t yet discussed what these “weights” actually are, quantitatively, but 761 seems significantly better than the single-digit weights previously noted. We can get some insight into what is going on here if we inspect the model file, in verbose mode:

`
$ bear_model_details missing-1 -v
`

There is a fair bit of detail in the output, but if you read it from the bottom, you will see two models listed in a parent–child chain.

First, there is the empty model, which makes a constant prediction of around 64.5. This is the mean value of all labels in the input dataset.

Next is a model with feature 0. Bear says the “completeness” model is nontrivial. This models whether each example is “complete,” i.e., does not have any missing feature values. In this case, the labels for examples with missing values were found to be statistically significantly different from those with supplied values. It predicts around 45.3 on top of the empty-model prediction of around 64.5 for examples that are incomplete, yielding an overall prediction of around 109.8, and subtracts around 55.3 from the empty-model prediction of 64.5 for examples that are complete, yielding a prediction of around 9.2.

After that, Bear tells us that the “complete model” is nontrivial. This models the residuals of the completeness model above, using the feature value, for just the complete examples (because it doesn’t have any feature value for the incomplete examples!). Its two piecewise-constant pieces are what is shown in the graph above.

When Bear computes a “weight,” it is always normalized by reference to that of the empty model. Here the empty model is quite bad (but the best that can be done without any features): all of the actual label values are far above or below its constant prediction of around 64.5. The completeness model actually provides most of the improvement in this particular case, and the complete model against the feature provides some further improvement, ultimately giving the construction a “weight” of over 761, compared to the empty model.

The example above shows that Bear can handle missing feature values without
needing to discard either features or examples.
Let’s simplify the dataset so that we can see more clearly what
Bear is doing.
The dataset
`missing-2.csv`
has statistically significant frequencies like we had in
`linear-2.csv`, but with just three feature values and
corresponding label values, plus a missing feature:

Running Bear on this dataset,

`
$ memory_bear missing-2.csv -dHLlabel -fCfrequency 1m \
    -omissing-2-predictions.csv -smissing-2
`

we see that the predictions are perfect, so that the residuals are all zero:

Looking at the model, we see that the empty label prediction is 70 (the mean of the labels), the completeness model is nontrivial, predicting an additional 50 for incomplete examples and −50 for complete examples, and the nontrivial complete model predicts an additional −10, 0, or +10 for the complete examples.

Let’s now change the missing example label:

Running Bear on this,

`
$ memory_bear missing-3.csv -dHLlabel -fCfrequency 1m \
    -omissing-3-predictions.csv -smissing-3
`

the predictions are still perfect. The model file shows that the completeness model is still nontrivial, but the incomplete and complete predictions are both zero. How can that be?

The answer is that the distribution of residuals is statistically significantly different for incomplete and complete examples, but it just so happens that the mean of each is zero. If we change the distribution of label values for the incomplete examples to exactly match that of the complete examples:

and run Bear on it:

`
$ memory_bear missing-4.csv -dHLlabel -fCfrequency 1m \
    -omissing-4-predictions.csv -smissing-4
`

then we see that not only is the model no longer perfect, for the incomplete examples:

but the model file now indicates that the completeness model is trivial.

Now consider the dataset

In this case the completeness model is nontrivial, but the complete model is trivial (as it must be, since there is only one distinct feature value). Finally,

doesn’t have enough statistical significance for a nontrivial complete model, and the completeness model is trivial because the distribution of the incomplete examples is identical to that of the complete examples. Bear therefore found the model using the feature to be trivial overall, and discarded it, leaving just the empty model.

Note that my libraries automatically
handle text files that are compressed with `gzip`.
All that you need to do is specify a filename that ends in `.gz`,
and it will all happen automagically.
The command `gzcat` is a useful analog of `cat` for
such files.
Note that Bear always saves its model file in compressed format.

If you have followed along with (and hopefully enjoyed) all of the above, then feel free to move on to the intermediate tutorial.

© 2022–2024 John Costella