One nearest neighbor with compression

Written by Yevgeni Korsunsky and Aryeh Kontorovich

  1. Introduction

  2. The condensing problem

  3. Constructing a γ-net

  4. Code

  5. Results

  6. References

1. Introduction

The nearest neighbor algorithm (1-NN): the learner observes a sample S of labeled points (X,Y) = (Xi,Yi), i ∈ [n], where each Xi is a point in some metric space X and Yi ∈ Y is its label, for some finite label set Y. Being a metric space, X is equipped with a distance function ρ : X × X → R. Given a new unlabeled point x ∈ X to be classified, x is assigned the same label as its nearest neighbor in S, namely the label Yi* where i* = argmin_{i ∈ [n]} ρ(x, Xi). The 1-NN rule is the special case k = 1 of the k-nearest-neighbors (k-NN) algorithm.
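
For concreteness, here is a minimal Python sketch of the 1-NN rule (the list-based layout and the passed-in distance function are illustrative assumptions, not the project's API):

def one_nn_classify(sample_X, sample_y, x, dist):
    """Assign x the label of its nearest neighbor in the labeled sample.
    sample_X: list of points, sample_y: their labels, dist: the metric rho."""
    nearest = min(range(len(sample_X)), key=lambda i: dist(x, sample_X[i]))
    return sample_y[nearest]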

Under mild regularity assumptions, the 1-NN classifier's expected error is asymptotically bounded by twice the Bayes error as the sample size tends to infinity (a Bayes-consistent modification of the 1-NN classifier has also been proposed more recently). Of course, such asymptotic analysis is not sufficient in practice, since one needs to know the error to expect given a data set of a certain limited size. One finite-sample analysis defines the conditional probability η(x) = P[y = 1|x] and shows that if η is c-Lipschitz, then E_{S~D^m}[L_D(h_1-NN(S))] ≤ 2 L_D(h*) + 4c√d · m^(-1/(d+1)) for X = [0,1]^d. Note that the assumption that the conditional probability is c-Lipschitz can fail even for basic problems: imagine, for example, X = [-1,1] with P[y = 1|x] = 1 for x ≥ 0 and P[y = 1|x] = 0 for x < 0. More recently, strong margin-dependent generalization bounds were obtained in [7], where the margin is the minimum distance between oppositely labeled points in S. Shortcomings of the NN classifier led Hart [25] to pose the problem of sample compression. Indeed, significant compression of the sample has the potential to simultaneously address the issues of memory usage, NN search time, and overfitting.

The nearest neighbor classifier is perhaps the most intuitive learning algorithm for non-parametric classification. It is apparently also the earliest, having been introduced by Fix and Hodges in 1951. The simplicity of the algorithm, its non-parametric approach, its minimal assumptions about the setting of the problem, and its immediate extension to multi-class problems make it very appealing. For example, in [27] we can see that the 2nd-place winner of the 2015 Otto Group Product Classification Challenge used k-NN to extract meta-features that were later used as input to an artificial neural network (another type of machine learning model). Despite this example, the nearest neighbor approach is not very popular, as other methods usually achieve better results, and the NN approach suffers from the "curse of dimensionality": the size of the training set should increase exponentially with the dimension of X. In comparison, a popular approach such as SVM can work well even in a high-dimensional space, since its performance relies on the margin of the data, which is not restricted by the dimension. Another issue is that exact NN evaluation requires Θ(|S|) time in high-dimensional metric spaces (and possibly in Euclidean space as well), a phenomenon known as the algorithmic curse of dimensionality. Linear SVM, in comparison, requires a single inner product to evaluate a new point, and its cost does not depend on the sample size. Lastly, the NN classifier has infinite VC-dimension, implying that it tends to overfit the data. This last problem can be mitigated by taking the majority vote among k > 1 nearest neighbors, or by deleting some sample points so as to attain a larger margin. To see why 1-NN has infinite VC-dimension, note that for a realizable sample of any size and any labeling, the 1-NN classifier built on that sample is consistent with it, so any set of distinct points can be shattered. k-NN is a smoother function of the data than 1-NN (more regularized) and thus tends to overfit less.

2. The nearest neighbor condensing problem

Hart considered the Minimum Consistent Subset problem, elsewhere called the Nearest Neighbor Condensing problem, which seeks to identify a minimal subset S* ⊂ S that is consistent with S, in the sense that the nearest neighbor in S* of every x ∈ S possesses the same label as x. This problem is known to be NP-hard [15],[16], and Hart provided a heuristic with runtime O(n³). In [1] the authors gave a solution to the Nearest Neighbor Condensing problem with an improved runtime, and also showed a hardness-of-approximation result that closes the gap between the lower and upper bounds on the achievable compression rate.
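
For reference, here is a minimal Python sketch of Hart's condensing heuristic (a standard rendering of the heuristic with illustrative names; this is not code from the project):

def hart_condense(X, y, dist):
    """Hart's condensed nearest neighbor heuristic: repeatedly add every
    point that the currently kept subset misclassifies, until a full pass
    over the sample adds nothing. The kept subset is consistent with (X, y)."""
    keep = [0]                       # start with an arbitrary point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            nearest = min(keep, key=lambda k: dist(X[i], X[k]))
            if y[nearest] != y[i]:   # misclassified by the kept subset
                keep.append(i)
                changed = True
    return keep                      # indices of the condensed subset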

We say that a subset S' ⊆ S is ε-consistent with S if the nearest neighbor in S' of every x ∈ S, except for at most an ε-fraction of the points, possesses the same label as x. For ε = 0 we simply say the subset is consistent [10]. Roughly speaking, if a learning algorithm can express its output hypothesis using a small subset of the training set, then the error of the hypothesis on the rest of the examples estimates its true error. In other words, an algorithm that can "compress" its output is a good learner. The generalizing power of sample compression was independently discovered by [28],[29] and later elaborated upon by [30].

3. Constructing a γ-net

Theorem ([1]): For any distribution, any sample size n, and any δ > 0, with probability at least 1 − δ over the random sample S, the following holds:
(i) if Sγ is consistent with S, then the 1-NN classifier defined by Sγ satisfies the compression-based generalization bound of [1];
(ii) if Sγ is only ε-consistent with S, then the corresponding bound of [1] holds with an additional term depending on ε.

It now remains to show how we can build such a γ-net:


Greedy γ-net algorithm: for every point p ∈ S, we add p to Sγ if the distance from p to all points currently in Sγ is γ or greater, i.e., d(p, Sγ) ≥ γ.

In the straightforward implementation the greedy algorithm runs in O(n·|Sγ|) = O(n²) time, since each point is compared against every point already in the net.
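
A minimal Python sketch of this greedy construction (illustrative, not the project's implementation):

def greedy_gamma_net(points, gamma, dist):
    """Greedily build a gamma-net: keep a point only if it lies at
    distance >= gamma from every point already kept."""
    net = []
    for p in points:
        if all(dist(p, q) >= gamma for q in net):
            net.append(p)
    return net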

Hierarchy γ-net algorithm:

The hierarchy-based net construction is more complicated, but potentially runs faster; its runtime bound, which depends on the doubling dimension of the data, is given in [1].

[Figure: the hierarchy net construction algorithm]

Theorem ([1]): Given a set S of labeled points with scaled margin γ, it is NP-hard to approximate the solution to the Nearest Neighbor Condensing problem on S to within a factor that depends on γ (the precise factor is given in [1]). Note that the matching upper bound is an absolute size guarantee, which is stronger than an approximation estimate.
This theorem narrows the gap between the previous algorithm's performance and the best asymptotically achievable algorithm.

Pruning algorithm:

A pruning heuristic which, given a γ-net, produces a consistent subsample by pruning some of the points.

Although the paper [1] gave no theoretical guarantees for this heuristic, in practice I found that it works very well for further reducing the size of the subsamples. Note that its runtime can be considerably higher in the worst case.

[Figure: the consistent pruning heuristic]
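
As a rough illustration of the general idea of consistent pruning, here is a greedy Python sketch (not necessarily the heuristic used in the project): drop net points one at a time, as long as the remaining subset still classifies the full sample correctly.

def consistent_pruning_sketch(net_X, net_y, X, y, dist):
    """Greedily drop net points while the remaining subset stays
    consistent with the full labeled sample (X, y)."""
    kept_X, kept_y = list(net_X), list(net_y)

    def consistent(cx, cy):
        # the subset (cx, cy) is consistent if 1-NN over it labels every
        # sample point with its own label
        for xi, yi in zip(X, y):
            j = min(range(len(cx)), key=lambda k: dist(xi, cx[k]))
            if cy[j] != yi:
                return False
        return True

    i = 0
    while i < len(kept_X):
        trial_X = kept_X[:i] + kept_X[i+1:]
        trial_y = kept_y[:i] + kept_y[i+1:]
        if trial_X and consistent(trial_X, trial_y):
            kept_X, kept_y = trial_X, trial_y   # point i was redundant
        else:
            i += 1
    return kept_X, kept_y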

4. The code

The source code is maintained at https://bitbucket.org/yKorsunsky/1-nn-condensing/overview
The instructions for using the code follow below (they are also available on the repository's overview page).

1-NN condensing / How to use the 1-NN condensing project

There are two ways to run the project. The first is to use the installation, which adds a command called "nn_condensing" to your shell. This option is less flexible and is not recommended, since it was not maintained properly. The second option is to define your own metric for your data. Reading the data is done outside this project and is under your control; the only thing this project needs is for your data objects to work with your metric function.

To install the project, download it from the repository and open a shell in the project's folder. The install command is: python setup.py install

To use the project after installation, run a shell command that looks like this:

nn_condensing data_path=E:\nn_condensing_output\Skin_NonSkin.txt \
              starting_train_data_size=125 test_data_size=200 train_data_upper_limit=1000 \
              metric=L2 train_data_size_inc_func=geometric train_data_size_inc_arg=2 \
              CSV_path=E:\nn_condensing_output\CSV_All_Data.CSV

Here is a full list of parameters for this command, including their default values and an explanation of each:

data_path='E:\\1-NN repo\Data\Skin_NonSkin.txt'
    Full path of the data file.
starting_train_data_size=1000
    Initial size of the training data.
test_data_size=200
    Size of the test data.
train_data_upper_limit=3000
    Upper limit on the training data size.
metric="L2"
    Metric to be used.
train_data_size_inc_func="geometric"
    How the training data size grows between runs: "geometric" or "linear".
train_data_size_inc_arg=2
    The argument for the data-size increase function.
compress_greedy=True
    Test compression with the greedy algorithm?
compress_heirarchy=True
    Test compression with the fast (hierarchy) net algorithm?
test_memory=True
    Test memory consumption (currently only works with the fast net compression).
data_pruning_greedy=True
    Prune the data after the greedy compression.
data_pruning_hierarchy=True
    Prune the data after the hierarchy compression.
test_hierarchy_compression=True
    Run 1-NN on the full training data using the fast compression result.
test_1_nn_full_data=True
    Run 1-NN with the full training data (no compression).
test_1_nn_greedy=True
    Run 1-NN with the compressed data (greedy compression).
test_1_nn_hierarchy=True
    Run 1-NN with the compressed data (fast compression).
test_1_nn_greedy_pruning=True
    Run 1-NN with the pruned data (greedy).
test_1_nn_hierarchy_pruning=True
    Run 1-NN with the pruned data (hierarchy).
generate_CSV=True
    Write a CSV file with the results.
CSV_path=''
    Location of the output CSV file.
debug_info=True
    Print status messages while the experiment is running.
print_tables=True
    Finish the experiment by printing the output into tables.

As an example of how to run it on the skin database, we need to implement a metric and a reader function:
[Code figures: the metric and the line-parser implementations for the skin dataset]
Once these two functions are written, you can run the run_experiment function. Note that run_experiment uses "skin_line_parser" directly, so you need to replace it in the code itself (this is an implementation mistake and should be changed).
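
For illustration, here is a hypothetical sketch of what such a pair of functions could look like for Skin_NonSkin.txt (which stores whitespace-separated B, G, R values and a class label per line); the names and the exact format handling are assumptions, not the project's actual code:

import math

def skin_line_parser(line):
    """Parse one line of Skin_NonSkin.txt into (features, label).
    Assumes whitespace-separated values: B G R label."""
    values = line.split()
    features = tuple(float(v) for v in values[:3])
    label = int(values[3])
    return features, label

def l2_metric(a, b):
    """Euclidean (L2) distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))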

The preferred way to work with the algorithm is to provide it with a matrix of data items together with a precalculated Gram (pairwise distance) matrix.
For example, the following line calculates an epsilon net using the hierarchy technique. "gram_matrix" is a precalculated distance matrix of the "data_sample" data. "markov_distance" is the distance to use between data items; if a Gram matrix were not provided, markov_distance would be used to calculate it.

nn.epsilon_net_hierarchy(data_sample,markov_distance,gram_matrix=gram_matrix)

This line is an example of how to use the output of the previous line (hierarchy_net) to produce an even smaller net using the consistent pruning algorithm.

nn.consistent_pruning(hierarchy_net,markov_distance,gram_matrix=gram_matrix)

Finally, this line executes the nearest neighbor algorithm, using the pruned net as the known (labeled) sample to classify the unknown data set.

nn.execute(pruned_net,unknown_data,markov_distance)
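
Putting the three calls together, a minimal end-to-end sketch might look as follows; the import name nn, the data loading, and markov_distance are assumptions standing in for your own code, while the three nn.* calls are exactly the ones shown above:

import nn   # assumed import name for the project's module

# data_sample (labeled data objects), unknown_data (points to classify) and
# markov_distance (the metric) are placeholders for your own code
gram_matrix = [[markov_distance(a, b) for b in data_sample] for a in data_sample]

hierarchy_net = nn.epsilon_net_hierarchy(data_sample, markov_distance, gram_matrix=gram_matrix)
pruned_net = nn.consistent_pruning(hierarchy_net, markov_distance, gram_matrix=gram_matrix)
predictions = nn.execute(pruned_net, unknown_data, markov_distance)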

5. Experimental results

The central experimental results obtained with the above code are presented in the table below. The data was a collection of files totalling 27GB of system-call traces from 26 different processes. The files were split between the different processes and arranged in chronological order of appearance within each process; for each process there were usually numerous files describing different runs of it.
The files were first translated into bi-gram matrices, and the distance measured between them was:

[Figure: the distance used between the bi-gram matrices, and the table of results comparing the 1-NN variants with SVM]
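
For context, a bi-gram matrix of a system-call trace counts how often each ordered pair of consecutive system calls occurs; here is a hypothetical Python sketch of such a feature extraction (the actual preprocessing and distance used in the thesis may differ):

from collections import Counter

def bigram_matrix(trace, vocab):
    """Count occurrences of each consecutive pair of system calls.
    trace: a list of system-call names; vocab: maps a call name to an index."""
    counts = Counter(zip(trace, trace[1:]))
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        matrix[vocab[a]][vocab[b]] = c
    return matrix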

We can see that all three 1-NN-based methods make about half the error that SVM makes, which is a significant improvement. The price is paid in runtime, which is high for both the preprocessing step and the classification step. For the naive 1-NN the runtime is spent entirely on classification, since there is no preprocessing step. The greedy γ-net construction is very slow, which makes it irrelevant here, since it does not improve the results. The hierarchy net construction is fast, and together with the pruning and the classification it gives a runtime similar to the naive 1-NN, except that most of the work is done in the preprocessing phase, since the resulting subsample is about 30% of the size of the training sample. If we were to enlarge the test set, the time gap would grow in favour of the compression scheme, since the preprocessing time would shrink relative to the classification time.

6. References

This webpage is mostly based on the Master's Thesis "Classifying processes by their system call traces using 1-nearest-neighbor with compression" by Yevgeni Korsunsky, and on the open-source project that was written for it (see the link in the Code section).

[1] Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch. Near-optimal sample compression for nearest neighbors. NIPS, 2014.
[7] Ulrike von Luxburg and Olivier Bousquet. Distance-based classification with Lipschitz functions. Journal of Machine Learning Research, 5:669-695, 2004.
[15] Gordon Wilfong. Nearest neighbor problems. In Proceedings of the Seventh Annual Symposium on Computational Geometry, SCG '91, pages 224-233, 1991.
[16] A. V. Zukhba. NP-completeness of the problem of prototype selection in the nearest neighbor method. Pattern Recognition and Image Analysis, 20(4):484-494, December 2010.
[25] Wikipedia. Support vector machine.
[27] Alexander Guschin. Otto Product Classification Winner's Interview: 2nd place. http://blog.kaggle.com/2015/06/09/otto-product-classication-winners-interview-2nd-place-alexander-guschin/