# Generating clustering problems

These problems are generated using the fortran program initpGP.f. This program calls random.f as a subroutine. You can compile them using
f77 initpGP.f random.f -o initpGP
for example. If you are using SunOS 5.6 (Solaris), you can download the executable directly.

A sample input file is contained in fort.17; this produces the output contained in in40a. The output produced by my cutting plane code for this problem is contained in out40a. In general, an input file must be placed in fort.17, and the generated problem will be sent to the standard output.

The code generates some observations, each with various characteristics. The input file fort.17 contains these two parameters and also a parameter for a random seed. The first three rows of output returns the value of these three parameters. The remaining rows contain the observations and their characteristics. Each characteristic can take the value 0, 1, or 2. It is desired to identify observations that are similar to one another.

The method we use to formulate this problem as an integer programming problem is due to Grotschel and Wakabayashi:

```    AUTHOR  =  {M. Gr{\"{o}}tschel and Y. Wakabayashi},
TITLE   =  {A cutting plane algorithm for a clustering problem},
JOURNAL =  {Mathematical Programming},
YEAR    =  1989,
VOLUME  =  45,
PAGES   =  {59--96}
```
The observations can be regarded as vertices of a graph. The length of the edge connecting two graphs is given by
(Number of observations in which they disagree) - (Number of observations in which they agree)
Thus, in the file out40a, the distance between observations 1 and 2 is (4-8)=-4.

We then wish to choose a subset of the edges such that:

1. The cost of the chosen edges is as small as possible.
2. The chosen edges correspond to the edges of cliques, that is, if edges (i,j) and (j,k) are chosen then edge (i,k) must also be chosen.
The objective is to choose a cut that minimizes the sum of the weights of the edges appearing in the cut. Every edge has cost +1 or -1, evenly distributed.

Seeds and optimal values for selected problems are available:

A paper describing the problem in more detail and giving computational results is "Computational experience with an interior point cutting plane algorithm."

Back to my homepage.