Contents
- Index
TOOLS > CLUSTERING > OPTIMISATION
PURPOSE Optimizes a cost function which measures the total distance or similarity within classes for a proximity matrix.
DESCRIPTION Given a partition of a proximity matrix of similarities into clusters, the program finds a partition with K classes that maximizes a fit criterion. Different options are available for measuring fit. The default option (correlation) maximizes the correlation between the data matrix X and a structure matrix A in which a(i,j) = 1 if nodes i and j have been placed in the same class and a(i,j) = 0 otherwise. Thus, a high correlation is obtained when the data values are high within-class and low between-class. This assumes similarity data as input. For dissimilarity data, the program maximizes the negative of the correlation. Another measure of fit is the density function, which is simply the average data value within classes. There is also a pseudo correlation measure that seeks to measure the difference between the average value within classes and the average value between classes. The routine uses a tabu search combinatorial optimization algorithm.
PARAMETERS
Input dataset:
Name of file containing proximity matrix to be clustered. Data type: Square symmetric matrix.
Number of clusters: (Default = 2)
Number of clusters into which the actors must be assigned.
Fit criterion
Density the average value within clusters for similarity and the sum for distance data.
PseudoCorrelation a simple fast correlation measure between the clustered data and the ideal structure matrix.
Correlation the Pearson correlation measure between the clustered data and the ideal structure matrix.
Are diagonal values valid? (Default = No)
Whether diagonals are to be included on the cost function.
Type of Data:
Similarities causes large values to be clustered together. Distances causes small values to be clustered together.
Max # of iterations in a series: (Default = 12)
The algorithm starts from an arbitrary partition and attempts to decrease the cost by taking the steepest descent. If the cost cannot be reduced then the algorithm continues its search in the neighborhood of the current partition. This search direction is a mildest ascent direction and from there new search directions are explored. This exploration only continues for a fixed number of iterations in a series. If no improvement is made after the fixed number of iterations the algorithm terminates with the current minimum. Increasing the parameter gives a more exhaustive and therefore slower search. The recommended default value is automatically entered on the form once the input data has been selected.
Length of time in penalty box: (Default = 5)
If the algorithm makes an ascending step then it is possible that the best possible descending step is the reverse of the direction just taken. This parameter prohibits a move along the reverse direction for a set number of steps. The larger the value the more difficult it will be to come back to a previously explored local minimum, however it will also be more difficult to explore the vicinity of that minimum. The default has been shown experimentally to be the most useful.
Number of random starts: (Default = 3)
The whole procedure is repeated with a different initial partition. The best of these are then selected as a minimum.
Random Number Seed:
The random number seed generates the initial partition. UCINET generates a different random number as default each time it is run. This number should be changed if the user wishes to repeat the analysis with different initial configurations. The range is 1 to 32000.
Output Partition Dataset: (Default = 'TabuCluster').
Name of output file which contains a partition indicator vector. This vector has the form (k1,k2,...ki...) where ki assigns vertex i to block ki, so that (1 1 2 1 2) assigns vertices 1, 2 and 4 to block 1, and 3 and 5 to block 2. This vector is not displayed at output.
LOG FILE The value of the cost function.
List of clusters. Each cluster is labeled and is specified by the vertices it contains.
The blocked proximity matrix. The rows and columns of the original matrix are permuted into clusters. The proximity matrix is displayed in terms of the matrix clusters it contains.
TIMING Each iteration of the tabu search algorithm is O(N^2).
COMMENTS Care should be taken when using this routine.
The algorithm seeks to find the minima of the cost function.
In addition there may be a number of alternative partitions which also produce the minimum value; the algorithm does not search for additional solutions. Finally it is possible that the routine terminates at a local minima and does not locate the desired global minima.
To test the robustness of the solution the algorithm should be run a number of times from different starting configurations. If there is good agreement between these results then this is a sign that there is a clear split of the data into the reported blocks.
REFERENCES Glover F (1989). Tabu Search - Part I. ORSA Journal on Computing 1, 190-206.
Glover F (1990). Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.