3rd Conference
Computers in Chemistry ’94
June 23-26, 1994 - Wrocław, Poland

A Computer Program “Cluster” for Cluster Analysis

P. Kędzierskiª, K. Kowalª, P. Wojciechowskiª, R. Gancarz§

ª Science Department,

§ Institute of Organic Chemistry, Biochemistry and Biotechnology, Wrocław University of Technology

Cluster analysis is a technique for grouping similar objects in large data sets. It is used, for example, in computer recognition of images and patterns and in the analysis of biologically active compounds. Many techniques are used to find compounds with a desired activity, and statistical methods are popular because of the large amounts of data involved. The most reliable method is to synthesise several thousand compounds similar to a chosen lead compound and then investigate their properties, but this is very costly and time-consuming. Any method that helps minimise the number of compounds to be synthesised is therefore of great value. With a program like this one, one can get an idea of which factors matter most for biological activity and of what the next compounds to be synthesised should be like. Our program is designed to help solve these problems.

Functions of the program:

  1. Data input - Data input is as simple as possible. The data are represented as points in a multi-dimensional space. Each point corresponds to a different compound, and each dimension represents a chemical, physical or biological property of the substance.
  2. Data processing - Processing is done by two procedures, ScaleData and MakeTree. After input, the co-ordinates may be rescaled. At present the program offers three options: no scaling, linear scaling to the range 0..1, and scaling to the Gaussian distribution N(0,1) (zero mean, unit variance). More options are easy to add. Next, the points are processed in a loop that fits them into a dynamic tree structure. Initially each point forms a separate cluster. At each step the two closest clusters are joined into a single point, so the number of clusters decreases by one. This is repeated until all the points form one cluster. The program offers several methods of calculating the co-ordinates of the new clusters, as well as several distance measures. The resulting hierarchy is kept in memory as a tree.
  3. Walking on the tree - An interactive dialogue with the computer makes it possible to divide the data into a specified number of clusters or to obtain information on the number of distinct clusters. The results can be presented in graphic form.
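
The three scaling options of ScaleData can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the program's code. The function name `scale_data`, the mode strings, and the use of the population standard deviation for the N(0,1) option are all hypothetical. Each co-ordinate (property) is scaled independently across all compounds.

```python
import statistics
from typing import List

Point = List[float]


def scale_data(points: List[Point], mode: str = "none") -> List[Point]:
    """Rescale each co-ordinate (column) independently (hypothetical sketch).

    mode: "none"   - return the co-ordinates unchanged
          "linear" - map each column linearly onto the range 0..1
          "gauss"  - shift and scale each column to mean 0, std. dev. 1
    """
    if mode == "none":
        return [list(p) for p in points]
    columns = list(zip(*points))  # one tuple per property
    scaled_columns = []
    for col in columns:
        if mode == "linear":
            lo, hi = min(col), max(col)
            span = hi - lo
            # A constant column carries no information; map it to 0.
            scaled_columns.append([(x - lo) / span if span else 0.0 for x in col])
        elif mode == "gauss":
            mean = statistics.fmean(col)
            sd = statistics.pstdev(col)  # population standard deviation
            scaled_columns.append([(x - mean) / sd if sd else 0.0 for x in col])
        else:
            raise ValueError(f"unknown scaling mode: {mode!r}")
    # Transpose back from columns to one list per point (compound).
    return [list(row) for row in zip(*scaled_columns)]


print(scale_data([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], "linear"))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Scaling matters because properties measured in very different units would otherwise dominate the distance calculation.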
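
The tree-building loop described for MakeTree is agglomerative hierarchical clustering. The sketch below is a hypothetical illustration and not the program's code. It assumes Euclidean or Manhattan distance, plus two example rules for the co-ordinates of a joined cluster: the size-weighted centroid or the unweighted midpoint. The source does not name the methods the program actually provides, and `make_tree`, `Node` and the helper names are our own.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

Point = List[float]


@dataclass
class Node:
    coords: Point                 # representative point of the cluster
    size: int                     # number of original points in the cluster
    height: float = 0.0           # distance at which the two children were joined
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    index: Optional[int] = None   # index of the original point (leaves only)


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(a: Node, b: Node) -> Point:
    # Size-weighted mean: the centre of gravity of all points in both clusters.
    n = a.size + b.size
    return [(a.size * x + b.size * y) / n for x, y in zip(a.coords, b.coords)]


def midpoint(a: Node, b: Node) -> Point:
    # Unweighted mean of the two representatives (ignores cluster sizes).
    return [(x + y) / 2 for x, y in zip(a.coords, b.coords)]


def make_tree(points: List[Point],
              distance: Callable[[Point, Point], float] = euclidean,
              merge: Callable[[Node, Node], Point] = centroid) -> Node:
    """Start with one cluster per point and repeatedly join the two closest
    clusters until a single cluster (the root) remains. Needs >= 1 point."""
    clusters = [Node(list(p), 1, index=i) for i, p in enumerate(points)]
    while len(clusters) > 1:
        # Brute-force search for the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i].coords, clusters[j].coords)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        joined = Node(merge(a, b), a.size + b.size, d, a, b)
        # Delete j first (j > i) so index i stays valid; one cluster fewer per step.
        del clusters[j]
        del clusters[i]
        clusters.append(joined)
    return clusters[0]


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
root = make_tree(pts)
print(root.size, root.height, root.coords)  # 4 10.0 [5.0, 0.5]
```

Passing the distance and merge rules as functions mirrors the program's choice of several methods. The brute-force pair search is simple but costs O(n²) distance evaluations per step, so it suits small data sets. Larger sets would call for a stored, updated distance matrix.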
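
Walking on the tree can be illustrated with two ways of cutting the hierarchy. The first undoes the highest joins until a requested number of clusters is reached. The second splits every join above a distance threshold, which shows how many distinct clusters remain. This is a hypothetical sketch: `cut_tree`, `clusters_below` and the minimal `Node` class are our own names. It also assumes that join heights grow towards the root, which holds for single, complete and average linkage but can be violated by the centroid rule.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    height: float = 0.0            # distance at which the two children were joined
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    index: Optional[int] = None    # original point index (leaves only)


def leaves(node: Node) -> List[int]:
    """Indices of all original points below this node."""
    if node.left is None:
        return [node.index]
    return leaves(node.left) + leaves(node.right)


def _entry(node: Node, counter: int):
    # heapq is a min-heap: store -height so the highest join is popped first.
    # Internal nodes (False) sort before leaves (True) of equal height, and the
    # unique counter ensures Node objects themselves are never compared.
    return (-node.height, node.left is None, counter, node)


def cut_tree(root: Node, k: int) -> List[List[int]]:
    """Divide the data into (at most) k clusters by undoing the highest joins."""
    heap = [_entry(root, 0)]
    counter = 1
    while len(heap) < k:
        entry = heapq.heappop(heap)
        node = entry[3]
        if node.left is None:
            # Only single points remain; they cannot be split any further.
            heapq.heappush(heap, entry)
            break
        for child in (node.left, node.right):
            heapq.heappush(heap, _entry(child, counter))
            counter += 1
    return sorted(sorted(leaves(e[3])) for e in heap)


def clusters_below(root: Node, threshold: float) -> List[List[int]]:
    """Split every join made at a distance greater than threshold."""
    if root.left is None or root.height <= threshold:
        return [sorted(leaves(root))]
    return clusters_below(root.left, threshold) + clusters_below(root.right, threshold)


# Tree for the points [0,0], [0,1], [10,0], [10,1], built by hand:
ab = Node(1.0, Node(index=0), Node(index=1))
cd = Node(1.0, Node(index=2), Node(index=3))
root = Node(10.0, ab, cd)
print(cut_tree(root, 2))               # [[0, 1], [2, 3]]
print(len(clusters_below(root, 5.0)))  # 2
```

A large gap between successive join heights is a common hint that the data fall naturally into the number of clusters found just below that gap.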


If you want to test our program, please contact Paweł Kędzierski.