
Introduction to SOM

Kohonen Networks: Self-Organizing Maps

There is no better introduction than Kohonen's own: http://www.cis.hut.fi/projects/somtoolbox/theory/somalgorithm.shtml

An excellent thesis document, fairly easy to read and very comprehensive on the topic, is available from our site: SOM Thesis (requires a PostScript reader such as the free GhostScript GSView).

For those who just want a shorter introduction: SOM is a competitive neural method that arranges a small set of "codebook" vectors in such a way that those vectors preserve the salient features of the original data. This clustering method is in effect a projection of the probability density function of the data onto a 2D map.

Let's try to describe it simply. Say you have a data set made of historical records of n technical indicators (5, 6, 10 or more). Each record represents a market pattern of your choice and is, mathematically, just one point in an n-dimensional hyperspace. Such a representation is difficult to visualize in the first place, and trying to find recurrences of such patterns across your data set is no easier. This is where SOM comes into play.

Apart from some convergence settings, the main decision is how many representative points (codebook vectors) you think would adequately represent your data set. More points give a finer representation, but also increase computing time. Say we choose a 4x5 grid. These points form a so-called 2D map, and SOM projects every record in your data set onto that 1D or 2D map. While the convergence process runs, the map itself changes continuously.

What do we get at the end of the convergence process? A 1D or 2D map of those representative points, as we just said.
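The training loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the 4x5 grid matches the example, while the epoch count, learning-rate schedule, and Gaussian neighborhood radius are illustrative choices of our own.

```python
import numpy as np

def train_som(data, rows=4, cols=5, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols SOM on data of shape (n_records, n_indicators)."""
    rng = np.random.default_rng(seed)
    n_units, dim = rows * cols, data.shape[1]
    codebook = rng.normal(size=(n_units, dim))  # random initial codebook vectors
    # Each unit's (row, col) position on the 2D map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * (1 - t / steps)               # decaying learning rate
            sigma = sigma0 * (1 - t / steps) + 0.5   # shrinking neighborhood
            # Best-matching unit: the codebook vector closest to this record.
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
            # Gaussian neighborhood on the map pulls nearby units along too,
            # which is what makes the map self-organize.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            codebook += lr * h[:, None] * (x - codebook)
            t += 1
    return codebook, grid
```

After convergence, `codebook` holds the 20 representative points and `grid` their map coordinates.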
Each point in that map (4x5 in our example) represents a cluster of points in your original data set. One way to look at it is that the process has compressed the data onto a map that is easier to handle than the original data set. Another way is to see it as a classifier: each cluster center, i.e. each point on the map, represents a group of points of the original data set that fall together "naturally" as the process progresses. One can therefore use the coordinates of a cluster center, or bin, to segregate the original patterns, which substantially reduces the complexity of the original data set. What else can we deduce from this technique? SOM behaves quite well even with missing or discordant data. If, for instance, a given cluster contains fictitious patterns like these:
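Using the map as a classifier, as just described, amounts to binning each record by the grid coordinates of its best-matching unit. A small sketch, assuming `codebook` and `grid` come from a SOM trained as above (the function name `map_to_bins` is our own):

```python
import numpy as np

def map_to_bins(data, codebook, grid):
    """Return the (row, col) map coordinate of each record's best-matching unit."""
    # Squared distance from every record to every codebook vector.
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)   # index of the closest codebook vector per record
    return grid[bmu]          # its position on the 2D map
```

Records that land in the same (row, col) bin belong to the same cluster, so the bin coordinate can serve directly as a pattern label.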
It is likely that the outliers will be absorbed by the process. This makes SOM a good "natural" fuzzy classifier: natural in the sense that it is not rule-based as in fuzzy logic, but is, on the contrary, a feature-extraction device.
To be continued...

Page last modified: December 08, 2007