Composite RGB Image
Both SHG and PL imaging techniques produce greyscale images, so they must first be combined into a composite RGB data format.
We create a composite 3-channel image from the SHG, PL and PL transmission data, corresponding to the red, blue and green channels respectively, taken at a fixed biopsy region. Each RGB pixel vector is then normalised to a unit vector in order to reduce the dependence on the lighting intensity of each image type. We therefore expect fibrous features to appear as unit vectors with large red components, whereas cellular features should produce unit vectors with prominent green and blue components.
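The stacking and per-pixel normalisation described above can be sketched with NumPy as follows. This is a minimal illustration rather than the package's own implementation: the function name `composite_unit_rgb` and its argument names are hypothetical, the channel assignment simply follows the order stated in the text, and pixels with no signal in any channel are assumed to stay as zero vectors.

```python
import numpy as np


def composite_unit_rgb(shg, pl, pl_trans):
    """Stack three greyscale images into RGB and normalise each pixel vector.

    Hypothetical helper: channel mapping follows the text
    (SHG -> red, PL -> blue, PL transmission -> green).
    """
    # Array axis order is (red, green, blue)
    rgb = np.stack([shg, pl_trans, pl], axis=-1).astype(float)

    # Euclidean length of every pixel vector, shape (height, width, 1)
    norm = np.linalg.norm(rgb, axis=-1, keepdims=True)

    # Divide only where the pixel is non-zero; empty pixels remain (0, 0, 0)
    return np.divide(rgb, norm, out=np.zeros_like(rgb), where=norm > 0)


# Tiny 1 x 2 example: one bright pixel and one empty pixel
shg = np.array([[3.0, 0.0]])
pl = np.array([[0.0, 0.0]])
pl_trans = np.array([[4.0, 0.0]])
unit = composite_unit_rgb(shg, pl, pl_trans)
```

Because every non-empty pixel ends up with unit length, two pixels with the same channel ratios but different overall brightness map to the same colour vector.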
Colour Clustering
After identifying the 8 main colour clusters in the RGB composite image, we classify each cluster as either cellular or fibrous, based on its centroid unit vector \(\mathbf{c}\) and average non-zero intensity \(\bar{I}\). A cluster is considered to contain cellular features if its value of \(\phi(\mathbf{c}, \bar{I})\) is non-zero. The terms in \(\phi(\mathbf{c}, \bar{I})\) consist of the 3 angles on the RGB colour sphere that correspond to the unit vector \(\mathbf{c}\), together with \(\bar{I}\). We define a simple vector \(v(r, g, b, i)\) (equation (1)) that marks the boundary between the cellular and fibrous domains; a cluster is considered part of the cellular region only if each of its components lies below this boundary.
The pixels in the accepted clusters are then combined to form our binary “cell filter”, which we use as the basis for further image segmentation. Note that the default values of \(v(r, g, b, i)\) used in our software have been determined heuristically from the available data and are therefore in relative, rather than absolute, units. They are not expected to produce consistent behaviour for images from different sources.
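As a rough sketch of the acceptance rule and the resulting binary mask, the snippet below takes a per-pixel cluster label image, one 4-component feature vector per cluster (three colour terms plus \(\bar{I}\)), and a boundary vector. The function name `cell_filter`, the array layout and the example numbers are all assumptions made for illustration; how the colour terms are computed from \(\mathbf{c}\) is left to the caller, and the boundary values shown are not the software's defaults.

```python
import numpy as np


def cell_filter(labels, features, boundary):
    """Build a binary mask from clusters whose features all lie below a boundary.

    Hypothetical helper.
    labels   : (height, width) integer cluster index per pixel
    features : (n_clusters, 4) colour terms and mean intensity per cluster
    boundary : (4,) boundary vector v(r, g, b, i)
    """
    labels = np.asarray(labels)
    features = np.asarray(features, dtype=float)
    boundary = np.asarray(boundary, dtype=float)

    # A cluster is accepted only if every component is below the boundary
    accepted = np.all(features < boundary, axis=1)

    # Pixels belonging to any accepted cluster form the binary mask
    mask = np.isin(labels, np.flatnonzero(accepted))
    return mask, accepted


# Illustrative values only: 3 clusters on a 2 x 2 label image
labels = np.array([[0, 1], [2, 1]])
features = np.array([
    [0.1, 0.2, 0.3, 0.4],  # all below the boundary: accepted
    [0.9, 0.2, 0.3, 0.4],  # first component too large: rejected
    [0.1, 0.1, 0.1, 0.1],  # all below the boundary: accepted
])
boundary = np.array([0.5, 0.5, 0.5, 0.5])
mask, accepted = cell_filter(labels, features, boundary)
```

Requiring every component to pass means a single out-of-range term is enough to exclude a cluster, which is the strict reading of the rule given in the text.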
In order to deal with the stochastic nature of the KMeans algorithm, we rank each cell cluster in proportion to the L1 distance of all centroids from \(v(r, g, b, i)\), resulting in the cost function \(\Psi\) (equation (2)). The KMeans run that produces the lowest average \(\Psi\) is then chosen as the optimal solution for our cell filter.
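The selection between repeated KMeans runs might look like the sketch below. Since equation (2) is not reproduced here, the exact form of \(\Psi\) is assumed: each cluster's cost is taken to be the plain L1 distance between its 4-component feature vector and \(v(r, g, b, i)\), and a run's score is the mean over its clusters. The names `mean_l1_cost` and `best_run` are hypothetical, and the example uses 2 clusters per run for brevity.

```python
import numpy as np


def mean_l1_cost(features, boundary):
    """Average L1 distance of cluster feature vectors from the boundary vector.

    Assumed stand-in for the cost function Psi; not the documented formula.
    """
    features = np.asarray(features, dtype=float)
    boundary = np.asarray(boundary, dtype=float)
    # Sum absolute differences per cluster, then average over clusters
    return float(np.abs(features - boundary).sum(axis=1).mean())


def best_run(runs, boundary):
    """Return the index of the run with the lowest average cost, plus all costs."""
    costs = [mean_l1_cost(features, boundary) for features in runs]
    return int(np.argmin(costs)), costs


# Two illustrative runs, each with 2 clusters described by 4 components
boundary = np.array([1.0, 1.0, 1.0, 1.0])
run_a = np.array([[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]])
run_b = np.array([[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 3.0]])
best_index, costs = best_run([run_a, run_b], boundary)
```

Scoring whole runs rather than individual clusters keeps the comparison independent of the arbitrary order in which KMeans numbers its clusters.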