# Image Segmentation w/ EM Algorithm

## Overview

It is possible to segment images using a clustering method, where each segment is the cluster center to which a pixel belongs. Images are represented by their r,g, and b values. In this implementation image pixels are clustered by applying an Expectation Maximization (EM) algorithm to a mixture of normal distribution model. Then the image is segmented by mapping each pixel to the cluster center with the highest value of the posterior probability for that pixel. The results of segmentation can be displayed easily as simple images, where each pixel’s color is replaced with the mean color of the closest segment. This is the same basic technique used for some image compression, as storage only involves mapping pixels to a limited set of colors (cluster centers).

## EM Algorithm (Concept)

A normal distribution is capable of modelling a single blob of data points. But a single normal distribution will not produce a good model if there are multiple blobs of data points. The multivariate normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. It is often used to describe a set of (possibly) correlated components (random variables) each of which should cluster around a mean value. With the help of affine transformation, any multivariate gaussian can be converted to have zero mean and independent components. Therefore, under the right transformations, any gaussian can be represented as the product of one-dimensional normal distributions (each with zero mean and unit standard deviation). This permits a way to simulate multivariate normal distributions. Simply simulate a standard normal distribution for each component, applying the appropriate transformations afterwards.

Extrapolating this idea to clustering, think of the data as if it were generated in a certain way and then figure out the specifics of that process. Assume that each data item was pulled from one of multiple distinct models. For k clusters, assume there were k distinct models. Assume that a magical process determined which model to use when producing a data item in question. We know neither the models nor the origin of each data item (it’s classified cluster). This is a very common situation, applicable to many types of problems.

A natural approach to solving this problem is to alternate between estimating classifications of data items to models, and re-estimating the models themselves. For example, K-means iteratively assigns points to clusters and then re-estimates the cluster centers. But what if the models are probabilistic or there is not a clear “distance function”. In this case an Expectation Maximization (EM) algorithm is the standard.

At a high level, the EM algorithm works to average out what we don’t know. It alternates between averaging and re-estimating parameters with soft methods. The E step (expectation) involves creating a function for the expectation of the log-likelihood evaluated using the current parameter estimates. The M step (maximization) involves re-computing the parameters to maximize the function created in the E step (the expected log-likelihood). For the next iteration, the E step creates the log-likelihood using the newer parameter estimates and so on. The algorithm repeats until acceptable convergence.

## Results

Results were generated by segmenting 5 different images to 10,20 and 50 segments. The resulting pixel to cluster mappings were stored as images after each iteration and the results were compiled into animated GIFs. Additionally, one image (polar lights) was segmented into 20 segments using 6 different random initialized values fed into the EM algorithm.

## Exploring Variation in Results

To keep everything consistent, the EM algorithm was run for a fixed 30 iterations on the polarlights image using 6 different initial conditions for “pi” and “mu”. For each of the 6 runs, the initial values for pi were randomized and then unitized so they summed to 1. The same process was used for each topic’s mu vector. It is easiest to detect origins of variation by looking at the progression of iterations in the animated gifs. Although each run converges on very similar segmentations, there are slight differences. Depending on the random initial values for pi and mu, the first assignments of pixels to segments is different for each of the 6 trials. Certain colors appear in the early iterations and then morph more towards the original image by later iterations. Obviously the center shifts to shades of green, the right blues, and the left purples. These generalities hold across runs despite the difference in initial conditions, as they are the “ground truth” assignments looking at the original image. The variance in the final images reflects the fact that the EM algorithm converges to find parameters that will obtain local maxima and likelihood of the data given the model’s parameters. (LOCAL maxima not GLOBAL maxima). Intuitively, the process will find different maximizations given different initial conditions. If one cluster is assigned a high Pi value originally, it will be relatively easier for this cluster to achieve assignments… while for a cluster that was originally assigned a low Pi value, it will struggle to achieve assignments (relatively).

## Implementation

All of the code was written from scratch in Matlab. It has been included in its entirety below. I’m sure that further optimizations will be apparent to those experienced with vectorizing code, but this worked well enough for my purposes. A few notes are worth mentioning… While conceptually the EM algorithm involves selecting parameters that maximize an entire function, the only values required for successive iterations are the weights and the parameters themselves. Instead of calculating the full equation, the weights are recalculated and the parameters are adjusted each iteration. Here the pi’s (probability that a given cluster was chosen for data point creation) and mu’s (mean value for each model/distribution) are initialized to values that are close to even, plus a small amount of gaussian random noise. Shown later is a snippet to initialize the pi’s and mu’s in a uniform random fashion. The E step contains terms that may lead to underflow errors because of the multiplication of many tiny numbers. To avoid underflow errors, the implementation performs the E step in the log domain where a product of k terms becomes a sum of k terms. Unfortunately the denominator of one element in the equation contains a sum of products, meaning I couldn’t decompose it into a sum of logs of individual terms. Instead I compute the logarithm of this expression for sums of products. One could use a log sum expression command to accomplish this, but I did it from scratch. I’ll try to elaborate on this below, with an example from the topic model case (see INSERT LINK). The same logic applies to this implementation. Consider the following expression:

\[p\left ( \delta _{ij}=1|\theta ^{(n)},x \right )=\frac{[\prod _{k}p_{j,k}^{xk}]\pi_{j}}{\sum _{l}[\prod _{k}p_{l,k}^{xk}]\pi_{l}}\]To avoid underflow, this will have to be computed in the log domain. But that denominator is a sum of products which doesn’t decompose into a sum of logs of individual terms. Instead, compute the logarithm of sums of products. For easy notation, denote:

\[\begin{align*} Y &= p\left ( \delta _{ij}=1|\theta ^{(n)},x \right )\\ A_{j} &= [\prod _{k}p_{j,k}^{x_k}]\pi_{j}\\ A_{max} &= \max_{j} A_{j} \end{align*}\]Therefore,

\[\log A_{j} = \log \pi_{j} + \sum_{k} x_{k}\log p_{j, k}\\ \log A_{max} = \log\left(\max_{j}A_{j}\right) =\max_{j} \log\left(A_{j}\right)\]Then, derive and calculate the logarithm of Y as:

\[\begin{align*} \log Y &= {\log A_{j}} - {\log\left(\sum_{l} A_{l}\right)}\\ &={\log A_{j}} - {\log\left(\sum_{l} e^{\log A_{l}}\right)}\\ &={\log A_{j}} - {\log\left(e^{\log A_{max}}\sum_{l} e^{\log A_{l} - \log A_{max}}\right)}\\ &={\log A_{j}} - {\log e^{\log A_{max}}} - {\log\left(\sum_{l} e^{\log A_{l} - \log A_{max}}\right)}\\ &={\log A_{j}} - {\log A_{max}} - {\log\left(\sum_{l} e^{\log A_{l} - \log A_{max}}\right)} \end{align*}\]Then, simply take the exponent of that to yield Y. This means that each iteration, the E step will follow the following formula:

For every data item “i”, do:

- Compute log(Aj) for every j
- When performing step 1, note the biggest computed value (considering each final log(Aj) ) and remember it as log(Amax)
- Because it is identical for all j, compute the following term from the final equation for logY:

- For each cluster j, compute the logY
- Exponentiate this logY ie: exp(logY) to yield w(i,j)

The M step will change the values of Aj, so each E step this will be an effective computation of new weights corresponding to the assignments of data points to clusters.

```
% Clear the workspace
clear all; clc;
% Set the workspace
cd '/directoryWithImageNamesGoHere'
% Potential image names
imgNames = {'balloons', 'mountains', 'nature', 'ocean', 'polarlights'};
segmentCounts = [10,20,50];
for imgNameCounter = 1:5
for segmentCountIter = 1:3
% Load the image
x = imread([imgNames{imgNameCounter}, '.jpg']);
%Define Parameters
nSegments = segmentCounts(segmentCountIter);%10 % # of color clusters
nPixels = size(x,1)*size(x,2); % # of pixels
maxIterations = 15; %maximum number of iterations allowed for EM algorithm.
nColors = 3;
%Determine the output path for writing images to files
outputPath = ['output/', num2str(nSegments), '_segments/', imgNames{imgNameCounter} , '/'];
imwrite(x, [outputPath, num2str(0), '.png']);
%reshape the image into a single vector of pixels for easier loops
pixels = reshape(x,nPixels,nColors,1);
pixels = double(pixels);
% Initialize pi vector and mu matrix
% Vector of probabilities for segments... 1 value for each segment.
% Best to think of it like this...
% When the image was generated, color was determined for each pixel by selecting
% a value from one of "n" normal distributions. Each value in this vector
% corresponds to the probability that a given normal distribution was chosen.
pi = repmat(1/nSegments, nSegments, 1);
%add noise the initialization (but keep it unit)
for j = 1:length(pi)
if(mod(j,2)==1)
increment = normrnd(0,.0001);
pi(j) = pi(j) + increment;
else
pi(j) = pi(j) - increment;
end
end
% Matrix of mean color (RGB vec)for the "n" distributions that will end up yield "n" image segments
mu = repmat(1/nSegments,nSegments,nColors); %for even start
%add noise to the initialization (but keep it unit)
for j = 1:nSegments
if(mod(j,2)==1)
increment = normrnd(0,.0001);
end
for k = 1:nColors
if(mod(j,2)==1)
mu(j,k) = mean(pixels(:,k)) + increment;
else
mu(j,k) = mean(pixels(:,k)) - increment;
end
if(mu(j,k) < 0)
mu(j,k) = 0;
end
end
end
mu_last_iter = mu;
pi_last_iter = pi;
for iteration = 1:maxIterations
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ----------------- E-step --------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
display('E-step');
% Weights that describe the likelihood that pixel "i" belongs to a color cluster "j"
w = repmat(1,nPixels,nSegments); % temporarily reinitialize all weights to 1, before they are recomputed
% logarithmic form of the E step.
% ...
% ...
for i = 1:nPixels
% Calculate Ajs
logAjVec = repmat(0,1,nSegments);
for j = 1:nSegments
logAjVec(j) = log(pi(j)) - .5*((pixels(i,:)-mu(j,:))*((pixels(i,:)-mu(j,:)))');
end
% Note the max
[logAmax,ind] = max(logAjVec(:));
% Calculate the third term from the final eqn in the above link
thirdTerm = 0;
for l = 1:nSegments
thirdTerm = thirdTerm + exp(logAjVec(l)-logAmax);
end
% Calculate w(i,j)
for j = 1:nSegments
logY = logAjVec(j) - logAmax - log(thirdTerm);
w(i,j) = exp(logY);
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ----------------- M-step --------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
display('M-step')
%temporarily reinitialize mu and pi to 0, before they are recomputed
mu = repmat(0,nSegments,nColors); % mean color for each segment
pi = repmat(0,nSegments,1);
%new version
for j = 1:nSegments
denominatorSum = 0;
for i = 1:nPixels
mu(j,:) = mu(j,:) + pixels(i,:).*w(i,j);
denominatorSum = denominatorSum + w(i,j);
end
% Update mu
mu(j,:) = mu(j,:) ./ denominatorSum;
% Update pi
pi(j) = sum(w(:,j)) / nPixels;
end
display(pi')
muDiffSq = sum(sum((mu - mu_last_iter).^2));
piDiffSq = sum(sum((pi - pi_last_iter).^2));
if (muDiffSq < .0000001 && piDiffSq < .0000001)
disp('Convergence Criteria Met at Iteration:')
disp(iteration)
break;
end
mu_last_iter = mu;
pi_last_iter = pi;
% Draw the segmented image using the mean of the color cluster as the
% RGB value for all pixels in that cluster.
segpixels = pixels;
cluster = 0;
for i = 1:nPixels
cluster = find(w(i,:) == max(w(i,:)));
segpixels(i,:) = (mu(cluster,:));
end
segpixels = reshape(segpixels,size(x,1),size(x,2),nColors);
segpixels = segpixels ./255; %normalize each entry to make img of type double
imwrite(segpixels, [outputPath, num2str(iteration), '.png']);
end
% clean up workspace
clear mu_last_iter pi_last_iter i j k l denominatorSum logAjVec;
clear logAjmax logAmax logY muDiffSq piDiffSq;
clear thirdTerm;
end
end
```

For the random initializations used for the exploration of variation, the following snippets were used.

```
% Initialize pi vector and mu matrix to random values (unitized)
% Vector of probabilities for segments... 1 value for each segment.
% Best to think of it like this...
% When the image was generated, color was determined for each pixel by selecting
% a value from one of "n" normal distributions. Each value in this vector
% corresponds to the probability that a given normal distribution was chosen.
pi = rand(nSegments, 1); %repmat(1/nSegments, nSegments, 1);
pi = pi./sum(pi);
% Matrix of mean color (RGB vec)for the "n" distributions that will end up yield "n" image segments
mu = rand(nSegments,nColors);%repmat(1/nSegments,nSegments,nColors); %for even start
for j = 1:nSegments
mu(j,:) = mu(j,:)./sum(mu(j,:));
end
```

## Comments