Hakell and Image processing

I love to do image processing and Haskell and wanted to put these two things together in an “effective” way.

Why do I want to program in Haskell? The answer is simple: I love the declarative style of functional programming. I can directly write my formulas into code. I waste less time doing rote jobs and think at a higher level, so it is good for my grey cells as well.

Also, I hate the fact that I have to manage the memory by myself in C / C++. My biggest problem with C is that, I am never at peace with my code. There might be a memory leak somewhere, a global variable might not have been updated and all that crap. My mind twists and turns in trying to reason about code. I just wanted a language where I would write code and just forget about it and it works just fine 🙂

The problem with doing image processing in C is that, using the default Haskell data types (lists, functions) makes the program too slow. One has to expect such speed hits as this is a very high level programming language. However, there are a few array libraries that speed up the code. The sped up code might not compete with C but it is reasonably fast.

I have used the hmatrix library for representing an image. What I like about this library is that there are a whole lot of lapack algos that can be implemented on them. This library allows one to write purely functional code. Just write the image as a 2D function( func (i,j) ) and use buildMatrix, voila! you have a new image.

Hmatrix was just perfect for the array library. I still had lot of problems with interfacing with my web camera. I understand that there is a HOpencv library which can do that but after glancing through the doumentation I felt that the API requires me to stay inside the IO monad. Which means the same stateful programming that I much abhor. Added to that it did not compile on my system and I had to find another way to grab images.

There seem to be nice examples on internet for grabbing images from web cameras (using v4l2 in linux) and I modified a C source and wrote FFI bindings to it. I am truly elated to find that the FFI bindings to v4l2 based image grabbing and the hmatrix work seamlessly :).

If any of you require the bindings reply and I will try to post them.

Convolution

Convolution is mathematically well understood, both in the continuous and discrete domains. There is a simple formula to comupte it. However, I recently needed to implement this in code and found that it was not all too easy. In this post, I tried to come up with a formula that can be used for programming convolution. In later posts about this, I will also try to analyze the computational complexity and spruce it up with some real working convolution code.

Firstly, what is discrete convolution? There is a straight-forward formula :

\displaystyle  h[n] \equiv (f*g)[n] = \sum_{k=-\infty}^{k =\infty} f[k]g[n-k]

For generality we will assume that {f} and {g} are discrete functions (i.e. functions from the Integers to the Reals). For example, {f} might be such that it is defined from {fl} to {fh} and it is zero outside. This is the most general representation for the signal. Similarly, let {g} be defined on {[gl,gh]} and zero outside. When we are asked to convolve these two signals, it is apparent that {fl,gl,fh,gh} would play some role in the computation. However, MATLAB for example, does not care about these values. So, it is slightly confusing how convolution is carried out. Let us examine.

In the definition of convolution, what are the values of {n} that are interesting? Before thinking about that, it is better to understand what values of {k} are important for our considerations. Fix any {n} and note that

  • { fl \leq k \leq fh}, because outside of this range, {f[k]} is zero and hence does not add to the convolution
  • Similarly, {gl \leq n - k \leq gh} which implies that {n - gh \leq k \leq n - gl}

Clearly, if {[fl,fh] \cap [n-gh, n-gl] = \phi}, the convolution is zero and we do not have to care about such an {n}. Otherwise, one can immediately see that the interesting values of {k} are :

\displaystyle [fl,fh] \cap [n-gh, n-gl] = \max (fl, n- gh) \leq k \leq \min (fh, n - gl)

Let us rewrite the formula again (for those {n} such that {[fl,fh] \cap [n-gh, n-gl] \neq \phi}):

\displaystyle  h[n] \equiv (f*g)[n] = \sum_{k= \max (fl, n- gh)}^{\min (fh, n - gl)} f[k]g[n-k]

Now, what are the values for {n} for which {[fl,fh] \cap [n-gh, n-gl] \neq \phi}? Consider, {  n = fl + gl}, then

\displaystyle  \begin{array}{rcl}  [fl,fh] \cap [fl+gl-gh, gl+gl-gl]  &=& \max (fl, fl+gl- gh) \leq k \leq \min (fh, fl+gl - gl) \\ 		 		 &=& fl \leq k \leq fl \\ 				 &=& fl \end{array}

When {n < fl + gl}, {\min (fh, n - gl)  < fl} and hence the convolution is zero. The upper limit on {n} can similarly be found to be {n = fh + gh}. Therefore, the full definition of convolution useful for computer implementation is:

\displaystyle  h[n] \equiv (f*g)[n] = \sum_{k= \max (fl, n- gh)}^{\min  (fh, n - gl)} f[k]g[n-k] ~~\forall n \in [fl+gl,fh+gh]

Upon further thought, it can be easily seen that {fl,fh,gl,gh} are not important for the core convolution. If the signal {(f, [fl,fh])} is replaced by {(a, [0, fh-fl])} and {(g,  [gl,gh])} is replaced by {(b, [0,  gh-gl])} there would no difference in the computation of the main convolution function. Only the range of {n} for which the convolution is defined change. This is the primary reason why MATLAB does not care about these things.

Siddharta

I have been reading (actually listening to) the book Siddharta by Hermen Hesse. I did like this book as I know that when I listen to something for a couple of hours continuously, it has captured my attention.

Briefly, this book talks about a young boy who vows to understand the meaning of life (aka attaining enlightenment) and quits social living to become an ascetic. However, after a few years have passed by and he is in his thirties or so, he realizes that he has not attained the goal he had set out to achieve.

During this time he comes to the conclusion that “teachings ” even from the great masters do not appeal to him. He meets the Buddha and tells the “Venerable One” that his teachings have a “gap” or a logical error. He further tells him that even though the Buddha can teach all he wants within the limitations of language, he can never explain how it would have felt when he expreienced Nirvana. Having said such things to the Buddha, Siddharta quits being an ascetic.

He comes back to the “social” world and initially has reservations about having sex (with women!). The desire continues to surge in him and he learns the art of love making from a Courtesan (an euphemism for a high-end prostitute?). This woman (kamala in the book) teaches him all that she can and finally proclaims that he is the best in the business!

Oh! by the way, he is also the best in the town in another form of business, I mean the real one now, selling and buying stuff such as mechandise (!!).

After he has spent a good ten years or so in lustful living he finally obtains the badge of honour – usually given to men who have all the three vices, mukka, chukka and ? (Please replace the question mark for me as I dont know anything appropriately rhyming in gult).

As with some men, after obtaining the badge of honour, he is disgusted with his “lowly” life. How did an ascetic end up like this? Ofcourse, this is not all happy times for Siddharta as he contemplates suicide. Right at that moment he has a “flash” and becomes unconsious.

When he gets up, he is unknowingly more happy. He understands that he always had a false sense of (intellectual) superiority over everyone else and this has led him to take the path of the vices (and the obtain the badge of honour). Ultimately, he is happy at the end of Chapter 8. Still three more chapters to go.

What I could gather from the book till now is that there is no substitute for experiences in life. The goal of eliminating the ego might be reached by deep contemplation but it could more easily be attained by “living” life (by this I mean to live in an emotionally connected way with others) and slowly pushing ego out of the system.

The only great danger might be that, one forgets that the purpose of life is the elimination of ego. Once this is forgotten, the real reason for living has been lost.

PS: BTW, I forgot to mention that Siddharta gets Kamala pregnant, so there might still be some story left 🙂

I like robo!

Robo is the beginning of a new genre in Indian cinema. I don’t really know what to call this genre though. Whatever it is I hope more movies like this are made.

The story itself is “inspired” from many Hollywood movies. I think the core story is based more or less on AI (Speilberg). I don’t really want to analyze similarities with any other movies so I would just blurt out what I really liked about this movie.

Is 214786 a Fibonacci number? Well, one of the researchers ask this question to the Robo and ofcourse he gives the right answer. “What is the largest prime you know” asks another and the Robo types a long list of numbers and ultimately says it is M44, where I thought M stands for the Mersenne Primes (I might be wrong though). Then, there is a paradox regarding infinite sequences of numbers (I could not understand what the exact paradox was) but the Robo answers giving reference to convergent sequences.

That was enough for me to get bowled over. I could not last remember a movie (Hollywood, Bollywood, etc), where such things were mentioned. The director has evidently done a lot of research (may be with help of some math profs).

This is not all the movie has to offer. The core story line explores the  differences between AI and humans. What is it that really differentiates a highly intelligent robot and a Human at the deepest level? I think it is consciousness itself.

It is technically possible to create a robot which cannot be differentiated from a Human. The behavior of the Robot can be reverse engineered from that of the Human. Ofcourse, the only difference would be that the Human really feels the “I” ness and the Robot does not. There is a slight message in the movie as well.

I strongly felt that this is a neat movie, without unnecessary “comedy” side tracks and all that shit. Yet, the movie has its lighter moments. This is the first time I have ever felt to write about a movie and I think this is a great movie. Hope some other indian directors follow suit and explore more scientific subjects instead of making movies with meaningless romances.

P.S. Ofcourse, 214786 is not the number that is mentioned in the movie. Did you really think I would remember the exact number?

Numbers II

I have a recurring need to understand numbers at a deeper level. The current blog is in addition to the one I have written about numbers before with which I have not been entirely satisfied

So, again, what is a number? I want to approach this question from the following hypothetical scenario. Let us say there is a world inhabited by only two intelligent human beings and call them {x} and {y}. Let us also assume {x} and {y} have, until now, developed their language {L} but this language is devoid of any couting terminology.

Now, assume that there is one apple in their world. As {x} and {y} are intelligent beings they come up with a symbol {q} (say) to denote the one apple. Whenever they want to talk about the apple they would write the symbol {q} and immediatley the image of an apple flashes in their brain.

Now, say there are many types of fruits in their world and also that each of these fruits is only one in number. As these beings are intelligent, they come up with a simple method for denoting these :

  • orange {\rightarrow r}
  • pumpkin {\rightarrow s} etc.

So, every time {x} writes {r}, {y} remembers the image of one orange, etc. Similarly, for two apples, two oranges, two pumpkins, let us say they come with symbols, {t,u,v} respectively.

Until now, the inhabitants have been perfectly logical about the understanding of their world. Whatever these beings could do can be replicated by any automatic machine (a computer) as these beings are only following an algorithm to represent any new object with symbols already in their language, {L}.

A significant leap of thinking / imagination is required on the part of {x} and {y} to start denoting one apple, one orange, one pumpkin as {1_a}, {1_o}, {1_p} and two apples, two oranges, two pumpkins as {2_a}, {2_o}, {2_p}, etc. The advantage of this representation is that the total number of symbols used has decreased from {6} (q,r,s,t,u,v) to {5} (1,2,a,o,p). The advantage is more evident when one considers three, four or more of those fruits.

How can {x} and {y} come up with this new representation? The main requirement is that both {x} and {y} have to be capable of abstract thinking. That is, being able to think of “a detached fruit” as representing the number one whatever the fruit is. It is extremely important to note here that the number one is an abstract idea which cannot be expressed without the help of the language, {L}.

Therefore, I think that numbers came into existence only because of the Human being’s ability to think abstractly. A question that pops up now is : Can an automatic machine discover the concept of a number OR is a computer capable of abstract thought?

Logic

I have been reading Topics in Algebra and got really interested when the author says counting is sometimes the most difficult part in mathematics. How does one account for all the possible scenarios in complex situations?

I find one mathematical concept which helps a lot in this regard is that of an equivalence relation. The most striking (yet basic) result of these equivalence relations is that of the disjointedness of the corresponding equivalence classes.

As I read more of the book, I am becoming convinced that I dont have to read it anymore! I dont see why constructing a theory based on a predefined structure of the objects is any good (group theory is built on the notion of a group). Such a theory may have its uses for applications (e.g: Fourier theory) but my tastes are currently towards understanding the limitations of logic itself.

I am done wading through all those endless theorems. I would love to understand that Godel’s incompleteness theorem paper now. Ofcourse, that is an adventure in itself.

Rank of the DFT matrix

I have had to compute the rank of the DFT matrix and even though I did guess that it is full rank, I still had to prove it just to make sure. Note that, as we are dealing with a complex matrix, to make sure that the DFT matrix is full rank, we should show that none of the rows (or columns) is a complex linear combination of the other rows (or columns). Here is a simple proof (if it is correct!):

The DFT matrix (say {A}) is defined as:

\displaystyle  \left( \begin{array}{cccccc}  1 & 1 & 1 & \cdots & 1 & 1\\ 1 & e^{\frac{-2\pi  i}{N}} & e^{\frac{-2\pi i (2)}{N}} & \cdots &  e^{\frac{-2\pi i (N-2)}{N}} & e^{\frac{-2\pi i (N-1)}{N}} \\ 1 &  e^{\frac{-2\pi i(2)}{N}} & e^{\frac{-2\pi i (4)}{N}} & \cdots  & e^{\frac{-2\pi i (N-2)(2)}{N}} & e^{\frac{-2\pi i  (N-1)(2)}{N}} \\ \cdots & \cdots & \cdots & \cdots &  \cdots & \cdots \\ 1 & e^{\frac{-2\pi i(N-1)}{N}} &  e^{\frac{-2\pi i (N-1)(2)}{N}} & \cdots & e^{\frac{-2\pi i  (N-2)(N-1)}{N}} & e^{\frac{-2\pi i (N-1)(N-1)}{N}} \end{array}  \right)

The utility of the DFT matrix is of course to compute the discrete fourier transform. If {f} is a {N} dimensional vector of function values then clearly its DFT is just {\hat{f} = Af}. Now, if we replace all the elements in the DFT matrix by their inverses, we get a new matrix {\hat{A}} which looks like this:

\displaystyle  \left(  \begin{array}{cccccc} 1 & 1 & 1 & \cdots & 1 & 1\\ 1  & e^{\frac{2\pi i}{N}} & e^{\frac{2\pi i (2)}{N}} & \cdots  & e^{\frac{2\pi i (N-2)}{N}} & e^{\frac{2\pi i (N-1)}{N}} \\ 1  & e^{\frac{2\pi i(2)}{N}} & e^{\frac{2\pi i (4)}{N}} &  \cdots & e^{\frac{2\pi i (N-2)(2)}{N}} & e^{\frac{2\pi i  (N-1)(2)}{N}} \\ \cdots & \cdots & \cdots & \cdots &  \cdots & \cdots \\ 1 & e^{\frac{2\pi i(N-1)}{N}} &  e^{\frac{2\pi i (N-1)(2)}{N}} & \cdots & e^{\frac{2\pi i  (N-2)(N-1)}{N}} & e^{\frac{2\pi i (N-1)(N-1)}{N}} \end{array}  \right)

It can be clearly seen that {\frac{1}{N}\hat{A}} is just the matrix associated with the inverse transformation, i.e. {f=\frac{1}{N}\hat{A}\hat{f}}. Hence,

\displaystyle  \begin{array}{rcl}  f =  \frac{1}{N}\hat{A}Af \end{array}

Now, if the rank of {A} is less than {N} then there exist a vector {f^\prime \neq  0} such that {Af^\prime = 0} (note here that {A} is symmetric so a linear combination of its columns is the same as a linear combination of its rows. Anyways this argument is not needed to write the above). This of course means that {f^\prime=0} from {f^\prime =  \frac{1}{N}\hat{A}Af^\prime} which is a mega contradiction.