Friday, January 23, 2015

Finding closet pair

Challenge

Assuming multiple points in a 2D dimension, what is the pair of points with closest distance


Naive brute-force solution

Double loop through all points in the scope and find the closest pair will cost O(n^2). However, we can improve and reduce the operational time to O(nlogn) as a smart algorithm demonstrates below.


O(nlogn) for closest pair

note that this is not the details for this algorithm, only the essentials/magics used in it.

Essentials:
  1. Sort points by x-cooridnate and y-cooridnate respectively and store in two arrays => Px, Py. This step will have running time of O(nlogn) with mergesort, which we discussed in one  previous blog, or quick sort.
  2. utilizing "divide and conquer paradigm" and construct recursive function as below:
  3. split points by half => find the closest pair in left half => find the closest pair in right half => find the closest pair with 1 point in left half and 1 point in right half. here the last step, i.e.split_counting, is the difficulty.
  4. By first observing the split_counting step, it has running time of O(n^2), because all points are possible candidates. however, by proof we see that only the subsequent 7 points in Py array can make a closest pair, which is:
    1. for i = 0, i in Py
      1. for j = i + 1, j in [i ... i + 7]
        1. count_cloest_pair(Pi, Pj)
  5. in the nested loop above, it has running time of O(n) only! this is the beauty of this algorithm.

for this algorithm details and the mathematics proof, please google "closest pair algorithm".



why study algorithm - quotation

People who analyze algorithms have double happiness. First of all they experience the sheer beauty of elegant mathematical patterns that surround elegant computational procedures. Then they receive a practical payoff when their theories make it possible to get other jobs done more quickly and more economically.      D. E. Knuth

Friday, January 16, 2015

Mergesort used in inversion counting

Note. In order to understand the inversion counting method in this blog, you need to be familiar with mergesort algorithm first. Especially the "divide and conquer paradigm" concept applied in mergesort.

I have explained that mergesort can be done in O(n*log(n)) in previous blog here

What is inversion

given an array A with length n, the number of inversion in this array equals to number of pairs (A[i], A[j]) with i < j and A[i] > A[j].

example: array = {1,3,5,2,4,6}, the inversions are (5,2) (5,4) (3,2)

Why study inversion

for example: to study the similarity of movie taste for two people.
identify 6 movies and give them to the two to rank according to their favors.
input result from 1st person as array A and result from 2nd person as array B, then calculate the number of inversions to quantify their movie favor differences.

The design of inversion counting 
1.
it uses "divide and conquer paradigm" and algorithm as below:

count_inversion (array A)
    count_inversion (left half of A i,j < n/2)
    count_inversion (right half of A i,j > n/2)
    count_split_inversion (i < n/2 < j)

at each level of count method, the operation carried out is "count_split_inversion". The key challenge is to make sure "count_split_inversion" method can be done in O(n) so that inversion counting can be done with O(nlog(n)).

2,  (brilliant idea!)
Piggyback on mergesort!

sort_and_count_inversion (array A)
    sort_and_count_inversion (left half of A i,j < n/2)
    sort_and_count_inversion (right half of A i,j > n/2)
    merge_and_count_split_inversion (i < n/2 < j)

The reason to add mergesort, which has O(nlog(n)) is that it will reduce the inversion count complexity from O(n^2) to O(nlog(n)), and we will see as below:

when do merge_and_count_split_inversion from "sorted left half A" and "sorted right half A", items are copied from them to an empty array (same as in mergesort).  Now we can claim that: whenever we compare A[i] and A[j] if we found A[i] > A[j] (i<n/2<j) in this step , an inversion is found. this step is O(n).

Therefore, the computational complexity at each level is O(n) and the total complexity is O(nlog(n)).

C# Implementation
 using System;  
 using System.Linq;  
 namespace InversionCount  
 {  
   class Program  
   {  
     static void Main(string[] args)  
     {  
       Console.WriteLine("please give a integer for the size of array, system will generate internal values and do sorting for you");  
       int size = int.Parse(Console.ReadLine());  
       int[] count = new int[size];  
       RandomizeArray(ref count);  
       SortAndCountInversion(count);  
       Console.ReadLine();  
     }  
     public static int[] SortAndCountInversion(int[] A)  
     {  
       if (A.Length == 1)  
         return A;  
       int size = A.Length;  
       int halfSize = A.Length / 2;  
       int[] A1 = new int[halfSize];  
       int[] A2 = new int[A.Length - halfSize];  
       Split<int>(A, halfSize, out A1, out A2);  
       int[] ResultA = SortAndCountInversion(A1);  
       int[] ResultB = SortAndCountInversion(A2);  
       return MergeAndCoutInversion(ResultA, ResultB);  
     }  
     private static int[] MergeAndCoutInversion(int[] ResultA, int[] ResultB)  
     {  
       int resultSize = ResultA.Length + ResultB.Length;  
       int[] resultArray = new int[resultSize];  
       int i = 0, j = 0;  
       for (int k = 0; k < resultSize; k++)  
       {  
         if (i == ResultA.Length)  
           resultArray[k] = ResultB[j++];  
         else if (j == ResultB.Length)  
           resultArray[k] = ResultA[i++];  
         else if (ResultA[i] <= ResultB[j])  
           resultArray[k] = ResultA[i++];  
         else if (ResultA[i] > ResultB[j])  
         {  
           for (int s = i; s < ResultA.Length; s++)  
           {  
             Console.WriteLine("{0} - {1}", ResultA[s], ResultB[j]);  
           }  
           resultArray[k] = ResultB[j++];            
         }  
         else  
         {  
           Console.WriteLine("odd!");  
         }  
       }  
       return resultArray;  
     }  
     public static void Split<T>(T[] array, int index, out T[] first, out T[] second)  
     {  
       first = array.Take(index).ToArray();  
       second = array.Skip(index).ToArray();  
     }  
     private static void RandomizeArray(ref int[] count)  
     {  
       int size = count.Length;  
       Random random = new Random();  
       int randomInt;  
       for (int i = 0; i < size; i++)  
       {  
         randomInt = random.Next(0, size);  
         count[i] = randomInt;  
       }  
     }  
   }  
 }  







Thursday, January 15, 2015

MergeSort study

MergeSort algorithm learning notes:

I am studying algorithm course in coursera to sharpen my CS understanding.

Karatsuba algorithm (KA) is the first algorithm in this course opened my eyes. I never thought of any faster algorithm can do better/faster than the grade 3 learned calculating algorithm. I plan to implement KA in another thread in C# and post it here.

Back to track, this thread is about MergeSort algorithm (MA). Teacher Tim Roughgarden introduced the MA with a simple example and illustrate further step by step why MA is faster than most of other O(n2) algorithm. 

Here is MA's sorting complexity explanation:

  • Each level of MA calculation has maxi. 6j operations, given that j is the number of items in the current level array input in each recursive call). 
  • MA has sum of log2(n) + 1 levels (assume original level is level 0)
  • At any given level j, there are 2^j (2pwoer j) number of recursive calls
  • At any given level j, each recursive call  has array size n/2^j
We find out that the operations at any given level j is 2^j * (6 *n/2^j) = 6n (astonishingly independent of j ! isn't this cool!?)

Finally, 6n (log2(n) + 1) is the total operations in MA.

In summary, the reason Mergesort is fast is that its operations in each call is constant 6n and the level of recursive call is log2(n), which is flatter than f(n) = n. The drawback for mergesort is that it requires extra space to store the items when aggregating back from the left and right sub-streams.


C# implementation
 using System;  
 using System.Linq;  
 namespace Mergesort  
 {  
   class Program  
   {  
     static void Main(string[] args)  
     {  
       Console.WriteLine("please give a integer for the size of array, system will generate internal values and do sorting for you");  
       int size = int.Parse(Console.ReadLine());  
       Random random = new Random();  
       int randomInt;  
       int[] count = new int[size];  
       for (int i = 0; i < size; i++)   
       {  
         randomInt = random.Next(0, size);  
         count[i] = randomInt;  
       }  
       int[] countResult = Mergesort(count);  
       foreach (int i in countResult)  
       { Console.WriteLine(i); }  
       Console.ReadLine();  
     }  
     public static int[] Mergesort(int[] A)  
     {  
       if (A.Length == 1)  
         return A;  
       int size = A.Length;  
       int halfSize = A.Length / 2;  
       int[] A1 = new int[halfSize];  
       int[] A2 = new int[A.Length - halfSize];  
       Split<int>(A, halfSize, out A1, out A2);  
       int[] ResultA = Mergesort(A1);  
       int[] ResultB = Mergesort(A2);  
       return MergeTwoArray(ResultA, ResultB);  
     }  
     private static int[] MergeTwoArray(int[] ResultA, int[] ResultB)  
     {  
       int resultSize = ResultA.Length + ResultB.Length;  
       int[] resultArray = new int[resultSize];  
       int i = 0, j = 0;  
       for (int k = 0; k < resultSize; k++)  
       {  
         if (i == ResultA.Length)  
           resultArray[k] = ResultB[j++];  
         else if (j == ResultB.Length)  
           resultArray[k] = ResultA[i++];  
         else if (ResultA[i] <= ResultB[j])  
           resultArray[k] = ResultA[i++];  
         else  
           resultArray[k] = ResultB[j++];  
       }  
       return resultArray;  
     }  
     public static void Split<T>(T[] array, int index, out T[] first, out T[] second)  
     {  
       first = array.Take(index).ToArray();  
       second = array.Skip(index).ToArray();  
     }  
   }  
 }