27/03/2017

[Java] Check whether a string is palindrome

Here is a simple method to test if a string is palindrome. In my view punctuation should be ignored as well as case.

The idea is simply to walk the string from the borders towards the middle at the same time, skipping characters that are not alphanumeric until we either reach the middle or find two characters which are not same. Also, a single character is palindrome.

You can check my Gist for the code with tests:

 public static boolean isPalindrome(String s){  
     if(s == null || s.length() < 1)return false;  
   
     for(int i = 0, j = s.length() - 1; j > i; ){  
       if(!Character.isLetterOrDigit(s.charAt(i))){  
         i++;  
         continue;  
       }  
       if(!Character.isLetterOrDigit(s.charAt(j))){  
         j--;  
         continue;  
       }  
       if(Character.toLowerCase(s.charAt(i)) != Character.toLowerCase(s.charAt(j)))return false;  
       else{  
         i++;  
         j--;  
       }  
     }  
     return true;  
   }  

[Java] Trie data structure for text input prediction

A Trie is a very useful data structure with a confusing name, which is luckily easier to use than to pronounce.

I have seen it mainly used in input text suggestion or String analysis. The structure is a Tree where each node can have a label and multiple children; nodes with a label represent a full piece of data (eg a word) while nodes without label are used only for navigation or substring evaluation for example.

The list of children can be implemented with different data structures, I propose a version using a HashMap for faster searching: given an input String s, starting from the first character keep calling get(char) until we reach a word to suggest. Each get is O(1) so we are only paying O(M) (worst case is we're going to predict the longest string present).

Using a List would require us to scan the list until we find the entry for the next character which I think due to big-O magic can be assumed to still be O(1) since the list is at most as big as the alphabet we are using and therefore we can call it constant size, but still sounds like an unnecessary operation to me.

LinkedHashMap would not help much unless we associate some meaning with the insertion order eg: most often used words first.

TreeMap would slow the process down but guarantee to retrieve words in the desired order (eg alphabetical).

26/03/2017

[Java] 0/1 Knapsack problem O(n log n) heuristic

The 0/1 knapsack problem is a fairly common problem in daily life and it's unfortunately one of those problems where you must accept that the optimal solution cannot be computed as quickly as you would like, no matter how hard you think about the problem. As a friend put it, "if the solution looks like black magic, there is probably something wrong with it".

The problem description states: given a recipient with a fixed capacity and a set of items, each with a specific weight and value, the goal is to find the subset that maximises the total value of the chosen items while not exceeding the maximum recipient capacity.
In the 0/1 version, you cannot break an object (not possible to only take a fraction of it) so you can only either take it or not.

Sometimes calculating the optimal solution is too expensive or simply unfeasible (eg online knapsack) so a different, possibly suboptimal approach has to be taken. The one described here uses a greedy algorithm and aims at maximising the total value by selecting items after ordering them by desirability in terms of value per space left after they are picked.

20/03/2017

[SQL] Excel COUNTIFS - count columns matching criteria in a row

Excel in his arsenal of useful functions has COUNTIFS, basically a count of how many elements in a one dimensional range match a specific criteria. It says multidimensional, but it's not, it's either the same criteria twice for more dimensions or a different criteria. Key point: one list at a time.

However this is a very basic need which is not immediately achievable in SQL as well since we cannot loop over columns in a row. That is, unless we remember that PIVOTing is actually a thing. In this specific case we use the inverse operation, UNPIVOT.

[Java] Mergesort sorting algorithm for int arrays

Mergesort is well known as one of the best sorting algorithms out there with a O(n log n) worst case runtime; it doesn't get much better than this without strange big-O tricks.

But while understanding how it works is easy, implementing it is a different story given that it combines two nested function calls in the recursion and requires careful array index management. Also Java pass by value adds a small gotcha for good measure.

You can find my implementation for int[] arrays (as in: not Integer, not ArrayList, not List, not whatever else. Pure and simple int primitive data type with old fashioned array, the kind that does not change size once created if you can still remember those) on my Gist along with some test cases.

12/03/2017

[Python] BloggerBackup script to backup Blogger posts

This script allows users to quickly download posts from their Blogger blog in order to keep a backup copy. On Windows there is the amazing BloggerBackupUtility but sadly that's not available for Linux as well, hence this small project comes to life.

You can get the utility here or check out the source code here. To run it, you must have Python 3 installed.

Usage is:

 bloggerbackup --api-key <API key> --blog <blog URL> --backup-dir <backup directory> [--start-date <date>] [--end-date <date>] [--verbose]  

options:

   --help - prints this help  
   
   --api-key <api key> - mandatory API key used to issue queries  
   
   --blog <blog URL> - mandatory URL of the blog to operate on  
   
   --backup-dir <backup directory> - mandatory directory where to put the posts backup  
   
   --start-date <date> - optional, date from which to begin fetching posts in RFC 3339 format: yyyy-MM-ddTHH:mm:ss+HH:mm  
   
   --end-date <date> - optional, date where to stop fetching posts in RFC 3339 format: yyyy-MM-ddTHH:mm:ss+HH:mm  
   
   --verbose - optional, prints debug information while processing  

The script is extremely barebone and definitely improvable. Also, it requires you to setup an own API key in order to issue queries in the free tier. To do so, visit the Google credentials page.

06/03/2017

[Java] BST element distance - parent pointer

Well, that's a bit anticlimactic. After playing with the other approaches, the one that initially gave me the most pain was also quite fast to implement as soon as the AHA! moment came.

However spoiler alert: the results are quite disappointing.