How to make Google Search irrelevant

This is a concept I thought about a year ago, and it never went past the inception phase. I still think it’s a good one, and for that reason I want to share it so that it won’t get lost in time:)

Vision: “Replace Google search”

The final vision for this project is to replace Google search. At the moment there’s no way to create a better ranking algorithm than the one used by Google, so the only way to beat Google search is to make it irrelevant, like Netflix did to Blockbuster or like digital music did to CDs.

The aim of this project is to collect all human knowledge in a single, shared database, with all users contributing to it while, at the same time, they augment their own personal knowledge.

Why this will work

As a person I am always frustrated by the time I spend finding something on the internet or, even worse, by not being able to find it at all when I am offline. Searching the internet might be fast, but even when the experience is okay I tend to forget things, and I repeat searches I did in the past, sometimes very frequently. What I would like is to keep this knowledge somewhere, in a place where it’s always accessible and extremely easy to find. At the moment there’s no such tool around: Evernote and Pocket, two tools often cited regarding this idea, have a much less bold metaphor, do not focus on social interaction, and have no ranking of their contents, as they are focused on a single person.

Strategy (overview)

The implementation strategy will happen in three phases:

Phase one
Launch of an application suite to collect a person’s knowledge

    • always available, on every device, must work offline
    • a simple and clear metaphor to manage knowledge
    • a set of super simple mechanisms to import knowledge from existing sources (e.g. Quora, Wikipedia, IMDB, Stackoverflow, emails…)

Phase two
Introduce social capabilities

    • a social mechanism to share or include knowledge of other people
    • a ranking algorithm to qualify better content from better users

Phase three
Launch of a worldwide site to explore the knowledge of mankind, effectively replacing all existing focused and unfocused sources

    • all existing human knowledge will be available, already catalogued and sorted by human beings, voluntarily
    • the ranking algorithm will allow the relevant and better contents to emerge spontaneously and naturally

Strategy (detailed)

A more detailed explanation of the three-phase strategy follows.

Phase one: application suite

The application must always be available, regardless of whether I am using a mobile phone, a desktop computer, or even a Kindle. I should be able to install an application or plugin that lets me push data into my knowledge base without hassle: a set of specific browser plugins is highly recommended. It must also work offline: my entire database (or at least the most relevant part of it) shall always be accessible, and I will be able to push new data into the database at any time, as it will be synced automatically as soon as I am online again.

A simple and clear metaphor to manage knowledge has to be found, and it has to support both adding information to the database and retrieving it. At the moment the most promising model is based on a graph of information, with tags associated to it, and maybe different clusters, but it will be very important to find an extremely simple, effective and attractive mechanism for end users to store and retrieve their knowledge (if we had telepathy we should use that). The metaphor must support some form of classification of contents, such as pre-defined tagging, clerical, prototyping.

Because we are consolidating knowledge, and specifically knowledge found on the internet, we need to provide a set of super simple tools to import it from existing sources like Quora, Wikipedia, IMDB, Stackoverflow, and even personal emails. Ideally we should think about two different mechanics to collect a piece of information: you can copy it, so that it’s merged into your database and you can change it as much as you want, or link it, so that you can still see the whole information (as it’s constantly synced from the remote side) but only in read-only mode. An advanced merge mode can be considered for copied contents, as long as it’s extremely simple.

Phase two: social capabilities

A strong social element must be added to the platform from the very start. The basic mechanism would allow me to declare some content “public” or “friendly” (to a set of friends or a circle), so that other people can pull my content into their database (also here, copy+merge or link). An integration with either Facebook or Google+ is mandatory; more integrations are highly advisable.

An important step that enables the transition to phase three of the project is a very good ranking algorithm, so that we can qualify the better content and the better users, ideally the “experts in the fields”: for that reason the metaphor, as explained before, must enable the classification of the contents. Such a ranking algorithm should be related to relevance and, in general, to the reputation of the users, the same way Quora or Stackoverflow, for example, rank their users and automatically decide which contents should in principle be more relevant to a question: how many users linked such content? Or copied it? Or liked it? Explicit rating should also be allowed, but in general the more automation the better. Some sort of gaming ranking is of course necessary.
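
To make the idea a bit more tangible, here is a minimal sketch of how the signals listed above (copies, links, likes, and the reputation of the author) could be folded into a single score. Everything in it is an assumption made for illustration: the class name, the weights, and the use of a logarithm to dampen raw counts are hypothetical, not a design that was ever built.

```java
final class RankingSketch {
    // Hypothetical weights: a copy is assumed to signal more commitment
    // than a link, and a link more than a like.
    static final double COPY_WEIGHT = 3.0;
    static final double LINK_WEIGHT = 2.0;
    static final double LIKE_WEIGHT = 1.0;

    // authorReputation is assumed to be a non-negative number computed elsewhere.
    static double score(int copies, int links, int likes, double authorReputation) {
        double engagement = COPY_WEIGHT * copies + LINK_WEIGHT * links + LIKE_WEIGHT * likes;
        // log1p dampens raw counts, so sheer volume grows the score slowly
        return Math.log1p(engagement) * (1.0 + authorReputation);
    }
}
```

With these assumed weights, content that nobody touched scores zero whatever the reputation of its author, and the same engagement is worth more when it comes from a more reputable user.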

Phase three: the worldwide site

The final aim of this project is to create a “socialpedia” or a “knowledgepedia”, a form of global knowledge encyclopedia managed by all the users. The main difference from the most obvious antagonist, Wikipedia, is the way contents will emerge: we will not be collecting and classifying information, the users will do it, as they want to organize their own knowledge. In the process of doing that, they will organize a global knowledge that can then be used to render any other search useless or redundant. The ranking algorithm will guarantee that the best content surfaces in such a database, and it will be invaluable because actual human beings, not machines or algorithms, will classify it.

At that time, you will hold the world’s knowledge. Like Google now, but better.


My teams are awesome!

I just want to drop a few lines to say that my teams are awesome! I am so proud to be here at Workshare and manage these people! No, we are not super-brilliant-smart or whatever, we are definitely not Googlers and we have a long road ahead to improve, but we will get there, eventually:) So far we are doing just right!

And yeah, you have to realize that if your team sucks then, well, you probably suck as well. Of course I failed to realize this myself in my early days as a manager.

Dynamic Programming explained (hopefully)

Okay, so some of you have certainly already heard about Dynamic Programming (DP), but this is what I have understood so far and I am happy to share it with you.


In short, DP is all about ordering your computations in a way that avoids recalculating duplicate work: you have a main problem and a lot of subproblems.

There are two key attributes that a problem must have in order for DP to be applicable: optimal substructure and overlapping subproblems:

  • when you have optimal substructure, the optimal solution of a given problem can be obtained by combining the optimal solutions of its subproblems
  • when you have overlapping subproblems, solving the problem requires solving the same subproblems again and again

Hey, please note that if a problem can be solved by combining optimal solutions of non-overlapping subproblems then we are in the “divide and conquer” area, where for example merge sort and quick sort lie.
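
For contrast, here is a minimal merge sort sketch (the class and method names are mine, chosen for illustration). The two halves of the array never share an element, so no subproblem is ever solved twice and a cache would buy us nothing: that is what separates divide and conquer from DP.

```java
import java.util.Arrays;

final class MergeSortSketch {
    // Sorts a copy of the input; the input array itself is left untouched.
    static int[] sort(int[] values) {
        if (values.length <= 1)
            return values.clone();
        int mid = values.length / 2;
        // The two halves share no elements, so their subproblems never overlap.
        int[] left = sort(Arrays.copyOfRange(values, 0, mid));
        int[] right = sort(Arrays.copyOfRange(values, mid, values.length));
        return merge(left, right);
    }

    // Combines two sorted arrays into one sorted array.
    private static int[] merge(int[] left, int[] right) {
        int[] merged = new int[left.length + right.length];
        int l = 0, r = 0, m = 0;
        while (l < left.length && r < right.length)
            merged[m++] = (left[l] <= right[r]) ? left[l++] : right[r++];
        while (l < left.length)
            merged[m++] = left[l++];
        while (r < right.length)
            merged[m++] = right[r++];
        return merged;
    }
}
```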

Dynamic Programming is typically implemented using two common techniques, tabulation and memoization:

  • when you solve a DP problem using tabulation you solve the problem using a bottom-up approach, by solving all subproblems first and creating an n-dimensional table: based on such a table the solution to the original problem is computed. Because of that, tabulation solves all the subproblems.
  • when you solve a DP problem using memoization you do it by maintaining a map of already solved subproblems: you solve the problem top-down, basically solving the top problem first and then recursing into the subproblems. Memoization may pay an overhead due to the recursion, but it does not need to solve all the subproblems.
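
To show the two techniques side by side on a two-dimensional table, here is a small sketch that counts the monotone paths through a grid (moving only right or down). The problem and all the names are my own pick for illustration; the recurrence is paths(r, c) = paths(r-1, c) + paths(r, c-1), with a single path along each edge.

```java
final class GridPaths {
    // Tabulation: fill the whole (rows+1) x (cols+1) table bottom-up.
    static long tabulate(int rows, int cols) {
        long[][] table = new long[rows + 1][cols + 1];
        for (int r = 0; r <= rows; r++) {
            for (int c = 0; c <= cols; c++) {
                if (r == 0 || c == 0)
                    table[r][c] = 1; // only one way to walk along an edge
                else
                    table[r][c] = table[r - 1][c] + table[r][c - 1];
            }
        }
        return table[rows][cols];
    }

    // Memoization: recurse top-down, caching each subproblem when first solved.
    static long memoize(int rows, int cols) {
        return memoize(rows, cols, new long[rows + 1][cols + 1]);
    }

    private static long memoize(int r, int c, long[][] cache) {
        if (r == 0 || c == 0)
            return 1;
        if (cache[r][c] == 0) // 0 marks "not solved yet": real answers are >= 1
            cache[r][c] = memoize(r - 1, c, cache) + memoize(r, c - 1, cache);
        return cache[r][c];
    }
}
```

Both methods return the same counts, for example 6 for a 2x2 grid. In this particular problem every cell is needed, so memoization ends up solving all the subproblems too.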

Please note that in DP you will often use backtracking, which incrementally builds candidates for the solution and then abandons them as soon as it determines that they cannot contribute to the solution.

Some code, please!

Ok, all good. Now where do we go from here? Some code will help:) A typical DP problem is the Fibonacci sequence:

fib(n) = fib(n-1) + fib(n-2)

I guess you can already see the overlapping subproblems and the optimal substructure: let’s try to solve this with the most natural solution (I guess), which is a recursion.

    private static int fib(int val) {
        // base cases: this post uses fib(0) = fib(1) = 1
        if (val == 0 || val == 1)
            return 1;
        return fib(val - 1) + fib(val - 2);
    }

Ok, cool. It works: result! Pretty inefficient though, as it uses a large amount of stack memory and computes the solution to the same problem again and again! In fact, for example, to compute fib(5) it will compute fib(2) three times. How can we improve this? Well, memoization comes in handy:

    // requires: import java.util.HashMap; import java.util.Map;
    private static Map<Integer, Integer> cache = new HashMap<Integer, Integer>();

    private static int fib(int val) {
        if (val == 0 || val == 1)
            return 1;
        else {
            // solve each subproblem once, then reuse the cached result
            Integer res = cache.get(val);
            if (res == null) {
                res = fib(val - 1) + fib(val - 2);
                cache.put(val, res);
            }
            return res;
        }
    }

Ok, this is better. At least we do not recompute the same solution many times, but we still use a lot of stack memory to handle the recursion. And, at the end of the day, we need to compute all the subproblems to solve this problem, don’t we? Why don’t we use tabulation then? If we do so, we can revert to a nice iterative solution!

    private static int fib(int val) {
        if (val == 0 || val == 1)
            return 1;
        // one table slot per subproblem, filled bottom-up
        int[] fibs = new int[val + 1];
        fibs[0] = 1;
        fibs[1] = 1;
        for (int i = 2; i <= val; i++)
            fibs[i] = fibs[i - 1] + fibs[i - 2];
        return fibs[val];
    }

Ah, that’s better! No more recursion, a plain iterative process going on, just a bit of memory used for our table. But wait… can we do any better? Do we really need the whole table? Can we do better than Dynamic Programming?

    private static int fib(int val) {
        // prev and curr start as fib(0) and fib(1), both 1 as in the recursive version
        int prev = 1;
        int curr = 1;
        for (int i = 2; i <= val; i++) {
            int next = curr + prev;
            prev = curr;
            curr = next;
        }
        return curr;
    }

Oh yeah:) We just need to keep the last two values, fib(n-1) and fib(n-2): job done!
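
One caveat that applies to every int version in this post: the values grow fast and, with the fib(0) = fib(1) = 1 convention, fib(46) is already 2971215073, which does not fit in a Java int and silently overflows. As a hedged sketch (the class name is mine), the same two-variable loop can be written with BigInteger if you need larger inputs:

```java
import java.math.BigInteger;

final class BigFib {
    // Same convention as the recursive version in this post: fib(0) = fib(1) = 1.
    static BigInteger fib(int val) {
        BigInteger prev = BigInteger.ONE;
        BigInteger curr = BigInteger.ONE;
        for (int i = 2; i <= val; i++) {
            BigInteger next = curr.add(prev);
            prev = curr;
            curr = next;
        }
        return curr;
    }
}
```

It is still O(n) additions and constant extra state, only the numbers themselves get longer as n grows.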

Conclusions (?)

DP was useful to think out the best algorithm and instrumental in discovering it but then, well, we needed that plain old spark of genius that not all of us have (certainly not me!), and some help was very welcome. But without DP (and without a bigger spark) we would never have easily found an elegant and efficient O(n) solution: so it helps knowing about it. And sometimes some problems are really not solvable without DP, so please do not underestimate it!

Let me know if you are interested in this stuff, I can post more:)



No comment

“…we recognize that proactive notification of planned maintenance is a much-requested feature, particularly to help prepare in a situation where you have a workload that is running on a single VM and is not configured for high availability. While this type of proactive notification of planned maintenance is not currently provided, we encourage you to provide comments on this topic so we can take the feedback to the product teams.”

That’s still how it rolls in 2015+

Each comment must start with the words “sorry, …”

I mean, if you really need to write one. When you put in a comment, you implicitly admit your inability to communicate your ideas through your code. You are basically saying “sorry, I do not know enough of this language to express myself decently, so I will put some content in another language to make things clear”.

How dare you? I will have to use your code at some point in time! I will need to go and change it, and I will need to wade through your rotten series of comments, I will have to git blame here and there, aggressively use grep, and then spend precious hours of my life that I will never get back in order to understand what you so poorly communicated in your code. How disrespectful of you. At least apologise.

So, new rule for my teams as of today.

Java: no timeout on DNS resolution

Breaking news! I just discovered (well, actually yesterday) that it’s not possible to set a timeout on DNS resolution in Java, which relies on the underlying OS. This is NOT good when you have any shape or form of SLA or QoS: you are basically, potentially, throwing it out of the window!

I suggest you do something about it. This is the code I pushed on msnos (see here code and test): basically it uses a thread pool and a Future to make the magic happen:

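import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.http.conn.DnsResolver;

public class DnsResolverWithTimeout implements DnsResolver {

    // The fields (executor, systemResolver, timeoutInMillis, log) are not shown in this excerpt

    public InetAddress[] resolve(final String host) throws UnknownHostException {
        // Run the blocking OS lookup on a pool thread, so that we can stop waiting for it
        Future<InetAddress[]> result = executor.submit(new Callable<InetAddress[]>() {
            public InetAddress[] call() throws Exception {
                return systemResolver.resolve(host);
            }
        });

        try {
            return result.get(timeoutInMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            log.warn("Unexpected interruption while resolving host " + host, e);
        } catch (ExecutionException e) {
            log.warn("Unexpected execution exception", e.getCause());
        } catch (TimeoutException e) {
            log.warn("Timeout of {} millis elapsed resolving host {}", timeoutInMillis, host);
        }

        throw new UnknownHostException(host + ": DNS timeout");
    }
}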

Of course you can make sure your OS is behaving, but you may not have such luxury:)
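
If you do not want to depend on Apache HttpClient, the same trick can be sketched with the JDK alone. This is not the msnos code: the TimedLookup class, its Resolver interface and the daemon thread pool are assumptions of mine, made so that the example is self-contained and a slow resolver can be simulated.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedLookup {
    // Hypothetical seam, so that a slow resolver can be plugged in for testing.
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    // Daemon threads: a lookup stuck in the OS must not keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "timed-dns-lookup");
        thread.setDaemon(true);
        return thread;
    });

    // Resolves through the JDK (and therefore the OS), giving up after the timeout.
    static InetAddress[] resolve(String host, long timeoutInMillis) throws UnknownHostException {
        return resolve(InetAddress::getAllByName, host, timeoutInMillis);
    }

    static InetAddress[] resolve(final Resolver resolver, final String host, long timeoutInMillis)
            throws UnknownHostException {
        Callable<InetAddress[]> task = () -> resolver.resolve(host);
        Future<InetAddress[]> result = EXECUTOR.submit(task);
        try {
            return result.get(timeoutInMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: a native lookup may ignore the interrupt
            throw new UnknownHostException(host + ": DNS timeout");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new UnknownHostException(host + ": interrupted during DNS lookup");
        } catch (ExecutionException e) {
            if (e.getCause() instanceof UnknownHostException)
                throw (UnknownHostException) e.getCause();
            throw new UnknownHostException(host + ": " + e.getCause());
        }
    }
}
```

A call such as TimedLookup.resolve("example.org", 500) then either returns the addresses or throws an UnknownHostException after roughly half a second. Keep in mind that the timeout only stops the caller from waiting: the lookup thread itself may stay blocked in the OS until the resolver gives up, which is why the pool is unbounded and made of daemon threads in this sketch.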

