Dynamic Programming explained (hopefully)

Okay, so some of you certainly already heard about Dynamic Programming (DP), but this is what I understood so far and I am happy to share with you.

Theory

In short, DP is all about ordering your computations in a way that avoids recalculating duplicate work: you have a main problem and a lot of subproblems.

There are two key attributes that a problem must have in order for DP to be applicable: optimal structure and overlapping subproblems:

  • when you have an optimal structure, then the optimal solution of a given problem can be obtained by the combination of optimal solutions of its subproblems
  • when you have overlapping subproblems then a solution of a problem should require the same subproblem again and again

Hey, please note that if a problem can be solved by combining optimal solution of non overlapping subproblems then we are in the “divide and conquer” area, where for example merge sort and quick sort lies.

Dynamic Programming is typically implemented using two common techniques, tabulation and memoization:

  • when you solve a DP problem using tabulation you solve the problem using a bottom-up approach, by solving all subproblems first, and creating a n-dimensional table: based on such table the solution to the original problem is computed. Because of that, tabulation solves all the subproblems.
  • when you solve a DP problem using memoization you do it by maintaining a map of already solved subproblem: you solve the problem¬†top-down, basically solving the top problem first and then recursing in solving the subproblems. Memoization may pay an overhead due to the recursion, but it does not need to solve all the subproblems

Please note that in DP you will often use backtracking, that incrementally builds candidates for the solution and then abandons them when it determines that they cannot contribute to the solution.

Some code, please!

Ok, all good. Now where do we go from here? Some code will helpūüôā A typical DP problem is the fibonacci sequence:

fib(n) = fib(n-1) + fib(n-2)

I guess you can already see the overlapping subproblems and the optimal structure: let’s try to solve this with the most natural solution (I guess), which is a recursion.

    private static int fib(int val) {
        if (val == 0 || val == 1)
            return 1;
        else
            return fib(val - 1) + fib(val - 2);
    }

Ok, cool. It works: result! Pretty inefficient tough,  as it uses a large amount of stack memory and computes the solution to the same problem again and again! In fact, for example, to compute fib(5) it will compute three times fib(2). How can we improve this? Well, memoization comes in handy:

    private static Map<Integer, Integer> cache = new HashMap<Integer, Integer>();

    private static int fib(int val) {
        if (val == 0 || val == 1)
            return 1;
        else {
            Integer res = cache.get(val);
            if (res == null) {
                res = fib(val - 1) + fib(val - 2);
                cache.put(val, res);
            }
            
            return res;
        }        
    }

Ok, this is better. At least we do not recompute a lot of times the same solution, but we still use a lot of stack memory, to handle the recursion. And, at the end of the day, we need to compute all the solution to solve this problem, don’t we? Why don’t we use tabulation then? if we do so, we can revert to a nice iterative solution!

    private static int fib(int val) {
        if (val == 0 || val == 1)
            return 1;
        
        int fibs[] = new int[val+1];
        fibs[0] = 1;
        fibs[1] = 1;
        
        for (int i=2; i<=val; i++)
            fibs[i] = fibs[i-1] + fibs[i-2];
        
        return fibs[val];
    }

Ah, that’s better! No more recursion, a plain iterative process going on, just a bit of memory used for our table. But wait… can we do any better? Do we really need the whole table? Can we do better than Dynamic Programming?

    private static int fib(int val) {
        int prev = 0;
        int curr = 1;
        
        for (int i=2; i<=val; i++) {
            int next = curr + prev;
            prev = curr;
            curr = next;
        }
        
        return curr;
    }

Oh yeahūüôā We just need to keep the last two values, n-1 and n-2: job done!

Conclusions (?)

DP was useful to think out the best algorithm, it was instrumental to discover it but, then, well, we needed that plain old spark of genius that not all of us have (certainly not me!) and some help has been very welcome. But without DP (and without a bigger spark) we would never easily found out an O(n) elegant and efficient solution: so it helps knowing about it. And sometimes some problems are really not solvable without DP, so please do not underestimate it!

Let me know if you are interested in this stuff, I can post moreūüôā

 

 

No comment

“…we recognize that proactive notification of planned maintenance is a much-requested feature, particularly to help prepare in a situation where you have a workload that is running on a single VM and is not configured for high availability. While this type of proactive notification of planned maintenance is not currently provided, we encourage you to provide comments on this topic so we can take the feedback to the product teams.”

http://blogs.msdn.com/b/mast/archive/2013/09/24/windows-azure-virtual-machine-restarted-or-shutdown-with-out-any-notification.aspx

That’s still how it rolls in 2015+

Each comment must start with the words “sorry, …”

I mean, if you really need to write one. When you put a comment, you implicitly admit your inability to communicate your ideas trough your code, you are basically saying “sorry, I do not know enough of this language to express myself decently, so I would put some content in another language to make things clear”.

How dare you? I will have to use your code at a point in time! I will need to go and change it, and I will need to trough your rotten series of comments, I will have to git blame here and there, use aggressively grep, and then spend precious hours of my life that I will never get back in order to understand what you so poorly communicated in your code. How disrespectful of you. At least apologise.

So, new rule for my teams since today.

Java: no timeout on DNS resolution

Breaking news! I just discovered (well. actually yesterday) that it’s not possible to set a timeout on DNS resolution in Java, which relies on the underlying OS. This is NOT good when you have any shape or form of SLA or QOS, you basically potentially throwing it out of the window!

I suggest you do something about it, this is the code I pushed on msnos (see here code and test), basically it uses a Threadpool and a Future to make the magic happen:

[...]
import org.apache.http.conn.DnsResolver;
[...]
public class DnsResolverWithTimeout implements DnsResolver {

    [...]
    @Override
    public InetAddress[] resolve(final String host) throws UnknownHostException {

        Future<InetAddress[]> result = executor.submit(new Callable<InetAddress[]>() {
            @Override
            public InetAddress[] call() throws Exception {
                return systemResolver.resolve(host);
            }
        });

        try {
            return result.get(timeoutInMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            log.warn("Unexpected interrupton while resolving host "+host, e);
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            log.warn("Unexpected execution exception", e.getCause());
        } catch (TimeoutException e) {
            log.warn("Timeout of {} millis elapsed resolving host {}", timeoutInMillis, host);
        }

        throw new UnknownHostException(host + ": DNS timeout");
    }
}

Of course you can make sure your OS is behaving, but you may not have such luxuryūüôā

Microservices Antipatterns [2]

Looking forward to the upcoming Geecon conference in Prague I am trying to identify new antipattern and I have to say that’s not that difficult, having been working with them here in Workshare for almost three years. The more we use them, the more dead road we findūüôā and it’s a natural darwinian process I guess: regardless of all what you can find now on the internet, sometimes it’s only trough your own experience that you learn. And, as Uncle Bob says, “you learn more from your mistakes than from your successesūüôā. So let’s have a look at those new antipatterns!

Monolith Frontend

You have a single page application or a detached website that speaks to your beautiful microservices, you have to deploy in full for an upgrade, you are using vertical “feature” teams, Congrats, you are living the era of the monolithic frontend!

In fact you carefully structured your teams around different small sets of microservices, each one deployable individually, empowering your teams to deploy quickly and independently but, when you reach the frontend, you go back to the main model of having a big fat application, with merges all over the place and a single deployment. This basically put off all the benefits, team wise, of a microservice architecture.

The solution is of course splitting the app in small independent pieces, deployable independently, or move the entire responsability of the frontend to a separate team. Please note that none of these are as difficult as they seems.

Early mornings

You have a scheduled rithm for deployments (i.e. every week) and you deploy your services in the early morning, tipically around 6am, in order to avoid any disturbance to production across the world. You have a page that lists all the new services to be deployed (created manually) and who is the responsible for them.

The issue that you have is a lack of a critical component called continuous deployment, that would allow you, with a push of a button, to deploy all your software to production. As you do not have fully automated procedure to deploy to production, you are probably lacking other critical parts such as redudancy o high availability of your services.l

The solution is to take your time to have things in orderūüôā Well crafted microservices allows you to deploy very fast and without particular hassle: you have to make sure first that your code supports that and then that your devops are fully onboard in providing the critical parts required, usually in form of automation scripts with any decent framework.

Microservices Antipattern

Last friday I was a speaker at the Geecon Microservices conference in Sopot. I was planning to talk about the whole thing, mentioning of course our work at Workshsare about msnos and I had a long presentation but I had only a 30 minutes slot. Also I went on stage after a shitload of good speeches about such topic, and so I decided to talk about a small niche not fully exploited at the moment, Microservices antipatterns: this is a short recap.

Ah, why I know about this stuff? Because at Workshsare we have a black monolith and when I joined in January 2013 I
immediately started pushing for Microservicesūüôā

Disneyworld

When we are in Disneyworld everything smells nice, cosy and lovely. Also, there are no healthchecks, no monitoring and no metrics, so you assume everything is well in your systems, all the time.

It may seem obvious to have monitoring on any system in production, but some people think that because the services are small and so simple nothing will go wrong. Of course this is silly, because we know that if something can go wrong. it will! Also small does not imply simple, as the complexity in any distributed architecture is order of degrees higher compared to a standard monolithic system.

For that reason you need each service to be able to produce a self healthcheck and metrics, possibly trough an HTTP endpoint, and surround it by some external monitoring. It won’t hurt also adding some smoke testing for each environment, possibly trough test created by you QA people. Implementation wise, dropwizard is a valid option for Java services, and we have our own small opensource implementation for Ruby.

Monolith database

In this scenario many of your microservices are sharing the same database: they expose nice different REST endpoint, but all the data ends up in the same bin.

The issue with this are multiple. First all your services are coupled to the same schema, so are your models (if you have any) and a change in the database may require you to propagate a change in your model. Furthermore, your services won’t be able to be released independently, as a database migration required by service A may/will require a change in services X/Y/Z : one of the big advantages of this architecture is out of the window.

You need each service using its own database, and when necessary they will talk to each other using your APIs. You should design your external API with this constraint in mind, moving on from the big SQL joins and welcoming asynchronous completion of tasks, a user interface that progressively updates itself, APIs that are returning link to other resources rather than embedding them.

Unknown caller

Your microservices are calling each other using their endpoints but there’s no correlation between such calls, so that each call is a new one, completely unrelated to the source of it.

When this happens there’s no easy way to track what caused a failure of a call executed on a service which fails somewhere in the chain. As there’s no (common) identification between such calls you usually end up in a endless marathon of log lurking in order to understand the reason for the failure.

The solution lies in peer collaboration: each server must inject a call id if missing , each server must propagate the call id if present, each server should log each call including the id. This will allow the correlation of the calls together, thus providing a clear chain of invocations between the services

Hardcoded hell

All the address (endpoints) of services are hardcoded somewhere, sometimes directly in the code.

Your microservices directly know about each other, so every time you need to add a new microservice you have to crack open your code or, if you are in a slightly better situation, your configuration or deployment scripts. If you are also experiencing the “All your machines are belong to us” antipattern you may be using some form of DNS or naming trickery at the machine level

You can easily build your own simple discovery mechanism, with a replicated registry which hosts service information, or you can introduce a discovery mechanism like eureka, zookeper, consul, msnos (yeah shameless plug!)

Synchronous world

Every call in your systems is synchronous, and you wait for things to actually happen before returning to the caller: you wait for the database to store the data, another service to perform an operation, and so on.

The issue with this setup is that your service will be very slow, every call can potentially hang forever and with just one malfunctioning service your whole system will be compromised

You need to use 201 and a location header as much as you can during creation operation (or 202 accepted if you fancy), using queues on top of your receivers and implementing asynchronous protocols from the start. It’s more complicated of course but in terms of performance it pays off big time.

Babel tower

You have a lot of microservices and they are using all sort of different lingo to talk to each other (i.e. REST, SOAP, IIOP, RMIP…)

The integration of a new microservice requires a lot of work, the number of adapters increase exponentially, you always consider the costs of integration before deciding to create a new microservice, thus ending up in a small amount of big services

You need of course to standardize your protocols, and introduce boundary objects only were it’s strictly required. A very successful approach consist in use REST as a synchronous protocol (point-to-point) and a lightweight messaging protocol for fully asynchronous communications, a message queue like RabbitMQ or a pub/sub like Redis

All your machines are belong to us

(a note about this one: I consider it an antipattern, but I can be very wrong, so please take it with caution)

A new virtual machine in the cloud is spinoff when a new service is provisioned and your architecture is relying on this for locating or monitoring the services

When this happens you basically have a lot of money to spend, and you are allowed to not care a lot about things like your next AWS bill. Now your architecture is now strictly connected to the money you have: if for any reason you have to shrink down you won’t be able to (timely) do so.

Make sure that your architecture is scalable without relying on metal/virtual scaling only, consider solution based on containers like Docker, CoreOS, Mesos, application servers, plain old barebone deployments. If your services are self-contained you can always run them on any machine that has the correct platform installed

JVM issue: concurrency is affected by changing the date of the system! [part 4]

I am frequently challenged about the seriousness of this bug and the impact that it has. It’s not the first time I try to explain that, because this bug affects LockSupport.parkNanos() it basically spreads like a virus across all the platform, but let’s see this more practically

$ grep -r -l "parkNanos" .
./java/util/concurrent/Future.java
./java/util/concurrent/FutureTask.java
./java/util/concurrent/ThreadPoolExecutor.java
./java/util/concurrent/ScheduledThreadPoolExecutor.java
./java/util/concurrent/RunnableFuture.java
./java/util/concurrent/package-info.java
./java/util/concurrent/ExecutorCompletionService.java
./java/util/concurrent/CancellationException.java
./java/util/concurrent/RunnableScheduledFuture.java
./java/util/concurrent/CompletableFuture.java
./java/util/concurrent/AbstractExecutorService.java
./javax/swing/text/JTextComponent.java
./javax/swing/SwingWorker.java
./sun/swing/text/TextComponentPrintable.java
./sun/swing/SwingUtilities2.java

Well, it does not look that bad, does it? But uhm… I guess we are missing something… who’s using this classes? And who’s using such classes? And who’s using such classes? Omg… I am getting an headhache! Almost EVERYTHING is using this! So trust me, you will be affected. Maybe you are still not believing it, but please remember that this affects also Object:wait(:long) and so, transitively, also synchronized. Wait… WOOT? Oh yeahūüôā So lots of fun! Especially when your system, deployed on client premises, starts doing “strange” things and you are called by your (not very happy) support team.

Be aware that this bug is now fixed in JDK8 and I have no knowledge of any successful backports of it into JDK7.

See also
The full saga, all the articles I published on the matter: