Saturday, 22 September 2018

Changing The Mindset To Design A Cloud Application


Following are some factors to consider while designing a cloud application:

Clients previously connected to applications through the enterprise’s local network, so the user base was generally pretty small: maybe 100,000 employees and some partners accessing the system. Now we are talking about millions, or hundreds of millions, of people accessing the application over the internet. The public internet introduces increased latency and reduced bandwidth, and is less reliable. So the scale is wildly different now than it was when we were building services in the past.
Demand and load on the application used to be relatively small, stable and fairly predictable. Now the load depends on adoption, and it is hard to predict demand and load at the beginning. This requires a design-for-scale mindset from the start, so that the application scales well as load increases, and of course scales down when demand decreases, to save cost.
Data Centers generally served a single tenant or enterprise, and are now serving multiple tenants. The multi-tenant environment comes with its own set of challenges, such as the “noisy neighbour” problem: applications may share resources, like the network or physical disks, and although they are isolated, load from one may still impact another.
Operations, the care and feeding of the services, can be rather expensive. In the past, we would use people for this, and they had to deal with things like upgrading the operating system, the hardware, the networking infrastructure and so on. In this day and age, we are moving toward automation, where these things just happen automatically, which makes it much cheaper to run the service.
Scale was handled by increasing the resources in very expensive, specialized hardware (scale up). In the cloud, this is achieved by distributing the load across a larger number of commodity machines with cheap hardware (scale out), and we just assume that failure is going to happen.
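As a rough illustration of scale out, load can be spread across a pool of commodity machines by something as simple as hashing each request onto an instance. This is only a sketch; the instance names and the `route` function are hypothetical, not part of any particular platform.

```python
import hashlib

# Hypothetical pool of cheap commodity instances (scale out).
INSTANCES = ["worker-1", "worker-2", "worker-3", "worker-4"]

def route(request_key: str, instances=INSTANCES) -> str:
    """Pick an instance by hashing the request key, so load spreads
    across the whole pool instead of one big specialized machine."""
    digest = hashlib.sha256(request_key.encode()).hexdigest()
    return instances[int(digest, 16) % len(instances)]

# Adding capacity means appending cheap machines to the pool,
# not buying a bigger box.
```

Note that the same key always routes to the same instance, and growing or shrinking the pool is just a change to the list, which is the essence of scaling out (and back in) with demand.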
Failure was something we never really considered when building services in the past; we assumed the hardware would not fail. We might take backups once a week, assuming the database engine was going to be fine. In this new world, because we are on these cheaper, more commodity machines, it is much more likely that they are going to fail. So we have to embrace failure when designing or architecting these distributed cloud applications: we assume that failure is always possible. In the past, if the machine that ran your database server failed, that was usually catastrophic. It meant that none of your clients could access your service; they couldn’t look up anything.
Customers wouldn’t be able to make purchases, so the outage was costing your business real money. In this new cloud world, machine loss is an expected thing. It is actually normal, and even common, for that to happen; especially if you are running at scale, it is much more likely that some of the hardware is going to fail on you.
If we take this into account as we are architecting, then failure is really no big deal: we design our services to be resilient against it and to keep running even in the case of failure.
There may be some loss of capacity, there may be some loss of data, but for the most part everything keeps running. The more resilient you want to be, the harder it is to architect and develop these kinds of applications, and the more costly they tend to be to run. So it is largely a business decision, first and foremost, how much to invest. It may be that for your company, the service going down for an hour is not that big a deal. For another company, the service going down for an hour could be a loss of millions of dollars.

A different approach is required when architecting cost-effective, performant cloud applications that are resilient to failure.

Exception Handling: In the past, a lot of people would write their code to catch all exceptions, because an uncaught exception becomes unhandled, the service crashes, and it stops responding to incoming client requests. But this is really not the best thing to do in terms of managing state. If an application gets an unhandled exception, it usually means that something unexpected happened and the data in the application is potentially in an unpredictable state. Continuing to run the service under these circumstances might mean more data corruption and unpredictable results. So the best thing to do on an unhandled exception is to have the application terminate, destroying any potentially corrupted state in memory, and then restart it so that it is in a well-known good state and can pick up from there. In the past, a lot of programmers didn’t want to do that, because when the application crashed it stopped taking client requests for a period of time. In this new world, we can do the right thing, which is to allow the application to crash and restart. The reason this is okay is that you are now running multiple instances of the same service on different machines. We are embracing failure: if the application on one machine crashes, client requests can still be handled by the other instances on the other machines, and clients are still able to interact with your service successfully.
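The crash-and-restart idea above can be sketched in a few lines. This is a toy model, not a real supervisor: the `Instance` class, the `"poison"` request and the `run_with_restart` loop are all hypothetical stand-ins for a service process, an unexpected error, and the platform restarting a fresh instance.

```python
class Instance:
    """One service instance holding in-memory state."""
    def __init__(self):
        self.state = {}          # starts in a well-known good state

    def handle(self, request):
        if request == "poison":  # stand-in for an unexpected error
            raise RuntimeError("unexpected condition")
        self.state[request] = "done"
        return "ok"

def run_with_restart(requests):
    """Fail-fast loop: an unhandled error kills the instance; we
    discard its possibly-corrupted state and bring up a fresh one
    rather than catching everything and limping along."""
    instance = Instance()
    restarts = 0
    results = []
    for req in requests:
        try:
            results.append(instance.handle(req))
        except RuntimeError:
            instance = Instance()   # crash + restart: clean state
            restarts += 1
            results.append("served by another instance")
    return restarts, results
```

The key design choice is that the `except` branch does not try to repair or reuse the old instance’s state; it throws it away, which is exactly what a process restart gives you for free.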
Communication: When everything is working perfectly, a client communicates with a server via network requests, the messages are sent in order, and you can pretty much expect each message to reach the server exactly once.
But in this new world where we are embracing failure, it is possible that a client sends a message, the server starts to process the request, and then the server crashes, especially given the conversation we just had about exception handling. To be resilient to this failure, the client must retry the operation against the server. So we have to design our client applications to expect failure from the server and to automatically retry communication requests. It is also possible that the server gets the first request from a client and processes it successfully, but the client receives a timeout for some reason.
The client then sends the request again, because it is going to do a retry. This means the server could get the same request multiple times. But we don’t want the server to process the request multiple times, so the server has to be designed to handle these requests in an idempotent fashion, meaning there is no ill effect from performing the same operation multiple times. This also means that messages may arrive out of order, so we have to be resilient to that as well when designing our applications.
