Managed Services: challenges and advantages
What are Managed Services and how are they used in a complex, high-performance infrastructure?
It is Monday morning, 9 o’clock. Coffee in hand, you head to your desk, ready to start a new week. You say hello to your colleagues and take a look at the datacenter. At once you notice a mysterious flashing red LED lighting up the room…
This has certainly happened to many of you, followed by the usual log analysis, calls to technical support, tests and trials. In many cases this is the normal routine, but it is not the only way.
For years, many IT companies have positioned themselves as Managed Services Providers. These companies take charge of, manage, and monitor services in different ways, according to the agreed service level.
…But let’s go back to the red LED: having seen it, you sit down at your desk, turn on the computer and find an email from technical support warning you about the problem. It also informs you that the replacement component will be shipped the same day and asks you to contact the technician to schedule an appointment for the replacement.
This is the first example of a service, the so-called “Break/Fix”: in addition to signalling the problem visually, the system also sent a report to technical support with all the necessary logs; the team verified the problem and identified the solution entirely on its own.
But we can do better…
The error report is usually preceded by predictive errors on the component itself; by intercepting these errors and applying Artificial Intelligence algorithms, it is possible to replace the component before the problem arises. In these cases, we are talking about “Predictive Failure Analysis”.
…The LED is flashing on a RAID disk that is now in a critical state. “Let’s hope the replacement arrives in time!”
(Surely that is exactly what you are thinking.)
Working predictively means acting on a system while it is still in an optimal state, minimizing the risk of more serious problems.
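To make the idea concrete, here is a minimal Python sketch of predictive replacement. It assumes the monitored component exposes a counter of predictive errors sampled at regular intervals (for a disk, something like a daily count of reallocated sectors), and it uses a plain least-squares trend as a simple stand-in for the Artificial Intelligence algorithms mentioned above. The function names, the threshold and the lead time are all hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (sample index, value); returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def samples_until_threshold(values: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate how many sampling intervals remain before the counter reaches threshold.

    Returns 0.0 if the last reading is already at or above the threshold,
    and None if the trend is flat or falling (no crossing is predicted).
    """
    slope, intercept = linear_trend(values)
    if values[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None
    crossing_index = (threshold - intercept) / slope
    return max(0.0, crossing_index - (len(values) - 1))


def should_replace(values: Sequence[float], threshold: float, lead_time: float) -> bool:
    """True if the predicted crossing falls within the replacement lead time."""
    remaining = samples_until_threshold(values, threshold)
    return remaining is not None and remaining <= lead_time


if __name__ == "__main__":
    # Hypothetical daily error counts for one disk.
    daily_errors = [0, 2, 4, 6, 8]
    print(samples_until_threshold(daily_errors, threshold=20))
    print(should_replace(daily_errors, threshold=20, lead_time=7))
```

With readings that grow by two errors per sample and a critical threshold of 20, the sketch estimates six sampling intervals of margin, so a replacement that takes a week to arrive would be ordered right away, while the system is still healthy.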
But we can do even better…
In the world of high-performance systems, hardware accounts for only a fraction of the complexity of a company’s IT infrastructure; beyond the mere replacement of a component, what really matters is keeping the service that relies on that component running. A managed service is effective only when its support team can restore the full functionality of the infrastructure. Only now are we really talking about “Managed Services”.
…The RAID is part of a high-performance parallel storage system, and its service must keep working no matter what. Changing the disk is not enough; you also need to act on the management software…
The technical staff who step in to solve the problem must have vertical expertise in the solution, from the underlying hardware up to the software that runs on it.
But can we do even better?
The issue was fixed before it occurred, and the services that depend on the component were kept running… But what else could be done?
The way the system is used could be analyzed to identify critical issues and improve operating conditions.
In short, use “Remote Monitored Managed Services” to solve the problems that could arise.
Very often, the sizing and features of an IT infrastructure are settled on the basis of default workflows, user feedback or, in any case, unmeasured data; telemetry analysis of real usage data makes it possible to highlight the infrastructure’s critical points.
…The storage was about to fail because it was undersized for the performance required of it; under the current conditions, the problem will occur again…
However efficient a support service may be, a linear increase in problems caused by the combination of time and over-use can only lead to production stoppages. Resolving these situations not only improves performance but also lowers the risk for the future.
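As a rough illustration of what telemetry analysis of real usage data can reveal, the Python sketch below compares measured storage throughput with the capacity the system was sized for. Everything in it is an assumption made for the example: the sample values, the 90% "busy" level, the 95th percentile and the function names are hypothetical, not taken from any specific monitoring product.

```python
import math
from typing import Dict, Sequence, Union


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of samples, with pct in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def sizing_report(
    throughput: Sequence[float],
    sized_capacity: float,
    busy_fraction: float = 0.9,
    pct: float = 95.0,
) -> Dict[str, Union[float, bool]]:
    """Summarise how close measured throughput runs to the sized capacity.

    The system is flagged as undersized when the chosen percentile of the
    measured throughput reaches busy_fraction of the sized capacity.
    """
    busy_level = busy_fraction * sized_capacity
    peak = percentile(throughput, pct)
    saturated = sum(1 for s in throughput if s >= busy_level) / len(throughput)
    return {
        "peak_percentile": peak,
        "saturated_share": saturated,
        "undersized": peak >= busy_level,
    }


if __name__ == "__main__":
    # Hypothetical throughput samples in MB/s for storage sized at 1000 MB/s.
    samples = [100.0] * 10 + [950.0] * 10
    print(sizing_report(samples, sized_capacity=1000.0))
```

In this made-up dataset the storage spends half of the sampled intervals above 90% of its sized capacity, which is the kind of measured evidence that default workflows and user feedback alone would not provide.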
…Now all you have to do is finish your invigorating coffee! 🙂
But… What are the challenges?
The more we rely on external providers for the complete management of our infrastructure, the more control we lose. Having lost control, you risk being subject to whatever your provider decides in terms of pricing and contract conditions.
If the provider goes out of business or withdraws support for the solution, a company without adequate in-house skills (which were not needed while management was in the hands of third parties) risks having to change its infrastructure as well as its supplier.
Relying on companies with years of experience, whose offering is mainly based on open-source solutions, is certainly a good starting point!