Redundancy is investing not only in more equipment or components initially, but also in an operating cost to keep a facility in optimal state.
by Mr. Mauricio Romero*
In the electromechanical field, the term redundancy is defined as "the inclusion of additional components that are not strictly necessary for operation in case of failure in other components".
Redundancy levels are commonly named after the number of additional components provided beyond the base requirement, and are typically identified as follows:
- N: meets the base requirement with no redundancy; the failure of a component of the system will cause an interruption.
- N + 1: one unit (equipment, component, module or path) more than the base requirement; the failure of a single unit will not disrupt operations.
- N + 2: two units (equipment, components, modules or paths) more than the base requirement; even the failure of two units will not interrupt operations.
- 2N: a complete duplicate of the equipment, components, modules or paths of each system; the failure of an entire system will not disrupt operations.
- 2 (N + 1): a duplicated N + 1 arrangement; the failure of one complete system still leaves another complete system with N + 1 redundancy.
In N, N + 1, N + 2, 2N and 2 (N + 1), "N" stands for Need; N + 1 therefore reads as "Need plus one", that is, one unit more than is needed, and so on.
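The nomenclature above can be read as simple arithmetic on the base need N. The following is a minimal sketch in Python, not a design tool: the function names and the example value of N are assumptions made for illustration, and the sketch only counts units; it ignores how those units are grouped into independent systems or paths.

```python
def installed_units(n: int, scheme: str) -> int:
    """Return how many units are installed for a base need of n units.

    The scheme labels follow the list in the article; the function itself
    is a hypothetical helper written for this illustration.
    """
    schemes = {
        "N": n,
        "N+1": n + 1,
        "N+2": n + 2,
        "2N": 2 * n,
        "2(N+1)": 2 * (n + 1),
    }
    return schemes[scheme]


def spare_units(n: int, scheme: str) -> int:
    """Return how many installed units exceed the base need n."""
    return installed_units(n, scheme) - n


if __name__ == "__main__":
    need = 3  # hypothetical example: three pumps are needed to carry the load
    for scheme in ("N", "N+1", "N+2", "2N", "2(N+1)"):
        print(scheme, installed_units(need, scheme), spare_units(need, scheme))
```

For a need of three units, for instance, 2N means six installed units and 2 (N + 1) means eight.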
There is no need to be tied to this nomenclature. Other redundancy arrangements are common in practice, for example two chillers each sized for 75% of the design load, or two chillers each sized for 66%, and many other combinations are possible. So what is the correct level of redundancy?
The degree of redundancy must respond to the level of fault tolerance required of a mechanical or electrical system. This is a requirement that the end user must define as part of the initial design criteria in the Owner's Project Requirements (OPR).
The level of fault tolerance can be defined as the length of time an installation can be out of operation without affecting the main activity; in other words, how critical it is to sustain the operation of the system in the event of a failure. In data centers, for example, redundancy is commonly analyzed in terms of the number of hours of interruption per year.
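Hours of interruption per year and availability percentages are two ways of stating the same tolerance. As a rough sketch, assuming a non-leap year of 8,760 hours and using function names invented for this example, the conversion looks like this:

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours; leap years are ignored (assumption)


def availability_from_downtime(downtime_hours: float) -> float:
    """Convert allowed hours of interruption per year into an availability fraction."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def downtime_from_availability(availability: float) -> float:
    """Convert an availability fraction into allowed hours of interruption per year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical tolerance: the owner accepts 8.76 hours of interruption per year.
    print(f"{availability_from_downtime(8.76):.3%}")
```

An allowance of 8.76 hours per year corresponds to 99.9% availability; stating the tolerance either way early in the project gives the designer a concrete target.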
A designer may recommend a level of redundancy, but the final decision should be made by the client based on how critical the operation is and on the budget.
Redundancy in electromechanical equipment can range from 2N in a Tier IV data center to N+1 in the pumping system of a chilled water plant, a drinking water pumping system or a sewage pumping station in a commercial application.
When this topic is addressed, I have noticed that two scenarios are normally analyzed: scheduled maintenance and the unexpected failure of a piece of equipment or component. I have seldom seen two or more simultaneous events considered, and when failure is discussed, usually only a single mechanical failure is analyzed.
For example, in the most basic chilled water plant it is normal to find one or two chillers in an N arrangement and the recirculation pumps in an N + 1 arrangement. With a preventive maintenance program there will be a date to service one of the pumps, and with an N + 1 arrangement this can be planned without any problems. But what happens if, during this maintenance, a replacement component does not arrive on time and the pump has to stay out of service longer than expected? In this scenario other questions should arise:
* What happens to the operation if there is an unexpected or unscheduled failure in any of the other pumps?
* To what extent is the operation of the system compromised?
* Can the facility still operate or produce in this failure scenario?
* What is the impact on production or on the service provided?
* What is the maximum number of hours the system can go without this component or components in operation?
These and other questions are valid, some may seem extreme, but all are possible in any electromechanical system.
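One way to put numbers on these questions is to compute how much of the design load can still be served when several units are out at the same time. The sketch below is a simplified illustration under stated assumptions: each unit's capacity is given as a fraction of the design load, the worst case is taken to be the loss of the largest units, and the function name and example figures are hypothetical.

```python
def worst_case_capacity(unit_capacities: list[float], units_out: int) -> float:
    """Fraction of the design load still served after losing the largest units.

    unit_capacities: capacity of each installed unit as a fraction of design load.
    units_out: number of units unavailable at once (maintenance plus failures).
    """
    keep = max(len(unit_capacities) - units_out, 0)
    survivors = sorted(unit_capacities)[:keep]  # drop the largest units first
    return min(sum(survivors), 1.0)  # served load cannot exceed 100%


if __name__ == "__main__":
    # Hypothetical N+1 pump set: two pumps are needed, three are installed.
    pumps = [0.5, 0.5, 0.5]
    print(worst_case_capacity(pumps, 1))  # one pump out for maintenance
    print(worst_case_capacity(pumps, 2))  # maintenance plus an unexpected failure

    # Two chillers each sized for 75% of the design load, one unavailable.
    print(worst_case_capacity([0.75, 0.75], 1))
```

With one pump in maintenance the N + 1 set still carries the full load, but a second, unexpected failure leaves only half of it; the two 75% chillers retain 75% of the load with one unit out. Whether that remaining capacity is acceptable, and for how many hours, is exactly what the client's requirements need to state.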
A few years ago, I had the opportunity to visit a data center in which transactions worth 1 million US dollars per week were processed. In an application like this it is easy to understand how critical an outage caused by an unexpected failure or by maintenance would be, and a high level of redundancy is justified without much difficulty.
So far, redundancy is clear when we refer to equipment and components, but other elements become very relevant as well: for example, electrical connections, connections to the building's external electrical network, distribution pipe routes or control system connections.
The problem with this kind of risk analysis is that there is always another "what if...". Hence the importance of making these issues completely clear in the client's requirements and in the basis of design of a project, because each redundancy proposal has an impact on the design and on the budget.
Anything can happen, and redundancy can be provided for almost everything. It is a matter of analyzing the level of fault tolerance that the application requires and finding the point of economic balance given the probability of an event occurring.
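The economic balance mentioned above can be sketched as a comparison between the expected annual loss from outages and the annual cost of the added redundancy. This is only an illustrative model, not a full risk analysis: the function names, probabilities, outage duration and costs below are all hypothetical, and treating the weekly transaction volume as the cost of downtime is a deliberate simplification.

```python
HOURS_PER_WEEK = 7 * 24


def expected_annual_loss(outage_probability: float,
                         outage_hours: float,
                         cost_per_hour: float) -> float:
    """Expected yearly loss: chance of an outage x its duration x hourly cost."""
    return outage_probability * outage_hours * cost_per_hour


def redundancy_pays_off(loss_without: float,
                        loss_with: float,
                        annual_redundancy_cost: float) -> bool:
    """True when the avoided loss exceeds the yearly cost of the redundancy."""
    return (loss_without - loss_with) > annual_redundancy_cost


if __name__ == "__main__":
    # Simplification: 1 million USD of transactions per week treated as downtime cost.
    cost_per_hour = 1_000_000 / HOURS_PER_WEEK

    # Hypothetical figures: a 24-hour outage with a 10% yearly chance without
    # redundancy, reduced to a 1% yearly chance with redundancy.
    without = expected_annual_loss(0.10, 24, cost_per_hour)
    with_redundancy = expected_annual_loss(0.01, 24, cost_per_hour)

    print(round(without), round(with_redundancy))
    print(redundancy_pays_off(without, with_redundancy, annual_redundancy_cost=10_000))
```

The value of such a calculation lies less in the exact figures than in forcing the client and the designer to agree on the probabilities and costs they are assuming.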
We cannot leave aside maintenance and its role in redundancy. I think at this point we can conclude that a redundancy level of N + 1, N + 2, 2N or 2 (N + 1) is useless if the maintenance program is poor.
The paradox is to arrive at a facility with a good design, a good installation and appropriate redundancy in main equipment, connections and piping, and to find poor maintenance. In these cases all the robustness of the design is lost, and the investment and engineering do not meet their original objective.
Conclusions
To answer the question in the title about the correct level of redundancy, it is important first to understand the typical arrangements in the industry (N, N+1, N+2, 2N, etc.).
Personally, I don't think there is just one answer. We can conclude that everything depends on the level of fault tolerance of the application, the risk analysis carried out, the client, the budget, the available physical space and the maintenance program.
The level of redundancy is not generic; it depends on each application and its level of fault tolerance. Clearly, the fault tolerance required of an air conditioning system in the common area of a mall is not the same as that required of the air conditioning system of a hospital's surgery block or of a data center.
Redundancy is investing not only in more equipment or components initially, but also in an operating cost to keep an installation in an optimal state so that, when the time comes, the main activity is not affected, whether or not a failure ever occurs.
Commissioning and running testing protocols on a regular basis also keep the system "on alert" for any eventuality.
Designers are constantly faced with this issue, and this is where important work must be done to understand, guide and recommend the level of redundancy that may be most suitable for a project.
When it comes to redundancy, it is always necessary to think outside the box.
References:
* Gonçalves, Joel Pacheco. (December 21, 2015). Redundancy in simple words. www.mdcdatacenters.com/es/company/blog/avoid-risks-and-go-for-2n-power-redundancy
* Duda, Stephen W. 2019. When N+1 is just isn't enough. ASHRAE Journal January 2019. Pages 44 to 49.
* Uptime Institute. Data Center Site Infrastructure. TIER Standard: Topology.
* Mauricio Romero Ch.
Director of Mechanical Engineering – Sinergia Ingeniería (Costa Rica)
Health Facilities Design Professional (HFDP), by ASHRAE
Building Commissioning Professional (BCxP), by ASHRAE
Accredited Tier Standards Designer (ATD), by Uptime Institute
LEED Accredited Professional (LEED BD+C), by USGBC