19 Jun 2020

Significance of Failure Mode Effect Analysis

Need for FMEA

According to the automotive functional safety standard ISO 26262, Failure Mode Effect Analysis (FMEA) is one of the safety measures to avoid or control systematic failures and to detect or control random hardware failures, or mitigate their harmful effects.

In functional safety, the most important aim of any process is to make sure that none of the safety goals is violated. These goals are derived by analyzing the hazards that a failure of the item in scope can cause and by assessing the risk associated with those failures.

Because the item definition according to ISO 26262 is created at a very early phase of product development, the best practice for analyzing failures and their associated risks at that point is a top-down approach such as Hazard Analysis and Risk Assessment (HARA). At this level, we assess the risk based on the failures recognized so far. However, the analysis can change as components are developed in later phases. Once the lowest-level components of a system have been developed, it becomes much easier to recognize the failure modes behind any failure related to a component. To assess risk with higher coverage, it is important to know how an element or an item fails to provide its intended behavior. Hence, FMEA analyzes the effects of failure modes.

Definition of FMEA

FMEA is a quantitative analysis that uses a bottom-up approach. It can be performed on a design or a process, and it identifies the weaknesses of that design or process. It starts at a low level (component level) of a product or a process and works its way up to the effect of the failure on the system or sub-system. As a result, it highlights the critical features of a system or sub-system. The main purpose of FMEA is to identify, early in the design process of a system or product, potential problems that can affect its safety and performance, and to introduce countermeasures that mitigate or minimize the effects of those potential problems (failure modes). The following aspects shall be quantified while performing FMEA:

Low-Level Component
Effect of failure
Cause of failure
Risk Mitigation

 

The following shall be the parameters for quantifying the above factors:

FMEA Flow:

FMEA at different stages of product development

FMEA shall be performed at the following levels of product development to ensure reliability:

System-level – System FMEA or SFMEA

Hardware-level – Hardware DFMEA

Software-level – Software FMEA

At each of the above-mentioned levels, the definition of component, function, and sub-function varies. However, the FMEA flow described above remains the same for all of them.

Structure of Product/ Design FMEA

Steps for Product/Design FMEA


RPN Calculation

 

RPN calculation is based on three factors.

  • Severity [S]: The severity rating quantifies the severity of the effect. Its value ranges from 1 (low) to 10 (high).

  • Occurrence [O]: The occurrence rating quantifies the probability of occurrence of a cause that can lead to a failure event. Its value ranges from 1 (low) to 10 (high).

  • Detection [D]: The detection rating quantifies how likely the current detection mechanism is to detect a failure. Its value ranges from 1 (low) to 10 (high), where a high rating means that the failure is unlikely to be detected.

 

The Risk Priority Number (RPN) is the product of the Severity, Occurrence, and Detection ratings.

RPN = [S] x [O] x [D]
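The formula above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `rpn` and the range check are our own assumptions, based on the 1 to 10 rating scale described above.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the Risk Priority Number, RPN = S x O x D.

    Each rating is assumed to lie on the 1..10 scale described above.
    """
    ratings = {
        "severity": severity,
        "occurrence": occurrence,
        "detection": detection,
    }
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10, got {value}")
    return severity * occurrence * detection


if __name__ == "__main__":
    # Lowest and highest possible RPN on a 1..10 scale.
    print(rpn(1, 1, 1))      # 1
    print(rpn(10, 10, 10))   # 1000
```

With three ratings between 1 and 10, the RPN always falls between 1 and 1000.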

 

FMEA example

System FMEA

Let’s consider a system that uses vision (image/video) signals for the various functions it provides.

Such a system must be calibrated. Let’s consider a component that can fail due to a failure in calibration and look at its different failure modes.

Based on the effects listed above, we can allocate severity ratings for each failure mode and derive the cause of each effect.

Effect | Severity [S] | Cause
Incorrect function / loss of sub-function | 9 | Logical failure
No effect on function/sub-function | 1 | Communication failure of the image processor; Logical failure
Permanent incorrect calibration / loss of sub-functions | 9 | Communication failure of the image processor; Logical failure; Sensor error

 

Once the causes are listed, their probability of occurrence can be quantified.

Cause | Occurrence [O]
Logical failure | 2
Communication failure of the image processor; Logical failure | 4
Communication failure of the image processor; Logical failure; Sensor error | 4

 

For the above failures, the probability of detection shall be quantified based on the current detection mechanisms.

Current detection mechanism | Detection [D]
Supplier DFMEA; Design review; Design compliant with ISO 26262 | 3
Supplier DFMEA; Design review; Design compliant with ISO 26262 | 3
Supplier DFMEA; Design review; Design compliant with ISO 26262 | 3

 

Based on the above ratings, the RPN for all three failure modes is calculated as follows:

Failure mode | [S] | [O] | [D] | RPN
No Calibration | 9 | 2 | 3 | 54
Delayed calibration | 1 | 4 | 3 | 12
Incorrect Calibration | 9 | 4 | 3 | 108
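To show how such a worksheet can be used for prioritization, here is a small sketch that recomputes the RPN for the three failure modes above and sorts them from highest to lowest. The `FailureMode` class and its field names are hypothetical and serve only as an illustration; the ratings are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA worksheet (hypothetical structure)."""
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # RPN = S x O x D
        return self.severity * self.occurrence * self.detection


# Ratings copied from the example table above.
worksheet = [
    FailureMode("No calibration", 9, 2, 3),
    FailureMode("Delayed calibration", 1, 4, 3),
    FailureMode("Incorrect calibration", 9, 4, 3),
]

# Highest RPN first: these failure modes deserve attention first.
ranked = sorted(worksheet, key=lambda fm: fm.rpn, reverse=True)

for fm in ranked:
    print(f"{fm.name}: RPN {fm.rpn}")
```

In this example, incorrect calibration (RPN 108) ranks ahead of no calibration (RPN 54) and delayed calibration (RPN 12).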

 

Due to the diversity in vehicle development across the globe, OEMs and suppliers have been using different methods for performing FMEA. Suppliers providing products to German and North American OEMs have to follow different rating charts for Severity, Occurrence, and Detection. Hence, their RPN charts also differ. This difference seems simple, but it causes confusion in product development.

The two bodies that standardize the FMEA process are: (Reference: https://vda-qmc.de/publikationen/)

AIAG – Automotive Industry Action Group. This body provides guidelines for suppliers providing products to North American OEMs like Fiat Chrysler Automobiles, Ford Motor Company, General Motors, and more.

VDA – Verband der Automobilindustrie (German Association of the Automotive Industry). This body provides guidelines for suppliers providing products to German OEMs like Ford Europe, Audi, Volkswagen AG, and more.

To bridge this difference, AIAG and VDA have aligned their processes for performing FMEA. The major highlights of this alignment are:

  • Structuring the FMEA process as a 6-step approach
  • Defining ranking charts (severity, occurrence, and detection)
  • Replacing RPN with Action Priority (AP)

 

The following are the 6 steps recognized by AIAG and VDA for FMEA creation:

  • Scope Definition: This step covers project identification, project planning, analyzing and defining system boundaries, and identifying supporting information such as lessons learned from other projects and the prerequisites for the Structure analysis step.
  • Structure Analysis: This step covers the identification of the system structure of a component or process. A block diagram, a boundary diagram, or a process flow diagram can be the base for structure analysis. The main objective of this step is to identify the interfaces and interactions of system and sub-system components. The end product of this step is considered a prerequisite for the Function analysis step.
  • Function Analysis: This step helps in identifying the functionality of a component or a process. A function tree diagram, a function matrix, a parameter diagram, or a process flow diagram can be the base for function analysis. Through such function diagrams, individual functionalities and their related safety requirements are recognized. The end product of this step is considered a prerequisite for the Failure analysis step.
  • Failure Analysis: In this step, failure modes, potential failure effects, and causes of failure are identified. The system is now structured in the form of failure nets (using a qualified tool) or in a template such as the FMEA worksheet. A failure structure is also created, with failures linked to each other. The end product of this step is considered a prerequisite for the Risk analysis step.
  • Risk Analysis: In this step, we analyze the effect of failure and the prevention/detection mechanisms in place. It includes mapping prevention and detection mechanisms to the identified failure modes. This is the step where we quantify each identified failure by assigning severity, occurrence, and detection ratings to it, using the refined rating charts. The severity rating assigned to each failure mode is agreed between the customer (OEM) and the supplier, and the action priority (explained below) is defined. The end product of this step is considered a prerequisite for the Process optimization step.
  • Process Optimization: This step identifies actions for risk reduction; it includes confirmation measures for the implementation of actions, assignment of responsibilities, and documentation. It also includes risk assessment after prevention/detection action is taken. Continuous improvement is also part of this step.

 

Following is the Action Priority (AP) Chart for Design FMEA: (Reference: https://vda-qmc.de/publikationen/)

S | O | D | AP | Rationale
9-10 | 6-10 | 1-10 | H | High priority due to safety and/or regulatory effects that have a high or very high occurrence rating.
9-10 | 4-5 | 7-10 | H | High priority due to safety and/or regulatory effects that have a moderate occurrence rating and a high detection rating.
5-8 | 4-5 | 5-6 | H | High priority due to loss or degradation of an essential or convenience vehicle function that has a moderate occurrence rating and moderate detection rating.
5-8 | 4-5 | 1-4 | M | Medium priority due to the loss or degradation of an essential or convenience vehicle function that has a moderate occurrence and low detection rating.
2-4 | 4-5 | 5-6 | M | Medium priority due to perceived quality (appearance, sound, haptics) with a moderate occurrence and moderate detection rating.
2-4 | 4-5 | 1-4 | L | Low priority due to perceived quality (appearance, sound, haptics) with a moderate occurrence and low detection rating.
1 | 1-10 | 1-10 | L | Low priority due to no discernible effect.
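Unlike the RPN, the Action Priority is looked up rather than multiplied. The sketch below encodes only the rows listed above as (S, O, D) ranges and returns the matching priority. The names `AP_ROWS` and `action_priority` are hypothetical, and because the rows shown here do not cover every S/O/D combination, the function returns `None` when no listed row matches instead of guessing.

```python
from typing import Optional

# (S range, O range, D range, AP), copied from the rows listed above.
AP_ROWS = [
    ((9, 10), (6, 10), (1, 10), "H"),
    ((9, 10), (4, 5), (7, 10), "H"),
    ((5, 8), (4, 5), (5, 6), "H"),
    ((5, 8), (4, 5), (1, 4), "M"),
    ((2, 4), (4, 5), (5, 6), "M"),
    ((2, 4), (4, 5), (1, 4), "L"),
    ((1, 1), (1, 10), (1, 10), "L"),
]


def action_priority(s: int, o: int, d: int) -> Optional[str]:
    """Return 'H', 'M' or 'L' for the first matching row, else None."""
    for (s_lo, s_hi), (o_lo, o_hi), (d_lo, d_hi), ap in AP_ROWS:
        if s_lo <= s <= s_hi and o_lo <= o <= o_hi and d_lo <= d <= d_hi:
            return ap
    return None  # combination not covered by the rows listed above


if __name__ == "__main__":
    print(action_priority(9, 6, 1))   # H
    print(action_priority(6, 5, 2))   # M
    print(action_priority(1, 4, 3))   # L
    print(action_priority(9, 4, 3))   # None: not covered by the listed rows
```

Note that the same S, O, and D ratings can lead to a different ordering of actions than the RPN, because the AP chart gives severity more weight than a plain product does.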

 

17 Jun 2020

Hazard Analysis and Risk Assessment

Why do we need HARA?

Imagine that you are driving on a highway. You hit the brakes and they don’t work. Your vehicle might crash into another object or vehicle, causing an accident!

None of us would wish that something like this happens. Such events end up causing a lot of harm to people as well as the environment.

But how do we make sure that something like this does not happen? How do we, as safety engineers, ensure that the vehicle is functionally safe, and what do we do to prevent such mishaps if failures do occur?

The functional safety standard ISO 26262 provides guidelines on the techniques and methods to follow for implementing safety in automotive ECUs.

HARA is an important functional safety work product that sets the pace for creating top-level safety requirements called Safety Goals.

What is HARA?

HARA refers to Hazard Analysis and Risk Assessment.

Hazard Analysis:

Hazard analysis is the first step in the process used to assess risk, which is expressed as an ASIL. The goal of hazard analysis is to determine the ASIL and the required safe state.

Risk Assessment:

The Risk Assessment involves the following:

Hazard Identification: Identify hazards and risk factors that have the potential to cause harm.

Risk analysis and evaluation: Analyze and evaluate the risk associated with each hazard in terms of Severity (S), Exposure (E), and Controllability (C).

 

HARA is a work product of the Concept phase, i.e. Part 3 of the ISO 26262 V-cycle. A detailed description of HARA is available in this part of the standard.

 

Prerequisite/ Input to HARA – Item/System Definition.

Output from HARA – Safety Goals/Top Level Safety Requirements.

What goes into the Item/System Definition?

The Item definition contains a detailed description of the System Functionalities along with the Preliminary Architecture, Dependencies, and the Interaction of the system with the driver. It also contains a description of how the system interacts with the environment and other items at the vehicle level.

In addition, it contains details about all the functions and sub-functions involved, the scope and boundary of the system, and a description of any input or output components that are included in the scope.

 

How to Perform Hazard Analysis and Risk Assessment?

 Leaping into the HARA Methodology:

The HARA Methodology includes the following steps:

 

Situation Analysis:

Operating Mode: In this part, we provide information about the operating modes that we select for all the functions under every feature, based on their relevance to the system we are working on. This information is entirely system dependent.

The different operating modes may include: ON/OFF/FAIL/STANDBY/ACTIVE/INACTIVE/DEGRADED and also some system-specific operating modes.

Operating Situation: In this part, we provide information about the situation the vehicle is in.

The Situations may include: Driving Location, Road Conditions, Driving Conditions, Vehicle State, Usage, Driver Attention Level, and any other special situation. Consider whichever is required.

Environmental Conditions: In this part, we provide information about the weather conditions, Visibility, and other driving road conditions.

By considering the operating mode, operating situation, and environmental conditions, the Scenarios are created by merging them into different combinations.
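Creating scenarios this way is essentially building a Cartesian product of the three lists. Here is a minimal sketch; the concrete values in each list are hypothetical placeholders, and in a real HARA they come from the item definition.

```python
from itertools import product

# Hypothetical example values; replace them with the ones relevant to your item.
operating_modes = ["ACTIVE", "DEGRADED"]
operating_situations = ["Highway driving", "Urban driving"]
environmental_conditions = ["Dry road, clear visibility", "Wet road, low visibility"]

# Every combination of mode, situation and environment becomes one scenario.
scenarios = [
    {
        "operating_mode": mode,
        "operating_situation": situation,
        "environmental_condition": environment,
    }
    for mode, situation, environment in product(
        operating_modes, operating_situations, environmental_conditions
    )
]

print(len(scenarios))  # 2 x 2 x 2 = 8 scenarios
for scenario in scenarios:
    print(scenario)
```

The number of scenarios grows multiplicatively with every list entry, so in practice irrelevant or implausible combinations are usually filtered out after generation.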

Determination of Malfunction:

For all the features of the system, the possible malfunctions are derived by applying the guidewords defined by the Hazard and Operability (HAZOP) method.

The HAZOP Guidewords are: No, More, Less, As Well As, Part of, Reverse, Other than, Early, Late, Before, After, etc.

Guideword | Meaning
No | This is a complete negation of the design intention. No part of the intention is achieved and nothing else happens.
More | There is a quantitative increase.
Less | There is a quantitative decrease.
As well as | All of the design intention is achieved, together with other additions.
Part of | Only some of the design intentions are achieved.
Reverse | The logical opposite of the intention is achieved.
Other than | Complete substitution, where no part of the original intention is achieved but something quite different happens.
Early | Something happens earlier than expected relative to clock time.
Late | Something happens later than expected relative to clock time.
Before | Something happens before it is supposed to happen.
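Applying the guidewords to a function can be sketched as a simple template expansion. The phrase templates and the function name used below are hypothetical; the output is only a list of candidate malfunctions that the safety engineer still has to review for relevance.

```python
from typing import Dict

# Hypothetical phrase templates for a subset of the HAZOP guidewords above.
GUIDEWORD_TEMPLATES: Dict[str, str] = {
    "No": "No {function} when it is required",
    "More": "More {function} than intended",
    "Less": "Less {function} than intended",
    "Early": "{function} earlier than intended",
    "Late": "{function} later than intended",
}


def candidate_malfunctions(function: str) -> Dict[str, str]:
    """Map each guideword to a candidate malfunction for the given function."""
    return {
        guideword: template.format(function=function)
        for guideword, template in GUIDEWORD_TEMPLATES.items()
    }


if __name__ == "__main__":
    for guideword, malfunction in candidate_malfunctions("braking").items():
        print(f"{guideword}: {malfunction}")
```

Not every guideword produces a meaningful malfunction for every function, which is why the generated list is a starting point and not the final result.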

 

With the Scenarios and Malfunctions obtained above, all possible hazards are described, together with their impact at vehicle level and system level and the worst-case mishap.

Classification of hazardous events:

The three important Metrics for the classification are Severity, Exposure, and Controllability.

Severity scale ranges from S0 to S3 based on the following table:

Class | S0 | S1 | S2 | S3
ISO 26262 Reference | No injuries | Light and moderate injuries | Severe injuries, possibly life-threatening, survival probable | Life-threatening injuries (survival uncertain) or fatal injuries
Reference injuries from AIS scale (IEC 61508 usage) | AIS 0 and less than 10% probability of AIS 1-6 | More than 10% probability of AIS 1-6 (and not S2 or S3) | More than 10% probability of AIS 3-6 (and not S3) | More than 10% probability of AIS 5-6

 

Exposure scale ranges from E0 to E4 based on the following table:

Class | E0 | E1 | E2 | E3 | E4
ISO 26262 Reference | Incredible | Very low probability | Low probability | Medium probability | High probability
Reference Description | Not specified | Situations that occur less often than once a year for the majority of drivers | < 1% of average operating time, or situations that occur a few times a year for the great majority of drivers | 1% – 10% of average operating time, or situations that occur once a month or more often for an average driver | > 10% of average operating time, or situations that occur during almost every drive cycle on average

*Note: Exposure shall be considered with respect to both frequency and duration.
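As an illustration of the duration criterion only, the sketch below maps a share of average operating time to an exposure class using the thresholds from the table. The function name is hypothetical, and assigning the boundary values of exactly 1% and 10% to E3 is our assumption. E0 and E1 are not returned because the table does not define them by duration; those classes, and the frequency criterion in general, have to be assessed separately.

```python
def exposure_class_from_duration(share_of_operating_time: float) -> str:
    """Map a share of average operating time (0.0..1.0) to E2, E3 or E4.

    Only the duration thresholds from the table are used:
    below 1% gives E2, 1% to 10% gives E3, above 10% gives E4.
    """
    if not 0.0 <= share_of_operating_time <= 1.0:
        raise ValueError("share_of_operating_time must be between 0.0 and 1.0")
    if share_of_operating_time < 0.01:
        return "E2"
    if share_of_operating_time <= 0.10:
        return "E3"
    return "E4"


if __name__ == "__main__":
    print(exposure_class_from_duration(0.005))  # E2
    print(exposure_class_from_duration(0.05))   # E3
    print(exposure_class_from_duration(0.30))   # E4
```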

 

Controllability scale ranges from C0 to C3 based on the following table:

Class | C0 | C1 | C2 | C3
ISO 26262 Reference | Controllable in general | Simply controllable | Normally controllable | Difficult to control or uncontrollable
Reference Description | Full ability to maintain the intended driving path | 99% or more of all drivers or other traffic participants are usually able to avoid a specified harm (ability to brake and steer to slow or stop the vehicle) | 90% or more of all drivers or other traffic participants are usually able to avoid a specified harm | Less than 90% of all drivers or other traffic participants are usually able, or barely able, to avoid a specified harm
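The percentage thresholds in the table can be written down as a small helper. This is only a sketch: the function name is hypothetical, and C0 is left out on purpose because the table describes it qualitatively and not as a percentage.

```python
def controllability_class(share_able_to_avoid_harm: float) -> str:
    """Map the share of drivers able to avoid the harm (0.0..1.0) to C1..C3.

    Thresholds from the table: 99% or more gives C1,
    90% or more gives C2, less than 90% gives C3.
    """
    if not 0.0 <= share_able_to_avoid_harm <= 1.0:
        raise ValueError("share_able_to_avoid_harm must be between 0.0 and 1.0")
    if share_able_to_avoid_harm >= 0.99:
        return "C1"
    if share_able_to_avoid_harm >= 0.90:
        return "C2"
    return "C3"


if __name__ == "__main__":
    print(controllability_class(0.995))  # C1
    print(controllability_class(0.93))   # C2
    print(controllability_class(0.50))   # C3
```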

 

ASIL stands for Automotive Safety Integrity Level. The ASIL is determined from the Severity, Exposure, and Controllability classes according to the following table:

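Severity | Exposure | C0 | C1 | C2 | C3
S0 | E1 | QM | QM | QM | QM
S0 | E2 | QM | QM | QM | QM
S0 | E3 | QM | QM | QM | QM
S0 | E4 | QM | QM | QM | QM
S1 | E1 | QM | QM | QM | QM
S1 | E2 | QM | QM | QM | QM
S1 | E3 | QM | QM | QM | A
S1 | E4 | QM | QM | A | B
S2 | E1 | QM | QM | QM | QM
S2 | E2 | QM | QM | QM | A
S2 | E3 | QM | QM | A | B
S2 | E4 | QM | A | B | C
S3 | E1 | QM | QM | QM | A
S3 | E2 | QM | QM | A | B
S3 | E3 | QM | A | B | C
S3 | E4 | QM | B | C | D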
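The ASIL lookup can also be expressed compactly in code. The sketch below is not taken from the standard's text: it assumes the commonly used shortcut of adding the class indices (S1 to S3, E1 to E4, C1 to C3), where a sum of 7, 8, 9, or 10 gives ASIL A, B, C, or D and anything lower gives QM. It also assumes that any S0, E0, or C0 classification results in QM, since no ASIL is required for those classes. The function name is hypothetical.

```python
def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM', 'A', 'B', 'C' or 'D' for the class indices S, E and C.

    severity: 0..3 (S0..S3), exposure: 0..4 (E0..E4),
    controllability: 0..3 (C0..C3).
    """
    if not 0 <= severity <= 3:
        raise ValueError("severity must be 0..3")
    if not 0 <= exposure <= 4:
        raise ValueError("exposure must be 0..4")
    if not 0 <= controllability <= 3:
        raise ValueError("controllability must be 0..3")

    # Assumption: S0, E0 or C0 means that no ASIL is required.
    if 0 in (severity, exposure, controllability):
        return "QM"

    # Shortcut: the sum of the indices selects the ASIL.
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


if __name__ == "__main__":
    print(determine_asil(3, 4, 3))  # D
    print(determine_asil(2, 3, 2))  # A
    print(determine_asil(1, 2, 3))  # QM
```

Only the combination S3, E4, C3 reaches ASIL D; lowering any one of the three classes by one step lowers the ASIL by one step as well.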

 

 

 Determination of Safety Goals and the Safe States:

The Safety Goals are determined from the above analysis. Each Safety Goal is the negation of a malfunction. The Safe States are also described. A safe state is a state that the system reaches before any mishap occurs, so that the mishap is prevented.

 

What is the Output of HARA?

The Safety Goals or Top level Safety Requirements are the output of the HARA.

Now with these Safety Goals in hand, we can begin with the functional safety activities.

These Safety Goals will be further carried over to continue with Functional Safety Requirements, Technical Safety Requirements, and further to the respective decompositions or functions or domains.

Example:

Function: Automatic Emergency Braking

Malfunction: Unintended Emergency Braking

Hazard: Rear Collision

Safety Goal: Avoid Unintended Emergency Braking