Assessing Product Reliability

8.1.2.2. Reliability or survival function

Survival is the complementary event to failure.

The Reliability Function R(t), also known as the Survival Function S(t), is defined by:

    R(t) = S(t) = the probability a unit survives beyond time t.

Since a unit either fails or survives, and one of these two mutually exclusive alternatives must occur, we have

    R(t) = 1 - F(t),    F(t) = 1 - R(t).

Calculations using R(t) often occur when building up from single components to subsystems with many components. For example, if one microprocessor comes from a population with reliability function Rm(t) and two of them are used for the CPU in a system, then the system CPU has a reliability function given by

    Rcpu(t) = [Rm(t)]^2 .

The reliability of the system is the product of the reliability functions of the components, since both must survive in order for the system to survive. This building up to the system from the individual components will be discussed in detail when we look at the "Bottom-Up" method. The general rule is: to calculate the reliability of a system of independent components, multiply the reliability functions of all the components together.

8.1.2.3. Failure (or hazard) rate

The failure rate is the rate at which the population survivors at any given instant are "falling over the cliff".

The failure rate is defined for non-repairable populations as the (instantaneous) rate of failure for the survivors to time t during the next instant of time. It is a rate per unit of time, similar in meaning to reading a car speedometer at a particular instant and seeing 45 mph. The next instant the failure rate may change, and the units that have already failed play no further role since only the survivors count.

The failure rate (or hazard rate) is denoted by h(t) and calculated from

    h(t) = f(t) / (1 - F(t)) = f(t) / R(t) ,

where f(t) is the probability density function of the life distribution. The failure rate is sometimes called a "conditional failure rate" since the denominator 1 - F(t) (i.e., the population survivors) converts the expression into a conditional rate, given survival past time t.

Since h(t) is also equal to the negative of the derivative of ln R(t), we have the useful identity

    F(t) = 1 - exp( - ∫_0^t h(s) ds ).

If we let

    H(t) = ∫_0^t h(s) ds

be the Cumulative Hazard Function, we then have F(t) = 1 - e^(-H(t)). Two other useful identities that follow from these formulas are

    h(t) = - (d/dt) ln R(t)
    H(t) = - ln R(t).

It is also sometimes useful to define an average failure rate over any interval (T1, T2) that "averages" the failure rate over that interval. This rate, denoted by AFR(T1, T2), is a single number that can be used as a specification or target for the population failure rate over that interval. If T1 is 0, it is dropped from the expression. Thus, for example, AFR(40,000) would be the average failure rate for the population over the first 40,000 hours of operation. The formulas for calculating AFRs are

    AFR(T1, T2) = [ ∫_{T1}^{T2} h(t) dt ] / (T2 - T1) = [ H(T2) - H(T1) ] / (T2 - T1) = [ ln R(T1) - ln R(T2) ] / (T2 - T1)

and, for T1 = 0,

    AFR(T) = H(T) / T = - ln R(T) / T.
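The identities above are easy to check numerically. The following is a minimal Python sketch (not from the handbook itself) that assumes a hypothetical Weibull life model with made-up parameters (characteristic life eta = 50,000 hours, shape beta = 1.5), for which R(t), h(t), H(t), and AFR(T1, T2) all have closed forms, and then applies the series rule Rcpu(t) = [Rm(t)]^2 from Section 8.1.2.2.

    import math

    # Hypothetical Weibull life distribution for one microprocessor:
    # characteristic life eta (hours) and shape beta are made-up values.
    eta, beta = 50_000.0, 1.5

    def H(t):
        """Cumulative hazard H(t) = (t/eta)**beta for the Weibull model."""
        return (t / eta) ** beta

    def R(t):
        """Reliability (survival) function R(t) = exp(-H(t))."""
        return math.exp(-H(t))

    def h(t):
        """Instantaneous failure (hazard) rate h(t) = dH(t)/dt."""
        return (beta / eta) * (t / eta) ** (beta - 1)

    def AFR(t1, t2):
        """Average failure rate over (t1, t2) = [H(t2) - H(t1)] / (t2 - t1)."""
        return (H(t2) - H(t1)) / (t2 - t1)

    t = 40_000.0
    print(f"R({t:.0f})   = {R(t):.4f}")                    # probability of surviving 40,000 h
    print(f"F({t:.0f})   = {1 - R(t):.4f}")                # F(t) = 1 - R(t)
    print(f"h({t:.0f})   = {h(t):.3e} per hour")           # hazard rate at 40,000 h
    print(f"AFR({t:.0f}) = {AFR(0.0, t):.3e} per hour")    # average failure rate over (0, t)
    print(f"check      = {-math.log(R(t)) / t:.3e} per hour")  # AFR(T) = -ln R(T) / T

    # Series ("Bottom-Up") rule: a CPU built from two independent microprocessors
    # survives only if both survive, so Rcpu(t) = Rm(t)**2.
    print(f"Rcpu({t:.0f}) = {R(t) ** 2:.4f}")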
"Bathtub" curve A plot of the failure rate over time for most products yields a curve that looks like a drawing of a bathtub If enough units from a given population are observed operating and failing over time, it is relatively easy to compute week-by-week (or month-by-month) estimates of the failure rate h(t). For example, if N12 units survive to start the 13th month of life and r13 of them fail during the next month (or 720 hours) of life, then a simple empirical estimate of h(t) averaged across the 13th month of life (or between 8640 hours and 9360 hours of age), is given by (r13 / N12 * 720). Similar estimates are discussed in detail in the section on Empirical Model Fitting. Over many years, and across a wide variety of mechanical and electronic components and systems, people have calculated empirical population failure rates as units age over time and repeatedly obtained a graph such as shown below. Because of the shape of this failure rate curve, it has become widely known as the "Bathtub" curve. The initial region that begins at time zero when a customer first begins to use the product is characterized by a high but rapidly decreasing failure rate. This region is known as the Early Failure Period (also referred to as Infant Mortality Period, from the actuarial origins of the first bathtub curve plots). This decreasing failure rate typically lasts several weeks to a few months. Next, the failure rate levels off and remains roughly constant for (hopefully) the majority of the useful life of the product. This long period of a level failure rate is known as the Intrinsic Failure Period (also called the Stable Failure Period) and the constant failure rate level is called the Intrinsic Failure Rate. Note that most systems spend most of their lifetimes operating in this flat portion of the bathtub curve Finally, if units from the population remain in use long enough, the failure rate begins to increase as materials wear out and degradation failures occur at an ever increasing rate. This is the Wearout Failure Period. http://www.itl.nist.gov/div898/handbook/apr/section1/apr124.htm (1 of 2) [5/1/2006 10:41:25 AM] 8.1.2.4. "Bathtub" curve NOTE: The Bathtub Curve also applies (based on much empirical evidence) to Repairable Systems. In this case, the vertical axis is the Repair Rate or the Rate of Occurrence of Failures (ROCOF). http://www.itl.nist.gov/div898/handbook/apr/section1/apr124.htm (2 of 2) [5/1/2006 10:41:25 AM] 8.1.2.5. Repair rate or ROCOF 8. Assessing Product Reliability 8.1. Introduction 8.1.2. What are the basic terms and models used for reliability evaluation? 8.1.2.5. Repair rate or ROCOF Repair Rate models are based on counting the cumulative number of failures over time A different approach is used for modeling the rate of occurrence of failure incidences for a repairable system. In this chapter, these rates are called repair rates (not to be confused with the length of time for a repair, which is not discussed in this chapter). Time is measured by system power-on-hours from initial turn-on at time zero, to the end of system life. Failures occur at given system ages and the system is repaired to a state that may be the same as new, or better, or worse. The frequency of repairs may be increasing, decreasing, or staying at a roughly constant rate. Let N(t) be a counting function that keeps track of the cumulative number of failures a given system has had from time zero to time t. 
8.1.2.5. Repair rate or ROCOF

Repair Rate models are based on counting the cumulative number of failures over time.

A different approach is used for modeling the rate of occurrence of failure incidences for a repairable system. In this chapter, these rates are called repair rates (not to be confused with the length of time for a repair, which is not discussed in this chapter). Time is measured by system power-on-hours from initial turn-on at time zero to the end of system life. Failures occur at given system ages, and the system is repaired to a state that may be the same as new, or better, or worse. The frequency of repairs may be increasing, decreasing, or staying at a roughly constant rate.

Let N(t) be a counting function that keeps track of the cumulative number of failures a given system has had from time zero to time t. N(t) is a step function that jumps up by one every time a failure occurs and stays at the new level until the next failure. Every system will have its own observed N(t) function over time. If we observed the N(t) curves for a large number of similar systems and "averaged" these curves, we would have an estimate of M(t) = the expected number (average number) of cumulative failures by time t for these systems.

The Repair Rate (or ROCOF) is the mean rate of failures per unit time.

The derivative of M(t), denoted m(t), is defined to be the Repair Rate, or the Rate Of Occurrence Of Failures at time t (ROCOF). Models for N(t), M(t), and m(t) will be described in the section on Repair Rate Models.
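A brief Python sketch of these counting-function ideas follows; the failure ages for three systems are made-up values used only to show how N(t), an averaged estimate of M(t), and a finite-difference estimate of m(t) would be computed.

    # Made-up failure ages (system power-on hours) for three similar repairable systems.
    failure_ages = [
        [250, 1100, 2400, 4100],          # system 1
        [600, 1900, 3500],                # system 2
        [150, 900, 2100, 3000, 4800],     # system 3
    ]

    def N(ages, t):
        """Step function: cumulative number of failures of one system by age t."""
        return sum(1 for a in ages if a <= t)

    def M_hat(t):
        """Estimate of M(t): the systems' N(t) values averaged at age t."""
        return sum(N(ages, t) for ages in failure_ages) / len(failure_ages)

    dt = 500.0
    for t in (1000, 2000, 3000, 4000, 5000):
        m_hat = (M_hat(t) - M_hat(t - dt)) / dt   # rough ROCOF (slope of M) near age t
        print(f"t = {t:4d} h: M_hat = {M_hat(t):.2f}, m_hat = {m_hat:.4f} failures/hour")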
8.1.3. What are some common difficulties with reliability data and how are they overcome?

The Paradox of Reliability Analysis: the more reliable a product is, the harder it is to get the failure data needed to "prove" it is reliable!

There are two closely related problems that are typical of reliability data and not common with most other forms of statistical data. These are:

● Censoring (when the observation period ends, not all units have failed - some are survivors)
● Lack of Failures (if there is too much censoring, even though a large number of units may be under observation, the information in the data is limited due to the lack of actual failures)

These problems cause considerable practical difficulty when planning reliability assessment tests and analyzing failure data. Some solutions are discussed in the next two sections. Typically, the solutions involve making additional assumptions and using complicated models.

8.1.3.1. Censoring

When not all units on test fail, we have censored data.

Consider a situation in which we are reliability testing n (non-repairable) units taken randomly from a population. We are investigating the population to determine whether its failure rate is acceptable. In the typical test scenario, we have a fixed time T to run the units to see if they survive or fail. The data obtained are called Censored Type I data.

Censored Type I Data

During the T hours of test we observe r failures (where r can be any number from 0 to n). The (exact) failure times are t1, t2, ..., tr, and there are (n - r) units that survived the entire T-hour test without failing. Note that T is fixed in advance and r is random, since we don't know how many failures will occur until the test is run. Note also that we assume the exact times of failure are recorded when there are failures. This type of censoring is also called "right censored" data, since the times of failure to the right (i.e., larger than T) are missing.

Another (much less common) way to test is to decide in advance that you want to see exactly r failure times and then test until they occur. For example, you might put 100 units on test and decide you want to see at least half of them fail. Then r = 50, but T is unknown until the 50th failure occurs. This is called Censored Type II data.

Censored Type II Data

We observe t1, t2, ..., tr, where r is specified in advance. The test ends at time T = tr, and (n - r) units have survived. Again we assume it is possible to observe the exact time of failure for failed units. Type II censoring has the significant advantage that you know in advance how many failure times your test will yield - this helps enormously when planning adequate tests. However, an open-ended random test time is generally impractical from a management point of view, and this type of testing is rarely seen.

Sometimes we don't even know the exact time of failure.

Readout or Interval Data

Sometimes exact times of failure are not known; only an interval of time in which the failure occurred is recorded. This kind of data is called Readout or Interval data, and the situation is shown in the figure below.

[Figure: readout (interval) data - units are inspected only at scheduled readout times, so each failure is known only to lie between two successive readouts.]

Multicensored Data

In the most general case, every unit observed yields exactly one of the following three types of information:

● a run-time, if the unit did not fail while under observation
● an exact failure time
● an interval of time during which the unit failed.

The units may all have different run-times and/or readout intervals.

Many special methods have been developed to handle censored data.

How do we handle censored data? Many statistical methods can be used to fit models and estimate failure rates, even with censored data. In later sections we will discuss the Kaplan-Meier approach, Probability Plotting, Hazard Plotting, Graphical Estimation, and Maximum Likelihood Estimation.

Separating out Failure Modes

Note that when a data set consists of failure times that can be sorted into several different failure modes, it is possible (and often necessary) to analyze and model each mode separately. Consider all failures due to modes other than the one being analyzed as censoring times, with the censored run-time equal to the time the unit failed due to the different (independent) failure mode. This is discussed further in the competing risk section and later analysis sections.
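The difference between the two censoring schemes is easy to see in a small simulation. The Python sketch below is illustrative only: it assumes exponential lifetimes with a made-up MTBF of 2,000 hours and records each unit either as an exact failure time or as a censored run-time.

    import random

    random.seed(1)
    n, true_mtbf = 20, 2000.0
    lifetimes = sorted(random.expovariate(1.0 / true_mtbf) for _ in range(n))

    # Censored Type I: test time T is fixed in advance, number of failures r is random.
    T = 1000.0
    type1_failures = [t for t in lifetimes if t <= T]       # exact failure times
    type1_survivors = n - len(type1_failures)               # each survivor has run-time T
    print(f"Type I : T = {T:.0f} h fixed, r = {len(type1_failures)} failures, "
          f"{type1_survivors} units censored at {T:.0f} h")

    # Censored Type II: number of failures r is fixed in advance, test time is random.
    r = 10
    type2_failures = lifetimes[:r]                          # first r failure times
    T2 = type2_failures[-1]                                 # test ends at the r-th failure
    print(f"Type II: r = {r} fixed, test ended at T = {T2:.0f} h, "
          f"{n - r} units censored at {T2:.0f} h")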
8.1.3.2. Lack of failures

Failure data is needed to accurately assess and improve reliability - this poses problems when testing highly reliable parts.

When fitting models and estimating failure rates from reliability data, the precision of the estimates (as measured by the width of the confidence intervals) tends to vary inversely with the square root of the number of failures observed - not the number of units on test or the length of the test. In other words, a test where 5 fail out of a total of 10 on test gives more information than a test with 1000 units but only 2 failures.

Testing at much higher than typical stresses can yield failures, but models are then needed to relate these back to use stress.

How can tests be designed to overcome an expected lack of failures? Since the number of failures r is critical, and not the sample size n on test, it becomes increasingly difficult to assess the failure rates of highly reliable components. Parts like memory chips, which in typical use have failure rates measured in parts per million per thousand hours, will have few or no failures when tested for reasonable time periods with affordable sample sizes. This gives little or no information for accomplishing the two primary purposes of reliability testing, namely:

● accurately assessing population failure rates
● obtaining failure mode information to feed back into product improvement.

The answer is to make failures occur by testing at much higher stresses than the units would normally see in their intended application. This creates a new problem: how can these failures at higher-than-normal stresses be related to what would be expected to happen over the course of many years at normal use stresses? The models that relate high-stress reliability to normal-use reliability are called acceleration models.
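To illustrate why the number of failures, not the sample size, controls precision, the Python sketch below (assuming SciPy is available) uses the usual chi-square confidence bounds for a constant failure rate under a time-truncated test; these bounds are a standard exponential-model result, not something derived in this section, and the numbers are illustrative only.

    from scipy.stats import chi2

    alpha = 0.10          # 90 % two-sided interval
    lam_hat = 1e-5        # same estimated failure rate (failures per hour) in every case

    for r in (2, 5, 20, 100):
        T = r / lam_hat                                   # total unit-hours giving this estimate
        lower = chi2.ppf(alpha / 2, 2 * r) / (2 * T)      # lower bound on the failure rate
        upper = chi2.ppf(1 - alpha / 2, 2 * r + 2) / (2 * T)  # upper bound (time-truncated test)
        print(f"r = {r:3d} failures: 90% CI for lambda = "
              f"({lower:.2e}, {upper:.2e}), upper/lower = {upper / lower:.1f}")

As r grows, the ratio of the upper to the lower bound shrinks toward 1, in line with the square-root-of-failures behavior described above.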