I am working with a client to help diagnose a quality issue with a manufactured product. They provided me with the typical information you might see from an organization that does not really understand field-service failure analysis.
I received data listing the number of units shipped each month and the number reported failed (customer returns) each month, along with a defect or failure rate plotted against time. It probably feels good to plot a defect rate, but this is a prime example of a poorly constructed metric that leadership likes because it sounds like a quality metric and is easy to calculate.
The problem is that the units in the numerator (those reported failed) were all shipped in prior months, while the units in the denominator were shipped in the current month, and none of them has had time to be reported failed. The metric is useless. A traditional defective rate, by contrast, is the number found defective out of all units inspected in a given period. That makes sense, because the numerator and denominator come from the same population: the same conditions and the same material lots.
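A small sketch with made-up numbers shows how badly the mismatched metric can mislead. Here I assume, purely for illustration, that shipments double every month and that a fixed 2% of each month's cohort fails in its second month in service:

```python
# Hypothetical monthly data: shipments ramp up while a fixed 2% of each
# cohort fails in its second month in service. All numbers are invented.
shipments = [100, 200, 400, 800]  # units shipped in months 1..4

# Returns arriving in month m come from the cohort shipped in month m-1.
returns = [0] + [round(0.02 * s) for s in shipments[:-1]]  # [0, 2, 4, 8]

# The misleading metric: this month's returns / this month's shipments.
naive_rate = [r / s for r, s in zip(returns, shipments)]
print(naive_rate)   # reads as 1%, because growth dilutes the denominator

# The cohort-based rate: returns from a cohort / that cohort's shipments.
cohort_rate = [returns[m + 1] / shipments[m] for m in range(len(shipments) - 1)]
print(cohort_rate)  # recovers the true 2% for every cohort
```

Because shipments are growing, the naive metric reports half the true failure rate; in a declining business it would do the opposite and overstate it.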
What can they do now? They were able to provide the number of failures as a function of time in service. This is good data, as long as we also know how many units were placed in service and have not yet failed. With both pieces we can estimate the life distribution of each failure mode and predict future return rates.
A better data set would have one row for every failure event, including the date placed in service, the date failed, the customer name, the failure mode (if known), and any manufacturing data such as lot number. Combined with the number of units produced each day (or week, or other period), this lets us truly evaluate the failure modes and how they change over time. It is the basis for a reliability evaluation.
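The one-row-per-failure table described above can be sketched as plain records; the field names, dates, customers, modes, and lot numbers below are all invented for illustration. From such a table, the time in service of each failed unit and the failure counts per mode fall out directly:

```python
from datetime import date

# Hypothetical failure-event table: one row per failure, with the fields
# suggested above. Every value here is illustrative, not real data.
failures = [
    {"in_service": date(2023, 1, 10), "failed": date(2023, 4, 2),
     "customer": "Acme", "mode": "seal leak", "lot": "L101"},
    {"in_service": date(2023, 2, 5),  "failed": date(2023, 3, 20),
     "customer": "Beta", "mode": "seal leak", "lot": "L102"},
    {"in_service": date(2023, 2, 18), "failed": date(2023, 6, 1),
     "customer": "Acme", "mode": "connector", "lot": "L101"},
]

# Time in service (days) for each failure: the raw material for any
# life-data analysis.
ttf_days = [(row["failed"] - row["in_service"]).days for row in failures]
print(ttf_days)  # [82, 43, 103]

# Failure counts by mode, so each mode can be analyzed separately.
by_mode = {}
for row in failures:
    by_mode[row["mode"]] = by_mode.get(row["mode"], 0) + 1
print(by_mode)   # {'seal leak': 2, 'connector': 1}
```

Paired with production counts per period, these times in service become the "event" half of the analysis; the units built but not yet failed supply the other half.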
Most quality professionals would treat this as a defect issue: examine the failures, do cause analysis, then act on the findings. That works when you are inspecting a finished batch, where the population contains both good and bad units to compare. When you are performing failure analysis on field returns, you have no access to products built at the same time that have not failed. In these cases you are often led on a wild goose chase, because so much is unknown about the failed units: how they were treated, their environment, the amount of usage… and how many good units went through the same conditions.
For these reasons, working a reliability problem is beyond most quality professionals; they also may not understand the statistics of censored-data analysis that is required to dig into the problem.
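As a rough illustration of what censored-data analysis involves, here is a minimal Kaplan-Meier survival estimate. The failure and censoring times are invented; units still running are treated as right-censored at their current age, which is exactly the information the naive monthly metric throws away:

```python
# Minimal Kaplan-Meier sketch for right-censored field data.
# All times (in days) are made up for illustration.
failed = [30, 45, 45, 80]        # days to failure (event observed)
censored = [60, 90, 90, 90, 90]  # days in service so far, no failure yet

surv = 1.0
km = {}
for t in sorted(set(failed)):
    # Units at risk just before t: anything not yet failed or censored.
    at_risk = (sum(1 for x in failed if x >= t)
               + sum(1 for x in censored if x >= t))
    d = failed.count(t)              # failures exactly at t
    surv *= 1 - d / at_risk          # step the survival curve down
    km[t] = surv

print(km)  # estimated probability of surviving past each failure time
```

The censored units keep contributing to the at-risk count until their current age, so the estimate uses everything known about the good units too; dropping them (as a defect-rate view does) would badly overstate the failure probability.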