The following is an excerpt from The Reliability Engineering Handbook by Bryan Dodson and Dennis Nolan, copyright QA Publishing, LLC
Determining what information to collect
What data to collect depends on the phase of the project: conceptual, design, production, or maintenance. In every phase, the data should include failures caused by both equipment failure and human error. The conceptual phase will require data from similar products. The design phase will require research or actual test data for the specific product. The production phase relies on more historical data, sometimes derived from the design stages. The maintenance phase requires actual failure data, which may have been acquired with various failure analysis techniques. In short, all failures must be included from development to acceptance.
The five basic steps outlined below will help determine what data to collect:
1. Find out what happened, and be as specific as possible. At what level in the overall system, product, or process was the event discovered?
2. Find out how the event was detected. Internally? Externally?
3. Find out when the event happened. During testing? During a production run?
4. Find out if there is a similar event in historical records. If the answer is "yes," it could save time by eliminating some data collection.
5. Find out if there have been any recent changes. Check vendor materials, test conditions, etc.
How will data be collected and reported
Data may be collected by either manual or automatic means. Most test results or observations are recorded manually on forms customized to collect specific information, and then entered into a computer database. Data is sometimes taken automatically by electronic devices that send information directly to a database.
Automatic data gathering is usually desirable where continuous monitoring is necessary.
There are no standards for how to record or store data. Once data is entered into a computer, whether manually or automatically, both retrieval and use become much easier. There are many software packages on the market that can be readily tailored to fit specific needs for data analysis and reporting.
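As a minimal sketch of manual and automatic collection feeding one database, the following uses Python's built-in sqlite3 module. The table layout, column names, item identifier, and sample readings are all hypothetical, chosen only for illustration; they are not a recommended schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create the (hypothetical) observations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS observations (
               id INTEGER PRIMARY KEY,
               item_id TEXT NOT NULL,
               recorded_at TEXT NOT NULL,   -- ISO 8601 timestamp
               measurement TEXT NOT NULL,
               value REAL NOT NULL,
               source TEXT NOT NULL CHECK (source IN ('manual', 'automatic'))
           )"""
    )
    return conn


def record(conn, item_id, recorded_at, measurement, value, source):
    """Insert one observation; a keyed-in form and a sensor feed use the same call."""
    conn.execute(
        "INSERT INTO observations (item_id, recorded_at, measurement, value, source) "
        "VALUES (?, ?, ?, ?, ?)",
        (item_id, recorded_at, measurement, value, source),
    )
    conn.commit()


def readings_for(conn, item_id):
    """Return all observations for one item, oldest first."""
    rows = conn.execute(
        "SELECT recorded_at, measurement, value, source FROM observations "
        "WHERE item_id = ? ORDER BY recorded_at",
        (item_id,),
    )
    return rows.fetchall()


conn = open_store()
# One reading keyed in from a paper form, one sent by a monitoring device.
record(conn, "PUMP-7", "2024-01-05T08:00:00", "temperature_c", 71.5, "manual")
record(conn, "PUMP-7", "2024-01-05T08:00:30", "temperature_c", 72.1, "automatic")
history = readings_for(conn, "PUMP-7")
```

Keeping the source of each reading in the record is one possible design choice: it lets later analysis separate hand-entered values from monitored ones without changing how either is stored.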
Who will collect the information
Deciding who will collect the information depends on who will use the data, the accuracy needed, and time and cost constraints. Keep in mind that the person who collects data is not necessarily the one who will use it.
What level of accuracy is needed
Accuracy will depend on the product and its intended use. For example, a cook may only need time and temperature data to the nearest minute and degree, while a race car designer may want time in tenths of a second and temperature in tenths of a degree. For another example, if someone is asked their age 10 days before their 40th birthday, they may reply 39 or 40. Which is more accurate? Which is accurate enough? It could be important enough to require an answer like: 39 years, 355 days, 12 hours, and 15 minutes. Of course, estimating age usually will not require that much detail, but when asking how long a verified equipment problem has persisted, details do become important.
The program outlined below will help ensure that the data collected is accurate and complete, and that it meets the objectives of identifying, reporting, verifying, analyzing, and correcting problems.
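The effect of the chosen resolution on a reported duration can be illustrated with a short sketch. The function name `elapsed` and the two timestamps are hypothetical; the point is only that the same pair of events yields different answers depending on the level of detail requested.

```python
from datetime import datetime


def elapsed(start, end, resolution):
    """Return whole units elapsed between two datetimes, truncated to a resolution.

    resolution is one of 'days', 'hours', 'minutes', or 'seconds'.
    """
    seconds_per_unit = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}
    total_seconds = int((end - start).total_seconds())
    return total_seconds // seconds_per_unit[resolution]


# Hypothetical timestamps: when a problem was first seen and when it was verified.
first_seen = datetime(2024, 3, 1, 6, 45)
verified = datetime(2024, 3, 3, 19, 0)

in_days = elapsed(first_seen, verified, "days")        # 2
in_hours = elapsed(first_seen, verified, "hours")      # 60
in_minutes = elapsed(first_seen, verified, "minutes")  # 3615
```

Reported in days, the problem lasted 2 days; reported in minutes, it lasted 2 days, 12 hours, and 15 minutes. Whether the coarser figure is accurate enough depends on how the data will be used.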
1. Identify and control failed items.
A tag should be affixed to the failed item immediately upon the detection of any failure or suspected failure. The failure tag should provide space for the failure report serial number and for other pertinent entries from the item failure record. All failed parts should be marked conspicuously and controlled to ensure disposal in accordance with all laws, rules, or regulations. Failed parts should not be handled in any manner that may obliterate facts pertinent to the analysis, and they should be stored pending disposition by the failure analysis agent.
2. Reporting of problems or failures.
A failure report should be initiated at the occurrence of each problem or failure of hardware, software, or equipment. The report should contain the information required to permit determination of the origin and correction of failures. The following information should be included in the report:
a. Descriptions of failure symptoms, conditions surrounding the failure, failed hardware identification, and operating time (or cycles) at time of failure
b. Information on each independent and dependent failure and the extent of confirmation of the failure symptoms, the identification of failure modes, and a description of all repair action taken to return the item to operational readiness
c. Information describing the results of the investigation, the analysis of all part failures, an analysis of the item design, and the corrective-action taken to prevent failure recurrence (if no corrective-action is taken, the rationale for this decision should be recorded)
3. Verify failures.
Reported failures should be verified as actual failures, or an acceptable explanation should be provided for the lack of failure verification. A failure is verified either by repeating the failure mode on the reported item or by physical or electrical evidence of failure (leakage residue, damaged hardware, etc.). Lack of failure verification, by itself, is not sufficient rationale to conclude the absence of a failure.
4. Investigation and analysis of problems or failures.
An investigation and analysis of each reported failure should be performed. Investigation and analysis should be conducted to the level of hardware or software necessary to identify causes, mechanisms, and potential effects of the failure. Any applicable method (test, microscopic analysis, applications study, dissection, x-ray analysis, spectrographic analysis, etc.) of investigation and analysis that may be needed to determine failure cause shall be used. When the removed item is not defective or the cause of failure is external to the item, the analysis should be extended to include the circuit, higher hardware assembly, test procedures, and subsystem if necessary. Investigation and analysis of supplier failures should be limited to verifying that the supplier failure was not the result of their hardware, software, or procedures. This determination should be documented for notification of the procuring activities.
5. Corrective-action and follow-up.
When the cause of a failure has been determined, a corrective-action shall be developed to eliminate or reduce the recurrence of the failure. The procuring activity should review the corrective-actions at scheduled status reviews prior to implementation. In all cases, the failure analysis and the resulting corrective-actions should be documented. The effectiveness of the corrective-action should be demonstrated by restarting the test at the beginning of the cycle in which the original failure occurred.
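One way to picture the program above is as a single record that follows a failure from the tag to the corrective action. The sketch below is an assumption-laden illustration, not a prescribed form: the class name `FailureReport`, its field names, and the sample values are hypothetical, and a real report would carry more of the detail listed in step 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureReport:
    # Step 1: the serial number also appears on the tag affixed to the item.
    serial_number: str
    item_id: str
    # Step 2: symptoms, surrounding conditions, and operating time at failure.
    symptoms: str
    conditions: str
    operating_time_hours: float
    failure_mode: Optional[str] = None
    repair_action: Optional[str] = None
    # Step 3: verification, or an explanation of why it could not be verified.
    verified: bool = False
    verification_evidence: Optional[str] = None
    unverified_explanation: Optional[str] = None
    # Steps 4 and 5: cause found by analysis, then the action taken (or why none).
    cause: Optional[str] = None
    corrective_action: Optional[str] = None
    no_action_rationale: Optional[str] = None

    def open_items(self):
        """List what is still missing before the report could be closed."""
        missing = []
        if self.verified:
            if not self.verification_evidence:
                missing.append("verification evidence")
        elif not self.unverified_explanation:
            # An unverified failure is not assumed absent; it needs an explanation.
            missing.append("verification or explanation for lack of it")
        if not self.cause:
            missing.append("failure cause")
        if not (self.corrective_action or self.no_action_rationale):
            missing.append("corrective action or rationale for none")
        return missing


# Hypothetical example: a report as first filed, then after analysis.
report = FailureReport(
    serial_number="FR-0001",
    item_id="VALVE-12",
    symptoms="external leak at flange",
    conditions="pressure cycling test",
    operating_time_hours=412.5,
)
before = report.open_items()

report.verified = True
report.verification_evidence = "leakage residue on flange"
report.cause = "gasket seated off-center"
report.corrective_action = "add alignment step to assembly procedure"
after = report.open_items()
```

In this sketch a newly filed report has three open items, and the list is empty once verification, cause, and corrective action are recorded. Treating "no corrective action" as acceptable only when a rationale is recorded mirrors the requirement in step 2c.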
Who will use the information
Deciding who will use the data is probably of less concern than deciding what data to use. Usually everyone involved in the project will use some portion of the information. The key is ensuring that the collected information is available to everyone through data analysis and reporting, and that it is easily accessible.