Military & Aerospace: Inconvenient reality: when failure is not an option

Doug Patterson explains why stringent testing and qualification procedures down to the very component are essential to ensuring operational reliability in military and aerospace systems

Most embedded aerospace and military systems perform critical functions related to operation of the larger system or platform. Performance, reliability and functionality are imperative, and embedded systems must operate flawlessly in very specific and defined ways, while exposed to extreme environments. Component qualification plays a pivotal role in the resulting reliability.
Failure is not an option
Military and aerospace equipment requirements are generally far more stringent than those found in consumer, automotive, industrial and similar applications. In most cases, anything less than 100% operation is not an option, since even the slightest failure could have catastrophic and deadly results. Reliability is critical.
Yet military and aerospace applications typically encompass a number of the harshest environments: extreme high and low temperature ranges – from the intense cold of the Antarctic to the world's hottest deserts; the highest levels of shock and vibration – from explosive detonation to the rigours of a space launch; high atmospheric humidity and even total immersion; resistance to nuclear and solar radiation as well as the total vacuum of space; and the high pressures of the deepest oceans.
This puts an enormous burden on the components and systems used in these environments, since they need to strike a balance between being always available to meet safety-critical standards and operating under the extreme conditions of these intense applications. Component testing, characterisation and validation are mandatory in developing reliable embedded computing systems.
Dependable components
Unlike their commercial or industrial counterparts, military products are specifically designed to exacting specifications and the tightest tolerances from the outset. For example, electronic and mechanical components are first selected for extended environmental operation, such as temperature, vibration, shock and humidity.
Where military temperature grade components (-55 to +125˚C) are not available, industrial temperature range devices (-40 to +85˚C) are selected. If – and only if – neither of these two temperature grades is available, commercial temperature range (0 to +70˚C) devices are used, and these are rigorously inspected and pre-screened to the levels needed to ensure reliable product operation in the intended application during the system's mission life cycle.
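The selection order described above can be sketched in a few lines of Python. This is only an illustration: the function name, the idea of passing in the set of grades a supplier offers, and the rule that a part needs up-screening whenever its rated range does not cover the application's range are assumptions made for the example, not a documented procedure.

```python
# Temperature grades in order of preference, with rated ranges in degrees C.
GRADES = [
    ("military", -55, 125),
    ("industrial", -40, 85),
    ("commercial", 0, 70),
]


def select_grade(available, need_low_c, need_high_c):
    """Pick the most preferred available grade.

    Returns (grade_name, needs_upscreening), or None if nothing is available.
    needs_upscreening is True when the rated range does not cover the
    required range (an assumed rule, used here for illustration).
    """
    for name, low_c, high_c in GRADES:
        if name in available:
            covers = low_c <= need_low_c and high_c >= need_high_c
            return name, not covers
    return None


# A part offered only in commercial grade, needed from -40 to +85 degrees C,
# is selected but flagged for up-screening.
print(select_grade({"commercial"}, -40, 85))
```

Walking the list in a fixed order keeps the "if and only if" rule explicit: a lower grade is never considered while a higher one is on offer.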
For high humidity operation, plastic encapsulated devices must be held and stored in a humidity-controlled storage cabinet prior to board assembly to reduce moisture entry into the devices. A specifically chosen conformal coating is used after board assembly to reduce susceptibility to moisture during the system's deployment.
Then, depending on the grade and history of the components chosen, many are subjected to additional mechanical testing, such as package tolerances and hermeticity properties, PIND (particle impact noise detection) and lead plating and bonding, as well as temperature and up-screening tests, especially for commercial components that may be operating outside of their stated standard temperature range. Sometimes radiation screening is part of the equation. This entire pre-screening process, known as guard-banding, ensures all the components operate well within the expected and tested parameters during deployment.
The physics of defence and aerospace end-user applications have not changed in the past 100 years, and systems subjected to the harshest environments are expected to operate the first time, every time. In light of today's diminishing sources of military-grade components, up-screening remains one of the more reliable processes to pre-qualify electronics prior to end-use – in fact, these processes have become mandatory to meet the platform environmental specifications.
Qualified or guaranteed?
Up-screening at the component level remains a viable process approach if individual components and mechanical devices are 100% inspected, measured, rated and sorted based on real data collected in real time, and not rated based on statistically generated data from some pre-decided levels of expected yield probabilities. If premature failures are seen early during the manufacturing process, failure and risk mitigation steps must be taken to ensure higher yields at the next assembly, and then tighter process controls, such as 100% component inspection, including pre-screening, should be put in place to reduce the chance that the failure repeats in the future.
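As a rough sketch of what sorting on real measured data can look like, the Python below accepts a part only if its measured value sits inside the specification limits with a margin to spare. The 10% margin, the function names and the sample readings are hypothetical; real guard-bands are set per parameter and per application.

```python
def within_guard_band(measured, lower_limit, upper_limit, margin_fraction=0.1):
    """True if the measured value is inside the limits with margin to spare."""
    margin = (upper_limit - lower_limit) * margin_fraction
    return lower_limit + margin <= measured <= upper_limit - margin


def sort_parts(measurements, lower_limit, upper_limit, margin_fraction=0.1):
    """Split {serial: measured_value} into (accepted, rejected) serial lists."""
    accepted, rejected = [], []
    for serial, value in measurements.items():
        if within_guard_band(value, lower_limit, upper_limit, margin_fraction):
            accepted.append(serial)
        else:
            rejected.append(serial)
    return accepted, rejected


# Hypothetical readings against a 4.5 to 5.5 specification window.
readings = {"SN001": 5.00, "SN002": 5.45, "SN003": 4.40}
accepted, rejected = sort_parts(readings, 4.5, 5.5)
print(accepted, rejected)
```

Note that SN002 is inside the specification yet is still rejected: it passes the data sheet limit but not the guard-band, which is the distinction that 100% measurement makes visible.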
Electronic boards and modules using individually pre-screened components are effectively and operationally guaranteed by design to produce a reliable yield during manufacture and, ultimately, reliably pass the demanding rigours of the end-use applications.
A complete platform is essentially a system of systems. It is only as dependable as its least reliable (non-redundant) component in an event chain. Unfortunately, the indiscriminate use of up-screening has taken this once reliable methodology into areas where it may no longer guarantee success. Using off-the-shelf, untested and uncharacterised components from brokers or other uncontrolled sources, and up-screening only at higher levels of integration, such as the board or subsystem level, can effectively mask individual components operating beyond their guard-band, leaving a module destined for early failure. Modules using this late screening process are only qualified by test, meaning there are no process controls to guarantee future yield.
Without properly characterising a board's individual components, an up-screened board may pass some preset levels of HASS (highly accelerated stress screening) testing one day with flying colours and 100% yield, and then fail miserably the next day. In most instances, no one knows what failed or why. Is this the degree of confidence end-users are looking for in safety-critical applications?
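The weakest-link point can be made concrete with the standard series reliability model, in which a non-redundant chain works only if every element works. The sketch below assumes independent failures and uses made-up reliability figures; it is an illustration, not a prediction method taken from this article.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of a non-redundant chain, assuming independent failures."""
    return math.prod(component_reliabilities)


# Two strong links and one weak one (hypothetical values).
chain = [0.99, 0.99, 0.90]
system = series_reliability(chain)
print(round(system, 5), system <= min(chain))
```

Under these assumptions the chain can never be more reliable than its weakest member, and each additional element lowers the result further.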
Temperature rising
Generated heat can be a dangerous source of failure in any system or subsystem – remember that reliability falls as temperature rises. The potential for extreme temperatures and rapid, exaggerated, constant temperature cycling, combined with the extreme cost of an irreparable catastrophic failure within a critical application, requires close consideration.
As boards and systems become more densely configured, the heat generated by on-board electronics can create numerous problems at much lower temperature swings if a system is unable to dissipate heat aggressively from the active devices. In today's complex system-on-a-board embedded designs, subsystem reliability decreases by approximately half with just a 10˚C rise in temperature. These extremes are normal occurrences in products typically employed in military and aerospace applications, making thermal management a major consideration in ensuring reliability.
Recurring heating and cooling cycles put severe mechanical stresses on components, threatening long-term reliability. Components and the printed wiring boards on which they are mounted should therefore have closely matched thermal coefficients of expansion (TCEs). Where the TCEs differ significantly, adjacent portions contract and expand at different rates, resulting in heat-related electrical and mechanical failures.
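The halving rule of thumb quoted above is easy to put into numbers. The short Python sketch below simply applies it as stated, treating the factor of two per 10˚C as exact, which is a simplification; the function name is hypothetical.

```python
def relative_reliability(temp_rise_c, halving_step_c=10.0):
    """Reliability relative to baseline after a given temperature rise.

    Applies the rule of thumb that reliability halves for every
    halving_step_c degrees of rise.
    """
    return 0.5 ** (temp_rise_c / halving_step_c)


for rise in (0, 10, 20, 30):
    print(rise, relative_reliability(rise))
```

A 30˚C rise, on this approximation, leaves one eighth of the baseline figure, which is why a few degrees of extra cooling margin are worth designing for.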
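To give a feel for the scale of a TCE mismatch, the sketch below uses the linear expansion relation (change in length = TCE x length x temperature change) to estimate how far a component and its board move relative to each other over one temperature swing. The TCE values, the 20 mm package length and the 180˚C swing are illustrative assumptions only.

```python
def differential_expansion_um(tce_component_ppm, tce_board_ppm,
                              length_mm, delta_t_c):
    """Relative movement, in micrometres, between component and board.

    TCEs are given in ppm per degree C. ppm x mm x degree C yields
    units of 1e-6 mm, which equals 1e-3 micrometres.
    """
    mismatch_ppm = abs(tce_component_ppm - tce_board_ppm)
    return mismatch_ppm * length_mm * delta_t_c * 1e-3


# Hypothetical ceramic package (7 ppm/C) on a laminate board (17 ppm/C),
# 20 mm long, cycled across a 180 degree C swing.
print(differential_expansion_um(7, 17, 20, 180))
```

With these assumed figures the mismatch is tens of micrometres per cycle, all of which has to be absorbed by solder joints and leads; with matched TCEs the result is zero.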
Semiconductor considerations
Because excessive heat can exacerbate semiconductor package deterioration as well, leading to premature system failure, managing temperature extremes is critical to ensuring reliability. Two common conditions of deterioration directly affected by inadequate thermal management are metal electromigration and electrostatic discharge.
One of the most common failure modes in modern metal oxide semiconductor (MOS) devices, metal electromigration occurs when a chain of metal atoms forms that can bridge thin insulating oxide layers and cause internal shorts.
Advances in semiconductor designs, such as higher line densities that decrease width geometries and increase device functionality, greatly impact metal electromigration. These higher line densities produce a larger current density (current per unit cross-sectional area), increasing the resulting electromagnetic field (EMF). Over time, increases in the MOS devices' EMF can induce metal ions from the metallisation lines within the semiconductor to move or migrate, leading to the short.
We've all received a shock in our daily lives, sometimes even from a piece of electronic equipment; this is electrostatic discharge (ESD). While ESD in a computing system tends to mirror the larger, and recognisable, single discharge event (shock) we commonly experience, there are smaller static discharges within a computing system that can partially damage the oxide layer and cause dormant or hidden (latent) defects. If not detected, mitigated or circumvented, these latent defects will cause premature failure in deployed systems. Continuous application of an EMF across a damaged, pitted insulating layer, coupled with higher device die temperatures, will accelerate metal ion migration across the partially failed insulation layers, resulting in premature system failure.
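The effect of shrinking line widths can be illustrated by computing current density directly, taken here as current divided by the cross-sectional area of the metal line. The line dimensions and the 1 mA current in this Python sketch are hypothetical values chosen only to show the scaling.

```python
def current_density_a_per_cm2(current_a, width_um, thickness_um):
    """Current density in a rectangular metal line, in A/cm^2."""
    # 1 micrometre = 1e-4 cm
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    return current_a / area_cm2


wide = current_density_a_per_cm2(1e-3, width_um=1.0, thickness_um=0.5)
narrow = current_density_a_per_cm2(1e-3, width_um=0.5, thickness_um=0.5)
print(wide, narrow, narrow / wide)
```

Halving the line width while carrying the same current doubles the current density, so each geometry shrink raises the stress on the metallisation unless the current falls in step.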
Qualifying
The products used in military, aerospace and astronautic applications are expected to perform flawlessly for extended periods with little or no maintenance – especially during times of mission-critical operation. The users of these electronic subsystems therefore mandate the immediate, highly reliable and easily maintainable operation of this equipment when deployed in the most severe environments and applications – from air, land, sea and space. Stringent testing and qualification procedures down to the very component are essential to ensuring that operational reliability.
Doug Patterson is vice president of worldwide sales and marketing for Aitech Defense Systems

28 July 2010, Aitech Defense Systems