$NVDA $AVGO $ANET $LITE The post argues that the economics and thermals of pluggable PAM4 DSPs at 400G/800G have become sufficiently adverse that hyperscalers are incentivized to remove the DSP from the optics data path. Two architectural responses are highlighted. Linear pluggable optics (LPO) keeps the pluggable form factor but deletes the DSP, relying on the host switch’s SerDes to drive the optics directly. Co‑packaged optics (CPO) goes further by moving the optical engine onto the same package as the switch or compute ASIC. By collapsing the host‑to‑optics electrical distance, CPO eliminates the need for long‑reach (LR) SerDes and the pluggable DSP, enabling use of short‑reach or clock‑forwarded die‑to‑die (D2D) SerDes. This shift increases chip shoreline density and reduces ASIC PHY power and area budgets while also removing multiple electrical connectors and fibers at the faceplate.
Meta’s reliability data are used to substantiate that CPO is operationally viable and materially more reliable than pluggables. In testing of Tomahawk 5‑based “Bailly” CPO switches, 15,000,000 cumulative port‑hours were recorded for CPO systems and 2,000,000 for pluggable controls. The reported annual link failure rate (ALFR) for CPO was 0.34%, with an MTBF of about 2,600,000 hours, versus pluggable baselines of around 0.94%–1.58% ALFR and 550,000–930,000 hours MTBF depending on environment. The MTBF figures are consistent with 8,760 hours per year divided by ALFR, so the implied improvement factor is the same on both measures, approximately 2.8x–4.7x. Zero unserviceable CPO failures were observed across the 15,000,000 port‑hours. On a simple “1/T” basis (8,760 hours divided by 15,000,000 port‑hours), that places an upper bound on unserviceable ALFR of roughly 0.06% and implies MTBF greater than 15,000,000 hours; the 95% confidence “rule of three” (3/T) yields a more conservative bound near 0.18%. No link flaps were observed over the first 1,000,000 device‑hours, indicating stability in early life, where infant‑mortality defects typically manifest.
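The reliability arithmetic above can be reproduced with a short sketch. It assumes a constant failure rate and an 8,760‑hour year, which is what the reported ALFR and MTBF pairs appear to imply; the function names are hypothetical and chosen for illustration only.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year, constant failure rate


def mtbf_hours(alfr: float) -> float:
    """MTBF in hours implied by an annual link failure rate (given as a fraction)."""
    return HOURS_PER_YEAR / alfr


def naive_zero_failure_alfr(port_hours: float) -> float:
    """'1/T' bound: annualized rate if one failure were assumed in T port-hours."""
    return HOURS_PER_YEAR / port_hours


def rule_of_three_alfr(port_hours: float) -> float:
    """Approximate 95% upper bound on the annualized rate after zero failures (3/T)."""
    return 3 * HOURS_PER_YEAR / port_hours


cpo_alfr = 0.0034
pluggable_alfr_range = (0.0094, 0.0158)
cpo_port_hours = 15_000_000

print(f"CPO MTBF: {mtbf_hours(cpo_alfr):,.0f} h")
for alfr in pluggable_alfr_range:
    print(f"pluggable ALFR {alfr:.2%}: MTBF {mtbf_hours(alfr):,.0f} h, "
          f"ratio vs CPO {alfr / cpo_alfr:.1f}x")
print(f"1/T bound on unserviceable ALFR: {naive_zero_failure_alfr(cpo_port_hours):.3%}")
print(f"rule-of-three bound: {rule_of_three_alfr(cpo_port_hours):.3%}")
```

Computed from the unrounded ALFR figures, the upper ratio comes out slightly below the quoted 4.7x, which appears to follow from the rounded MTBF values (2,600,000 / 550,000).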
The reliability gap is economically meaningful at fleet scale. At an ALFR of 1.53% for 400G FR4 pluggables, a fleet of 100,000 ports would expect approximately 1,530 link incidents per year; at 0.34% for CPO the expected count falls to roughly 340, a reduction of about 1,190 incidents. At 1,000,000 ports the reduction is about 11,900 incidents. Given that each incident drives direct labor, inventory handling, and training‑job disruptions in AI fabrics, the opex and productivity benefits are likely nontrivial. The data also call out a separate category of “unserviceable failures” that trigger whole‑system swaps; CPO showed none in the sample window. Even allowing for the relatively limited 2,000,000 port‑hour control group and differences between data‑center and stressed‑lab baselines, the magnitude of the deltas argues that integration materially reduces observed failure modes.
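The fleet‑scale counts above are simple expected values (ports multiplied by ALFR). A minimal sketch, using the 1.53% and 0.34% rates quoted in the paragraph and a hypothetical helper name, shows how the figures scale with fleet size:

```python
def expected_incidents(ports: int, alfr: float) -> float:
    """Expected link incidents per year for a fleet, given ALFR as a fraction."""
    return ports * alfr


PLUGGABLE_ALFR = 0.0153  # 400G FR4 pluggable rate quoted in the text
CPO_ALFR = 0.0034        # CPO rate quoted in the text

for ports in (100_000, 1_000_000):
    pluggable = expected_incidents(ports, PLUGGABLE_ALFR)
    cpo = expected_incidents(ports, CPO_ALFR)
    print(f"{ports:>9,} ports: pluggable {pluggable:,.0f}, "
          f"CPO {cpo:,.0f}, avoided {pluggable - cpo:,.0f}")
```

These are point estimates only; they inherit the sample‑size and comparability caveats of the underlying rates.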
The technical mechanisms for improved reliability are consistent with the failure modes seen in pluggables. CPO reduces the number of electrical and optical interfaces, minimizes handling and contamination at the faceplate, and allows thermal co‑design so lasers and photonic components operate in a narrower and better‑controlled envelope. Integration enables system‑level test and screening in manufacturing, reducing field escapes. Removal of LR SerDes and associated retimers simplifies equalization and reduces sensitivity to fiber impairments and connector wear that commonly drive intermittent “link flap” events. Fewer human touch points further reduce ESD, contamination, and mis‑mating incidents.
Performance and system design implications are favorable to CPO at 51.2T and are likely to become stronger at 102.4T. Eliminating LR SerDes and the pluggable DSP lowers electrical I/O power per bit and relieves switch die shoreline and package escape constraints, enabling higher faceplate density without a commensurate thermal penalty. Short‑reach or D2D SerDes also improve latency by removing DSP pipeline stages. LPO captures some of the power benefit by removing the DSP while retaining the pluggable form factor; however, it does not address shoreline density, continues to rely on long electrical runs from the ASIC to the faceplate, and tends to operate with tighter link margins across fiber plant variability. As lane rates migrate to 224G SerDes for 1.6T ports, the electrical reach and loss budgets become more hostile to pluggable architectures, improving the relative case for CPO.
Counterarguments and risks remain. Serviceability has historically favored pluggables; although zero unserviceable CPO failures were observed in this dataset, broader field data over longer horizons are still required, particularly in dusty or high‑vibration environments and with optics types beyond FR4. Vendor lock‑in and multi‑sourcing are unresolved concerns because CPO today is platform‑specific and less interchangeable than MSA pluggables. Manufacturing yield for large co‑packages, fiber attach, and laser reliability under continuous high‑temperature operation are nontrivial challenges. Network operators may also prefer the operational flexibility of pluggables for incremental upgrades and sparing. Finally, control‑group size, workload mix, and the use of “stressed lab” data for some baselines introduce comparability caveats; the direction is clear, but exact improvement multiples should be treated as preliminary.
Investment implications are significant across the stack. Switch‑ASIC suppliers with credible CPO roadmaps and packaging partnerships stand to capture more value per port and deepen platform lock‑in. Broadcom is directly implicated by the Tomahawk 5 CPO data and would be a prime beneficiary if CPO ramps with 51.2T and 102.4T generations. Competitors that can offer equivalent co‑packaged platforms benefit as well, but the head start becomes a competitive moat given the packaging, test, and optics co‑design learning curves. Optical engine and laser suppliers positioned for co‑packaged attach should see sustained demand even as the value shifts from finished transceivers to co‑packaged engines; silicon photonics and EML laser vendors with proven high‑reliability arrays are positioned to win content. OSATs and advanced packaging ecosystems capable of high‑volume fiber attach and co‑packaged integration should also benefit.
Pluggable DSP and retimer suppliers face a structurally adverse mix shift as CPO (and to a lesser degree LPO) remove the PAM4 DSP from short‑reach intra‑DC links. The total pluggable DSP TAM persists in long‑reach, DCI, ZR/ZR+ and brownfield upgrades, but growth in the highest‑volume hyperscale intra‑DC ports would decelerate if CPO takes meaningful share in 800G and 1.6T generations. Module makers whose value proposition is assembly of pluggables will experience pressure if hyperscalers move spend to co‑packaged engines and system integration; those with credible CPO engine businesses can partially offset the headwind, but value capture migrates to the system vendor and the ASIC house. LPO‑focused silicon providers have an intermediate path: LPO may win near‑term ToR/server links where distances and operational practices permit, but its strategic runway narrows if CPO gains dominance in switch‑to‑switch fabrics and 224G electrical reach further compresses margins.
For hyperscalers and cloud operators, the decision calculus balances power, density, reliability, serviceability, and supply chain resilience. The reported 0.34% ALFR and 2,600,000‑hour MTBF, combined with zero unserviceable failures over 15,000,000 port‑hours, materially reduce incident rates and training job disruptions at AI‑fabric scale. Port density and power efficiency gains improve cluster design and TCO, especially as fabrics expand by 2x–3x over the next product cycles. The main gating factors are vendor diversification, manufacturing maturity, and internal operational playbooks for CPO sparing and RMA. If those are addressed, CPO becomes the default for high‑radix fabrics at 51.2T and especially at 102.4T, with LPO and pluggables remaining as complementary solutions for specific reaches and upgrade paths.
Base‑case portfolio posture should tilt positive toward suppliers with defensible CPO roadmaps and packaging capacity and neutral to negative toward pure‑play pluggable DSP/module exposures without credible CPO strategies. Upside risk is an accelerated CPO deployment cycle driven by AI fabric scale and power constraints; downside risk is a slower transition if field reliability or multi‑sourcing concerns delay broad adoption, allowing LPO and improved pluggables to extend their lifecycle at 800G and early 1.6T.