In our recent webinar, *How CXL Technology will Revolutionize the Data Center* (available on-demand), we received far more questions than we had time to address during the scheduled Q&A. What follows are answers to many of those questions, providing greater context on the capabilities of Compute Express Link™ (CXL™) technology.
Click on a link below to jump to a specific question:
- Is there a maximum number of compute nodes/memory nodes allowed in the case of CXL memory pooling and switching?
- Today’s dense servers (including OCP servers) often have all the front and back panel ports populated. What are your thoughts on how to make more ports available on today’s servers for CXL connections?
- What are the release dates for CXL 1.0, 2.0, and 3.0?
- Does CXL 2.0 support link encryption?
- Can CXL support chiplet implementations?
- How much end-to-end latency in nanoseconds (ns) does a CXL link add? Is there a breakdown for each component/layer?
- Are there benefits to using optical technologies, for example co-packaged optics, to implement CXL Fabrics?
- How does CXL-attached memory compare to HBM as far as bandwidth and capacity are concerned?
1. Is there a maximum number of compute nodes/memory nodes allowed in the case of CXL memory pooling and switching?
While CXL does put upper bounds on the number of Type 1 and Type 2 endpoints in a pooled or switched architecture, the number of Type 3 endpoints is, in practice, dictated more by latency requirements and efficiency goals.
What we have seen is that scaling a specific pooling element beyond a certain number of ports carries a latency penalty that end customers do not want to pay, while too few ports do not deliver the required efficiency gains. Our observation is that the industry seems to be converging on pooling memory across anywhere from 4 to 16 compute nodes, at least in first-generation implementations.
2. Today’s dense servers (including OCP servers) often have all the front and back panel ports populated. What are your thoughts on how to make more ports available on today’s servers for CXL connections?
Drive slots in the front of a server and add-in card slots in the back are indeed a precious resource in today’s data center. Ultimately, workload needs drive how the resources in a server are used. Compute servers (approximately two-thirds of all servers in data centers worldwide, by our estimation) are more likely to take advantage of CXL-attached memory, sacrificing some PCI Express® (PCIe®) lanes and front- or back-of-server slots for memory expansion or for connectivity to pooled memory. Storage servers, less so.
The introduction of CXL may well change server architectures to allow for CXL-attached memory, with everything from new memory module form factors to new backplanes to new rack-mount server or appliance form factors. In fact, we’ve already seen various standards bodies attempting to introduce such new form factors and architectures. Also, with every new CPU generation, the number of PCIe (and now PCIe/CXL) lanes is increasing, providing more opportunities for attachment in either the front or back of the server.
3. What are the release dates for CXL 1.0, 2.0, and 3.0?
CXL 1.0, the first release of the specification, arrived in March 2019 and was superseded later that year by CXL 1.1. CXL 2.0 was released in November 2020, and CXL 3.0 followed in August 2022. The CXL Consortium also allows ECNs to add optional capabilities between major releases.
4. Does CXL 2.0 support link encryption?
Yes, CXL 2.0 includes Link-level Integrity and Data Encryption (CXL IDE) as an optional capability. Leveraged from PCIe IDE, CXL IDE provides a secure connection based on AES-GCM cryptography.
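For readers less familiar with AES-GCM, the short Python sketch below (using the widely available `cryptography` package) illustrates the authenticated-encryption primitive that IDE builds on: data is encrypted and carries an integrity tag, so any tampering is detected on decryption. This is purely illustrative; actual CXL IDE key management and FLIT-level operation are defined by the specification and implemented in hardware, and the payload and header values here are hypothetical stand-ins.

```python
# Illustrative only: AES-GCM authenticated encryption with the Python
# "cryptography" package. CXL IDE applies this primitive in hardware to link
# traffic with spec-defined key management; this sketch just shows the
# confidentiality + integrity properties AES-GCM provides.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # 256-bit key
aesgcm = AESGCM(key)
nonce = os.urandom(12)                      # 96-bit nonce, unique per message

payload = b"example link payload"           # hypothetical stand-in for FLIT data
aad = b"example header bytes"               # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, payload, aad)    # ciphertext || 16-byte tag
plaintext = aesgcm.decrypt(nonce, ciphertext, aad)  # raises if data was tampered with
assert plaintext == payload
```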
5. Can CXL support chiplet implementations?
Yes, CXL can universally be used for chiplet-to-chiplet (3D-stacked or otherwise), package-to-package, or even system-to-system communication. The only limitation is physical reach, which is governed by the signal integrity characteristics of a given SerDes and the channels being driven; both are implementation dependent. Longer reaches can, of course, be enabled through the use of retimers or active optical cables.
It should also be noted that CXL is supported as a protocol layer for Universal Chiplet Interconnect Express™ (UCIe™). The UCIe specification, announced in March 2022, is a new open standard for chiplet interconnect introduced by Intel®, AMD®, Arm®, and all the leading-edge foundries.
6. How much end-to-end latency in nanoseconds (ns) does a CXL link add? Is there a breakdown for each component/layer?
Latency adders are very implementation dependent. A package-to-package link delay between, say, a CPU and a CXL memory expander is often modeled as 4ns, though different trace lengths, connector options, etc. will affect that figure. The latency introduced by the CXL logic in the CPU and in the CXL memory expander is likewise implementation dependent. The generally accepted target for total round-trip unloaded read latency, including media access, is equivalence to one NUMA hop in a multi-socket compute architecture today (<100ns).
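To make that arithmetic concrete, the sketch below tallies an illustrative unloaded read-latency budget. Only the roughly 4ns per link hop and the <100ns round-trip target come from the discussion above; the controller and media figures are placeholder assumptions, since those values are implementation dependent.

```python
# Back-of-the-envelope unloaded read-latency budget for a CXL memory expander.
# Only the ~4 ns per package-to-package link hop and the <100 ns round-trip
# target come from the text above; every other value is an illustrative
# placeholder, since real numbers are implementation dependent.
budget_ns = {
    "CPU CXL controller (egress + ingress)": 25.0,  # placeholder assumption
    "Link flight time (2 hops x ~4 ns)": 8.0,
    "Expander CXL controller": 25.0,                # placeholder assumption
    "Media access (e.g. DDR read)": 40.0,           # placeholder assumption
}

total = sum(budget_ns.values())
for component, ns in budget_ns.items():
    print(f"{component:45s} {ns:6.1f} ns")
print(f"{'Total unloaded round trip':45s} {total:6.1f} ns  (target < 100 ns)")
```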
7. Are there benefits to using optical technologies, for example co-packaged optics, to implement CXL Fabrics?
Yes. Optical technology in general provides much longer reach than copper. Co-packaged optics is an implementation option for future CXL-based architectures and has the additional benefit of allowing lower-power SerDes, since the electrical connection can be shorter than with discrete optical modules. As co-packaged optics technology continues to evolve, it will prove an interesting deployment option for the industry where applicable – most likely in rack-level cabled solutions that involve CXL pooling or switching.
8. How does CXL-attached memory compare to HBM as far as bandwidth and capacity are concerned?
CXL memory can be any kind of memory, depending on what type of media controller the CXL memory expander supports. However, HBM delivers much higher bandwidth than any compliant CXL port; after all, HBM achieves its bandwidth by leveraging a 1024-pin-wide bus. The bi-directional bandwidth of CXL is bounded by the data rate of the lanes used to form the CXL port. For example, an x8 CXL 2.0 port running on PCIe Gen 5 electricals at 32GT/s delivers 32GB/s in each direction. In practice, due to FLIT-packing, the effective bi-directional bandwidth is somewhat less than that. For simplicity, you can consider an x8 CXL 2.0 port to deliver bandwidth roughly equivalent to a DDR5-5600 RDIMM. That is considerably less bandwidth than HBM, but at much greater pin efficiency. At the end of the day, the goal of CXL is to deliver more memory capacity and bandwidth to CPUs in the most pin-efficient manner possible with the best latency profile possible, so it is solving a very different problem than HBM. If maximum bandwidth is desired, regardless of pin count, then HBM remains the most effective way to achieve that goal.
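As a rough sanity check on those numbers, the sketch below works through the raw per-direction arithmetic. The x8 CXL port at 32GT/s and the DDR5-5600 figures follow from the discussion above; the HBM per-pin data rate used here is an illustrative assumption, since it varies by HBM generation.

```python
# Rough per-direction bandwidth arithmetic behind the comparison above.
# The CXL and DDR5-5600 figures follow from the text; the HBM per-pin rate
# is an assumed, illustrative value (it differs by HBM generation).

def cxl_bandwidth_gbps(lanes: int, gt_per_s: float) -> float:
    """Raw CXL bandwidth per direction in GB/s (ignores FLIT-packing overhead)."""
    # Each transfer moves 1 bit per lane; divide by 8 to convert bits to bytes.
    return lanes * gt_per_s / 8

cxl_x8 = cxl_bandwidth_gbps(lanes=8, gt_per_s=32)  # 32 GB/s per direction
ddr5_5600 = 5600e6 * 8 / 1e9                       # 64-bit channel: ~44.8 GB/s
hbm_stack = 1024 * 6.4 / 8                         # 1024 pins at an assumed 6.4 Gb/s/pin: ~819 GB/s

print(f"CXL 2.0 x8 (raw, per direction): {cxl_x8:.1f} GB/s")
print(f"DDR5-5600 RDIMM channel:         {ddr5_5600:.1f} GB/s")
print(f"HBM stack (illustrative):        {hbm_stack:.1f} GB/s")
```

Even before accounting for FLIT-packing overhead, the x8 CXL port sits in the same ballpark as a DDR5-5600 RDIMM, while the (illustrative) HBM stack is an order of magnitude higher, which is the trade-off described above.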
If you have any questions regarding CXL technology or Rambus products, feel free to ask them here.