Thursday, November 21, 2013

HotNets '13: Network Support for Resource Disaggregation in Next-Generation Data Centers.

Authors: Sangjin Han (U.C. Berkeley), Norbert Egi (Huawei Corp.), Aurojit Panda, Sylvia Ratnasamy (U.C. Berkeley), Guangyu Shi (Huawei Corp.), Scott Shenker (U.C. Berkeley and ICSI).

Traditionally, data centers have segregated their computing resources into a collection of individual servers. A recent trend, seen in systems such as HP Moonshot and AMD SeaMicro, is to disaggregate some of these resources. This paper asks what future data centers would look like if that trend continues.

If the trend continues, all resources would be accessible as standalone blades connected by a unified interconnect. This would increase resource modularity dramatically. Operators could update or replace each type of hardware on its own upgrade cycle with less difficulty. They could also expand capacity in a more granular way, purchasing only the resources they need in aggregate.

While this is a significant conceptual change, the hardware and software changes can be made incrementally. On the hardware side, the fundamentals would not need to change; the chief accommodation would be the addition of a network controller. Individual software applications would need no changes, but some minor changes would have to be made in the VM.

These modified virtual machines could then achieve higher efficiency than in a traditional data center: resources could be acquired more flexibly, and fewer resources would be left unused on individual servers.
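The efficiency argument can be illustrated with a toy placement model. The sketch below is not from the paper; the job sizes, server sizes, and the function names `place_on_servers` and `place_on_pool` are assumptions chosen for illustration, and the pooled model ignores network limits entirely. It compares first-fit placement, where a job must fit within a single server, against placement from one shared pool with the same total capacity.

```python
from typing import List, Tuple

# A job's demand as (CPU cores, GB of memory). Hypothetical units.
Job = Tuple[int, int]


def place_on_servers(jobs: List[Job], n_servers: int,
                     cores_per_server: int, mem_per_server: int) -> int:
    """First-fit placement: each job must fit entirely within one server."""
    free = [[cores_per_server, mem_per_server] for _ in range(n_servers)]
    placed = 0
    for cores, mem in jobs:
        for slot in free:
            if slot[0] >= cores and slot[1] >= mem:
                slot[0] -= cores
                slot[1] -= mem
                placed += 1
                break  # job placed, move on to the next job
    return placed


def place_on_pool(jobs: List[Job], total_cores: int, total_mem: int) -> int:
    """Disaggregated placement: jobs draw from one shared pool of each resource."""
    placed = 0
    for cores, mem in jobs:
        if total_cores >= cores and total_mem >= mem:
            total_cores -= cores
            total_mem -= mem
            placed += 1
    return placed


# Assumed workload: every job fits on an empty 8-core, 32 GB server.
jobs = [(5, 10), (5, 10), (6, 30)]

print("placed on 2 servers:", place_on_servers(jobs, 2, 8, 32))
print("placed on shared pool:", place_on_pool(jobs, 16, 64))
```

In this example the first two jobs leave 3 cores and 22 GB free on each server, so the third job fits on neither even though the data center as a whole has 6 cores and 44 GB free. The pooled version places all three jobs.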

On a larger scale, instead of being connected by an internal bus, resources would be connected to a unified network. While a unified interconnect may seem radically different from a traditional internal interconnect, comparing PCIe to Ethernet shows them to have similar requirements.

A significant difference, though, is communication latency, which matters most for memory. This cost could be mitigated by extending the memory hierarchy with a layer of local memory next to the CPU that acts as a cache. In an experiment, the authors found that a 10-40 Gbps network link is sufficient, with average link utilization between 1 and 5 Gbps. Latency of less than 10 microseconds kept overhead to less than 20%. Keeping latency low is a key ingredient of a performant disaggregated data center.
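To see why a local cache layer and low network latency matter together, here is a back-of-the-envelope model. It is a sketch under stated assumptions, not the authors' experiment: the hit rate, the 100 ns local access time, and the function names are hypothetical, and the model covers only average memory access time rather than whole-application overhead.

```python
def access_overhead(hit_rate: float, local_ns: float, remote_ns: float) -> float:
    """Relative increase in average memory access time versus all-local memory.

    hit_rate  -- fraction of accesses served by the local memory layer (assumed)
    local_ns  -- latency of a local access in nanoseconds (assumed)
    remote_ns -- latency of an access served over the network (assumed)
    """
    avg_ns = hit_rate * local_ns + (1.0 - hit_rate) * remote_ns
    return avg_ns / local_ns - 1.0


def max_remote_latency(hit_rate: float, local_ns: float, budget: float) -> float:
    """Largest remote latency (ns) that keeps the overhead within `budget`."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    # Solve hit*local + (1 - hit)*remote = (1 + budget)*local for remote.
    return local_ns * (1.0 + budget - hit_rate) / (1.0 - hit_rate)


# Assumed values: 99.9% local hit rate, 100 ns local, 10 microseconds remote.
print("overhead:", access_overhead(0.999, 100.0, 10_000.0))
print("max remote ns for 20% budget:", max_remote_latency(0.999, 100.0, 0.2))
```

Under these assumed numbers, the tolerable remote latency is very sensitive to the miss rate of the local layer: the fewer accesses go over the network, the more latency each one can afford.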

Q: This reminds me a lot of multiprocessor systems from the past used in mainframes.
A: They are similar in that they try to act as a big computer, but the main difference is how tightly coupled the resources are. A primary goal of a disaggregated data center is a decoupling of these resources. Previous systems coupled resources tightly to achieve higher performance.

Q: Instead of a proprietary bus you want to use a more open protocol?
A: Yes.

Q: Twenty years ago, the Desk Area Network and some other projects had the idea that you can push the network very far into the device, and that very low latency interconnects were necessary. Are there common models or lessons?
A: Data centers are very big now and can achieve large economies of scale. This changes the economics, which were a problem for the earlier approaches.

Q: What is the relationship between disaggregation and high-performance computing? It seems like this approach starts with commodity components. What if you started from supercomputing?
A: Modularity is everything. Vendor lock-in increases costs.

Q: What is the overhead in hardware cost?
A: Network controllers at scale are very cheap.