Presenter: Monia Ghobadi
Co-Authors: Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre-Alexandre Blanche, Houman Rastegarfar, Madeleine Glick
Currently, datacenter networks connect ToR switches to one another with fixed cables. These links therefore have static capacity, and the only way to change a connection is to send someone to the field to physically rewire it. However, according to the authors, rack-to-rack traffic is highly skewed: most rack pairs exchange little traffic, while a few generate a lot. Hence, there is a need for reconfigurable interconnects that can adjust capacity dynamically.
Desirable properties of such interconnects include a way of augmenting standard capacity while maintaining separate static and reconfigurable portions; high fan-out, i.e., a large number of direct links to other racks along which heavy traffic can be sent; and low switching time, so that traffic can be redirected quickly. The authors argue that ProjecToR provides all of these.
The key insight in ProjecToR is to remove the cables and use light instead. The transmission medium is free space; the steering device is a digital micromirror device (DMD), combined with a lens assembly to adjust reach. By changing the bit pattern uploaded to the DMD, light can be redirected elsewhere. The number of accessible locations is proportional to the total number of micromirrors, but some locations must be skipped to eliminate interference, leaving roughly 18,000 accessible locations, i.e., the fan-out (all of them, however, lie within ±3 degrees). To overcome this narrow angular reach, ProjecToR uses angled mirrors to extend coverage, designing the mirror assembly to match the datacenter's requirements (e.g., a disco-ball structure).
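As a back-of-the-envelope sketch of the fan-out claim, one can model the addressable spots as a 2-D grid that scales with the micromirror count, with a guard spacing between used spots to suppress crosstalk. The grid size and guard value below are illustrative assumptions, not figures from the talk:

```python
# Toy model of DMD fan-out: addressable spots form a square grid;
# keeping only every `guard`-th spot per axis avoids interference
# between neighboring beams. All numbers here are assumptions chosen
# to land near the ~18,000 fan-out quoted in the talk.
def accessible_spots(spots_per_axis: int, guard: int) -> int:
    usable_per_axis = spots_per_axis // guard  # skip adjacent spots
    return usable_per_axis ** 2                # square grid of spots

# e.g. a hypothetical 675-spot-wide grid with a guard of 5:
print(accessible_spots(675, 5))  # 135 * 135 = 18225
```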
How feasible is a ProjecToR interconnect?
The authors built a small prototype and micro-benchmarked it. The prototype contained three ToR switches using ProjecToR, with a source laser to send light, an FPGA to program the DMD, and mirrors to reflect light toward the receiving ToRs. They ran evaluations on this setup for over a day at 10-second intervals. Its throughput matched that of a wired link. The switching time, measured by changing the destination to ToR 3 and monitoring light intensity at both receivers to detect when switching completed, was about 12 microseconds.
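The intensity-based measurement can be sketched as follows. This is a hypothetical reconstruction of the logic, not the authors' code; the trace values and threshold are made up, while the real system's result was about 12 microseconds:

```python
# Sketch: after issuing the switch command at t=0, sample the light
# intensity at the new destination ToR and report the first time it
# crosses a "link up" threshold. Sample values are hypothetical.
def switching_time_us(samples, threshold, dt_us):
    for i, intensity in enumerate(samples):
        if intensity >= threshold:
            return i * dt_us   # time at which light arrived
    return None                # switch never completed

trace = [0.0, 0.1, 0.2, 0.5, 0.9, 1.0]   # one sample every 2 us
print(switching_time_us(trace, 0.8, 2))   # -> 8
```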
The topology is configured by the set of connections between lasers and photodetectors. But the switching time is still larger than a packet transmission time, so the system maintains a dedicated default topology and creates an opportunistic topology on the fly. The dedicated topology uses k-shortest-paths routing with a virtual queue and carries the smaller flows. Opportunistic links are created on the fly and used immediately for elephant flows. Choosing which opportunistic links to activate resembles classic switch-scheduling problems; the system uses the Gale-Shapley (stable matching) algorithm and comes very close to an optimal offline scheduler.
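The stable-matching step can be sketched with the textbook Gale-Shapley algorithm, where lasers "propose" to photodetectors. The laser/detector names and preference lists below are made-up examples; in the real scheduler, preferences would be derived from queued traffic:

```python
from collections import deque

def gale_shapley(proposer_prefs, acceptor_prefs):
    """Stable matching: proposers propose in preference order;
    acceptors hold on to the best offer seen so far."""
    # rank[a][p] = position of proposer p in acceptor a's list
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_idx = {p: 0 for p in proposer_prefs}  # next acceptor to try
    engaged = {}                               # acceptor -> proposer
    free = deque(proposer_prefs)
    while free:
        p = free.popleft()
        a = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        if a not in engaged:
            engaged[a] = p                     # first offer accepted
        elif rank[a][p] < rank[a][engaged[a]]:
            free.append(engaged[a])            # better offer: swap
            engaged[a] = p
        else:
            free.append(p)                     # rejected, try next
    return {p: a for a, p in engaged.items()}

# Hypothetical example: three lasers matched to three photodetectors.
lasers = {"L1": ["D1", "D2", "D3"],
          "L2": ["D1", "D3", "D2"],
          "L3": ["D2", "D1", "D3"]}
detectors = {"D1": ["L2", "L1", "L3"],
             "D2": ["L1", "L3", "L2"],
             "D3": ["L3", "L1", "L2"]}
print(gale_shapley(lasers, detectors))
# -> {'L2': 'D1', 'L1': 'D2', 'L3': 'D3'}
```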
Cost is estimated from a cost breakdown of the individual components.
Simulations were run on 128 ToRs, each with 16 lasers and 16 photodetectors, using day-long traffic. ProjecToR's average flow completion time increases very slowly, thanks to reconfigurability, even on a skewed traffic matrix. Firefly and fat-tree are up to 95% worse: the former because of its high switching time and low fan-out (multi-hop paths are needed), the latter because it has no reconfigurability.
Q: Free-space optics are very vulnerable to vibrations and outdoor variations which is an issue in datacenters. Have you had any experience with this?
A: Firefly has shown that vibrations within 5 mm are tolerable; we use an optical bench that is even more tolerant. If a beam misses a photodetector, you might be able to redirect and recapture it.
Q: Have you discussed this with computational geometry people? They might be able to provide some insights.
A: I have run simulations with typical datacenter topologies and arrived at the disco ball, but I haven't looked into reconfiguring the datacenter itself.
Q: Two practical points from an optical-communication perspective: one is crosstalk, and the other is link recovery in the optical receiver (the transceiver also has a link-detection phase and would only become available after milliseconds).
A: We eliminate adjacent angles to remove crosstalk (by skipping every 4 degrees). As for link detection, prior work has shown that its latency might be reduced to nanoseconds.
Q: I am trying to compare free space to the switched-spectrum space (you could get single-hop reconfigurability there too). I am not sure why free space is more desirable.
A: One reason is scalability, since building a large switch in a closed form factor is complicated.