Globally Synchronized Time Via Datacenter Networks
Presenter: Ki Suh Lee
Wang, Vishal Shrivastav, Hakim Weatherspoon
are important for network and distributed systems since precise clocks help you
with monitoring, coordination, updates, etc. Broadly, clock synchronization
protocols measure the offset, the time difference between two clocks, and are
judged by their precision, the maximum offset between any two clocks. Two
servers exchange messages that carry their timestamps. Using these timestamps
and the round-trip time (RTT) of the messages, one can usually compute the
offset between the two servers.
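The offset computation described above can be sketched with the standard four-timestamp exchange used by NTP-style protocols. This is an illustration rather than anything shown in the talk: the `Exchange` type and the function names are hypothetical, and the formula assumes the path delay is the same in both directions.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Timestamps from one request/response exchange (hypothetical type)."""
    t1: float  # request sent, read from the client clock
    t2: float  # request received, read from the server clock
    t3: float  # response sent, read from the server clock
    t4: float  # response received, read from the client clock


def round_trip_delay(x: Exchange) -> float:
    # Total time on the wire: elapsed client time minus server processing time.
    return (x.t4 - x.t1) - (x.t3 - x.t2)


def estimate_offset(x: Exchange) -> float:
    # Server clock minus client clock, assuming symmetric one-way delays.
    return ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2


# Server runs 5 units ahead; each direction takes 2 units; server spends 1 unit.
sample = Exchange(t1=100, t2=107, t3=108, t4=105)
```

If the two directions are not equally delayed, the estimate is off by half the asymmetry, which is one reason an accurate RTT measurement matters.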
the RTT is hard. There could be clock skew, timestamps could be inaccurate, or
the network could delay the messages exchanged between the two servers. PTP
(Precision Time Protocol) tries to overcome this with hardware timestamping on
PTP-enabled switches, yet it doesn't provide bounded precision and requires
PTP-enabled devices in the network. Approaches like NTP have much worse
precision still.
The authors propose
DTP (Datacenter Time Protocol), which runs in the physical layer, where the
clocks of directly connected peers are already synchronized. This approach
doesn't affect higher layers and adds no network overhead. It achieves
nanosecond-level precision because the physical-layer clock ticks every few
nanoseconds.
A DTP-enabled device
has a local counter that is incremented at every clock tick and adjusted upon
receipt of a message from a peer. DTP overwrites /E/ blocks in the PHY layer
with the physical-layer counter value and sends these messages to its peer.
The exchange has two phases, INIT and BEACON. In the INIT phase, each peer
sends an INIT message that its peer echoes back as an ACK; from this, the
sender computes the one-way delay. BEACON messages contain the sender's local
counter value, which the receiver combines with its measured delay and its own
local counter to update its clock. The error in this process is at most 4
clock ticks (2 from the one-way delay message and 2 from the
To synchronize this
externally, a DTP Daemon could be used to poll these counter values and
correlate them with UTC values. Like PTP, DTP needs NIC and switch
modifications, but it gives better precision bounds than PTP.
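The INIT and BEACON phases described above can be sketched in software as follows. This is only an illustration: the real protocol runs in PHY hardware, the `DtpPort` class and its method names are hypothetical, and the update rule (take the larger of the local counter and the peer's counter plus the one-way delay, so the counter never moves backwards) is an assumption that these notes do not state.

```python
class DtpPort:
    """Toy model of one DTP-enabled port (hypothetical, software-only)."""

    def __init__(self, counter: int = 0):
        self.counter = counter  # local counter, in clock ticks
        self.delay = None       # one-way delay in ticks, unknown until INIT completes

    def tick(self) -> None:
        # The local counter advances by one at every clock tick.
        self.counter += 1

    def make_init(self) -> int:
        # INIT carries the sender's current counter; the peer echoes it in an ACK.
        return self.counter

    def on_init_ack(self, echoed_counter: int) -> None:
        # Round trip is (now - send time); the one-way delay is half of that.
        self.delay = (self.counter - echoed_counter) // 2

    def make_beacon(self) -> int:
        # BEACON carries the sender's current counter.
        return self.counter

    def on_beacon(self, remote_counter: int) -> None:
        if self.delay is None:
            return  # ignore beacons until the one-way delay has been measured
        # Assumed rule: adopt the peer's time if it is ahead, never go backwards.
        self.counter = max(self.counter, remote_counter + self.delay)
```

For example, if port `a` sends an INIT at counter 100 and receives the ACK six ticks later, it records a one-way delay of 3 ticks. A BEACON carrying 500 that arrives 3 ticks after it was sent then moves `a` to 503, which matches the sender's counter at that moment.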
To evaluate DTP, the
authors used an FPGA development board along with a few other modules and set
up 12 identical servers in a tree topology. They measured the offset between
peers in this three-level network and also evaluated PTP on the same topology.
The evaluation showed that while the PTP offset is tens to hundreds of
nanoseconds when the network is idle, it is close to 100 microseconds under
load. The DTP offset, on the other hand, is always within 4 clock ticks and
within a few nanoseconds. Even across a datacenter with 6 hops, the offset is
about 150 nanoseconds and the end-to-end application delay isn't more than
200 ns.
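The 150-nanosecond figure is consistent with the 4-tick bound accumulating over each hop. The sketch below works through that arithmetic; the 6.4 ns tick period is an assumption not given in these notes (it corresponds to a 156.25 MHz PHY clock, as used by 10 Gigabit Ethernet), and the function name is hypothetical.

```python
TICKS_PER_HOP = 4        # per-hop bound quoted in the talk
ASSUMED_TICK_NS = 6.4    # assumption: 156.25 MHz PHY clock, not stated in the notes


def worst_case_offset_ns(hops: int, tick_ns: float = ASSUMED_TICK_NS) -> float:
    # Each hop can add at most TICKS_PER_HOP ticks of offset.
    return TICKS_PER_HOP * hops * tick_ns


six_hop_bound = worst_case_offset_ns(6)  # roughly 153.6 ns under the assumed tick
```

Under that assumption, 4 ticks x 6 hops x 6.4 ns comes to roughly 153.6 ns, in line with the "about 150 nanoseconds" reported for a 6-hop datacenter.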
Q: Do you think the
ability to hack into the physical layer is generally useful for other reasons?
How easy was it to get into the physical layer?
A: It is not very
hard to insert something into the physical layer. Once you understand it and
know what you are doing, you can modify the bits to do other things.
Q: Does this
translate into the wireless domain at all?
A: The wireless domain
is a little bit different: there are collisions, and wireless doesn't
guarantee that something is being sent continuously.
Q: What if it was
A: yes (no
Q: You make some
assumptions about the frequency. What if this was meant to run on a different
link with different frequency?
between clock frequency. If you control the increments with the frequency
carefully, it might still work.
Q: This needs to be
standardized to be used. Any plans?
A: Looking forward
to making this a standard for others to use.
Q: This mechanism
keeps it internally synced, but could drift from a global counter. What are
your thoughts on external synchronization?
A: Use a connected
FPGA board that is synchronized via GPS to act as a global clock; you could
then potentially synchronize all internal clocks to that global clock.