Facebook engineers have built and open-sourced an Open Compute Time Appliance, a new contribution to modern timing infrastructure, linked to GNSS as the authoritative time source. They call their invention the Time Card, a peripheral component interconnect express (PCIe) card that they say can turn almost any commodity server into a time appliance. With the help of the Open Compute Project (OCP) community, they have established the Open Compute Time Appliance Project and open-sourced every aspect of the Open Time Server.
In March 2020, Facebook began switching over the servers in its data centers to a new timekeeping service based on Network Time Protocol (NTP). The new service, built in-house and later open-sourced, was more scalable and improved the accuracy of timekeeping in the Facebook infrastructure from 10 milliseconds to 100 microseconds. More accurate timekeeping enables more advanced infrastructure management across data centers, as well as faster performance of distributed databases.
The new NTP-based time architecture uses a Stratum 1 device that is directly linked to an authoritative source of time, such as a GNSS or a cesium clock. Facebook's NTP service is designed in four layers, or strata:
• Stratum 0 is a layer of satellites with extremely precise atomic clocks from a GNSS, such as GPS, GLONASS, or Galileo.
• Stratum 1 is Facebook's atomic clock, synchronized with a GNSS.
• Stratum 2 is a pool of NTP servers synchronizing to Stratum 1 devices. Leap-second smearing occurs at this stage.
• Stratum 3 is a tier of servers configured for a larger scale. They receive smeared time and are ignorant of leap seconds.
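Leap-second smearing, as applied at Stratum 2, spreads the extra second over a long window so that downstream servers never see a discontinuity. The sketch below is only an illustration of the idea: it assumes a linear smear over a 24-hour window, and the function name and parameters are hypothetical. The article does not say which smear shape or window length Facebook uses.

```python
def smear_offset(seconds_since_start: float,
                 window: float = 86_400.0,
                 leap: float = 1.0) -> float:
    """Return how much of the leap second has been applied so far.

    Assumption: a linear smear over `window` seconds (24 hours here).
    The real service may use a different curve and duration.
    """
    if seconds_since_start <= 0:
        return 0.0          # smear has not started yet
    if seconds_since_start >= window:
        return leap         # the whole leap second has been absorbed
    return leap * seconds_since_start / window


# Halfway through the window, half of the leap second has been applied.
print(smear_offset(43_200))
```

A Stratum 3 server that only ever receives time adjusted this way sees a clock that runs very slightly slow or fast for the duration of the window, rather than one that repeats or skips a second.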
Many companies rely on public NTP pools such as time.facebook.com to act as their Stratum 1. However, this approach has its drawbacks. These pools add dependency on internet connectivity and can impact overall security and reliability of the system. If connectivity is lost or an external service is down, it can produce outages or drift in timing for the dependent system.
The new dedicated piece of hardware, called the Time Appliance, consists of a GNSS receiver and a miniaturized atomic clock (MAC). Users of time appliances can keep accurate time, even if GNSS connectivity is lost. Such an open-source time source can open up broader applications for timing, according to the Facebook engineers. Their system is explained in a blog post, with further instructions in a separate document.
Engineers Ahmad Byagowi and Oleg Obleukhov used an onboard MAC, a multiband GNSS receiver, and a field-programmable gate array (FPGA) to implement the time engine. The time engine’s job is to interpolate time with nanosecond granularity between consecutive pulse-per-second (PPS) signals. In addition to a 1 PPS signal, the GNSS receiver provides the time of day (ToD). If GNSS reception is lost, the time engine relies on the atomic clock, which is kept synchronized on an ongoing basis using an ensemble average of the consecutive PPS pulses.
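The interpolation and holdover behavior described above can be approximated in software. The following is a minimal sketch, not the FPGA design: it assumes a free-running tick counter that is latched at each PPS edge, and the class, method names, and history length are all hypothetical. The average tick count per second over recent PPS intervals stands in for the ensemble average mentioned in the article.

```python
from collections import deque


class TimeEngine:
    """Toy model: interpolate nanoseconds between PPS edges, and keep
    extrapolating from the averaged tick rate if PPS edges stop arriving."""

    def __init__(self, history: int = 8):
        self._intervals = deque(maxlen=history)  # ticks between recent PPS edges
        self._last_edge = None                   # counter value at the latest edge
        self._second = 0                         # ToD (whole seconds) at that edge

    def on_pps(self, counter: int, tod_seconds: int) -> None:
        """Record a PPS edge together with the ToD reported by the receiver."""
        if self._last_edge is not None:
            self._intervals.append(counter - self._last_edge)
        self._last_edge = counter
        self._second = tod_seconds

    def ticks_per_second(self) -> float:
        """Average oscillator ticks per second over the recorded intervals."""
        if not self._intervals:
            raise RuntimeError("need at least two PPS edges")
        return sum(self._intervals) / len(self._intervals)

    def now(self, counter: int) -> tuple[int, int]:
        """Return (seconds, nanoseconds) for the given counter value."""
        elapsed_ticks = counter - self._last_edge
        total_ns = round(elapsed_ticks * 1e9 / self.ticks_per_second())
        extra_seconds, nanoseconds = divmod(total_ns, 1_000_000_000)
        return self._second + extra_seconds, nanoseconds


engine = TimeEngine()
engine.on_pps(0, 100)            # assumed 10 MHz counter for the example
engine.on_pps(10_000_000, 101)
engine.on_pps(20_000_000, 102)
print(engine.now(25_000_000))    # half a second after the last edge
print(engine.now(45_000_000))    # no new PPS edges: holdover extrapolation
```

In this model, holdover accuracy depends entirely on how stable the oscillator is after the last PPS edge, which is why the hardware pairs the GNSS receiver with an atomic clock.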
The Time Card allows any x86 machine (most desktop and laptop computers, workstations and servers) with a network interface card (NIC) capable of hardware time-stamping to be turned into a time appliance. The system is agnostic to whether it runs NTP, Precision Time Protocol (PTP), Synchronous Ethernet (SyncE), or any other time synchronization protocol, since the accuracy and stability provided by the Time Card are sufficient for almost any system. According to the engineers, this enables a very precise and stable NTP Stratum 1 server.
They have also worked with several vendors who will be building and selling Time Cards. The architecture of Orolia’s Atomic Reference Time (ART) Card, as well as the software architecture that will manage the card, is intended to be embedded in any Open Compute server to build a PTP Grand Master. NVIDIA offers the precision timing-capable ConnectX-6 Dx and BlueField-2 DPU.
Byagowi and Obleukhov conclude their blog post by stating: “The Time Appliance is an important step in the journey to improve the timing infrastructure for everyone, but there is more to be done. We will continue to work on other elements, including improving the precision and accuracy of the synchronization of our own servers, and we intend to continue sharing this work with the Open Compute community.”