Telemetry Working Group

A new working group is tackling observability in production.

Observability has become an increasingly hot topic given the challenges of reliably operating distributed systems such as those found in Kubernetes environments. The term can cover a lot of ground, but a typical definition spans metrics, tracing, and logging. Even if monitoring is often considered to be something distinct, it’s at least closely related. A key part of observability is the automatic collection and transmission of data. In other words, telemetry.

There is no shortage of open source projects in this space. However, production-level testing and refinement of these tools, together with their associated procedures and datasets, in an integrated, multi-tenant, open environment has been much less common. That’s the problem that the new Telemetry Working Group (WG) is tackling.

A variety of other initiatives are related to the Telemetry WG. OpenInfra Labs (under the Open Infrastructure Foundation) is hosting the working group. Operate First (operate-first.cloud) will house the experiments and research associated with the group. Initially, it will focus on Kubernetes but may be extended to other high-performance computing environments over time. The Mass Open Cloud (MOC), which sponsors and hosts a large portion of Operate First, is also involved, as is the New England Research Cloud.

The effort spans research universities, companies, and open source projects. This specific initiative was first kicked off by Boston University’s Michael Daitzman, although other discussions and work had been going on in this general area for a while. It’s now co-chaired by Tufts University’s Raja Sambasivan and Marcel Hild, a manager of software engineering in Red Hat’s Office of the CTO.

The group’s goals are as follows:

  • Create open datasets for research
  • Provide access to a platform for telemetry research
  • Define and implement a standardized application stack (the “gold standard”)
  • Define research problem statements around telemetry
  • Iterate on implementations of solutions to those problem statements

Another explicit goal is to avoid creating new open source projects. As Hild puts it: “We have a large number of projects solving similar enough problems. The challenge these days lies in connecting these projects and operating these projects in a real environment.” He adds: “We don’t want to do everything in a lab; that’s a controlled environment. And controlled environments are only so good.”

A core premise of the working group from the beginning has been to operate in public and to make all code open source over time, even if it is not open at the very beginning. The same applies to any data that does not include personally identifiable information. Anyone is welcome to participate. Meetings are recorded and can be accessed via the Telemetry Working Group Playlist on the MOC YouTube page. The group’s repository is on GitHub.