Comments on NTIA’s proposed “Performance Measures for BEAD Last-Mile Networks”
From: Karl Auerbach
CEO/CTO InterWorking Labs
Scotts Valley, California
karl@iwl.com (email)
https://iwl.com/ (corporate website)
https://cavebear.com/ (personal website)
Date: November 13, 2024
Permanent URL to these comments: https://www.cavebear.com/cavebear-blog/ntia-bead-dec2024/
Who Am I
I am Karl Auerbach, CEO and CTO of InterWorking Labs (IWL). My CV may be viewed on my website at https://www.cavebear.com/about/karl-cv-long/
I have been involved with the development and deployment of the Internet since 1972.
Our business at IWL ( https://iwl.com/ ) is to create network emulation products that help network developers bench test their code against less-than-perfect network conditions to find problems before their own customers do.
As such, we (and I) deal daily with issues of network imperfections, including latency (including jitter/packet delay variation), bandwidth, packet loss, duplication, out of order packet delivery, and so on.
Overview Of Comments
I appreciate and support the thrust of the BEAD performance measures effort.
However, I find the proposals ambiguous, simplistic, and insufficient. Those inadequacies can be remedied.
My remarks are in four major parts. The first part deals with bandwidth and bandwidth measuring. The second part is similar, but on the topic of latency. The third part is about aspects of network performance that ought to be part of the BEAD Performance Measurement framework. The fourth part is a catch-all of other suggestions.
One point that is made multiple times below is the need for clarity with regard to the point of demarcation between a customer and network provider.
Part 1: Bandwidth – Network “Speed”
Bandwidth (or network “speed”) is generally expressed as a simple bits-per-second measure. And it is the primary attribute of network performance that is of interest to those who are choosing among competing network service offerings.
Yet, bandwidth is not expressible as a single number, or even as a pair of numbers, one for downstream (towards the user/consumer) and one for upstream (away from the user/consumer).
Which Bits Are Counted?
Bandwidth, basically, is a count of the number of user-interesting bits that are carried (or can be carried) over a period of time. But what bits are “user-interesting” and what, really, does “period of time” mean?
Internet packets are nested objects, like Russian nesting dolls. By this I mean that user data is wrapped in a sequence of TCP segments or UDP packets. Those are, in turn, wrapped in IPv4 or IPv6 packets. And those are wrapped in things like Ethernet frames, Ethernet/IEEE 802.1Q VLANs, tunnels, or other media-specific forms.
Down at the media level there is often-forgotten data such as media preambles, CRCs, postambles, inter-frame gaps, etc.
The NTIA proposal does not clarify which of these bits are counted as part of the bandwidth computation and which are excluded.
The difference can be significant, particularly when each IP packet contains only a small amount of user-interesting data, as is common with packets containing voice or gaming data.
I wrote a piece on this issue:
A Deep Dive into Bit Counting - https://www.iwl.com/blog/counting-bits
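By way of illustration, here is a minimal sketch of the arithmetic. The layer sizes are typical values that I am assuming for a single small voice packet carried over Ethernet; they are illustrations, not figures from the NTIA proposal:

```python
# A minimal sketch comparing "user-interesting" bits to on-the-wire bits
# for a single small voice packet.  The layer sizes below are typical
# values assumed for illustration, not definitions from the NTIA proposal.

VOICE_PAYLOAD = 20          # bytes of codec data (e.g., one 20 ms G.729 frame)
RTP_HEADER    = 12
UDP_HEADER    = 8
IPV4_HEADER   = 20          # no options
ETH_HEADER    = 14          # destination, source, EtherType
ETH_FCS       = 4
ETH_PREAMBLE  = 8           # preamble + start-of-frame delimiter
ETH_IFG       = 12          # minimum inter-frame gap

user_bits = VOICE_PAYLOAD * 8
wire_bits = (VOICE_PAYLOAD + RTP_HEADER + UDP_HEADER + IPV4_HEADER +
             ETH_HEADER + ETH_FCS + ETH_PREAMBLE + ETH_IFG) * 8

print(f"user payload bits per packet: {user_bits}")
print(f"bits actually on the wire:    {wire_bits}")
print(f"payload fraction:             {user_bits / wire_bits:.1%}")
# With these assumed sizes the 20-byte payload is roughly 20% of what the
# medium carries; the reported "bandwidth" can differ by a factor of about
# five depending on which layers are counted.
```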
I propose that the NTIA proposals define a new term, let me suggest “effective user data”, to represent the data that is of direct interest to users. This would be the actual data that is sent and received by the applications that the user is using. For a voice conversation, this would be the actual voice data and control information directly pertaining to the compression and rendering of that voice. Similarly, for video this would be the video frames and control information.
I would go a step further and suggest that this “effective user data” also include IPv4/IPv6, UDP, and TCP headers. I would go even further and include common wrappers or connection setup exchanges, such as TLS and, especially, DNS (and its DNS-over-HTTPS and DNS-over-TLS forms.)
There are other kinds of traffic that could be included as “effective user data”, such as ARP packets, NTP/time exchanges, DHCP (for both IPv4 and IPv6), and IPv6’s discovery exchanges. The data component of these is usually small and could be dismissed as de minimis.
A tool that measures network bandwidth ought to have knobs and levers to control which things are counted and which are disregarded.
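As a purely illustrative sketch of what such knobs and levers might look like (the layer names, sizes, and counting policies below are my own assumptions, not a description of any existing tool):

```python
# Hypothetical sketch: a per-packet bit counter whose "knobs" select which
# encapsulation layers count toward reported bandwidth.

# Assumed, illustrative per-layer sizes in bytes for one small UDP packet.
LAYERS = {
    "payload":   20,   # "effective user data"
    "transport":  8,   # UDP header
    "network":   20,   # IPv4 header
    "link":      18,   # Ethernet header + FCS
    "media":     20,   # preamble + inter-frame gap
}

def counted_bits(include: set[str]) -> int:
    """Bits attributed to one packet under a given counting policy."""
    return 8 * sum(size for layer, size in LAYERS.items() if layer in include)

# Three plausible policies yield three different "bandwidth" numbers:
print(counted_bits({"payload"}))                           # payload-only goodput
print(counted_bits({"payload", "transport", "network"}))   # IP-level throughput
print(counted_bits(set(LAYERS)))                           # on-the-wire rate
```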
What Is The Time Period?
Any formula for computing bandwidth must involve a time element.
If the time is too short then one gets a bimodal answer: either the network link is carrying zero bits or it is at 100% capacity. Slightly longer time periods will tend to be distorted by the fact that many user data exchanges (such as an HTTP/S web page fetch by a browser) have a considerable burst of start-up activity that burns a lot of bandwidth bits before data of interest to the user begins to flow. In addition, protocols such as TCP have “slow start” algorithms that reduce data flows while the two ends of the connection explore the capacity of the underlying network path.
So the NTIA proposal must deal with the time span over which bandwidth is measured. Is there a time span appropriate for all possible usages?
I am not sure that there is. Some empirical experimentation could be useful to arrive at a time value that gives generally acceptable, even if perhaps somewhat imprecise, results.
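To illustrate why the choice of time span matters, here is a small sketch using an invented trace: a 100 Mb/s link that is busy for 200 ms out of one second. Short windows report the bimodal extremes; a one-second window reports a number that hides the burst entirely:

```python
# Sketch: the same traffic yields very different "bandwidth" numbers
# depending on the averaging window.  The trace below is invented purely
# for illustration: a 100 Mb/s link busy for 200 ms out of 1 second.

LINK_RATE_BPS = 100_000_000
trace_ms = [LINK_RATE_BPS // 1000 if t < 200 else 0 for t in range(1000)]  # bits per ms

def avg_bps(window_ms: int) -> list[float]:
    """Average rate over consecutive windows of the given length."""
    return [sum(trace_ms[i:i + window_ms]) * 1000 / window_ms
            for i in range(0, len(trace_ms), window_ms)]

print(avg_bps(10)[:3])    # 10 ms windows: either ~100 Mb/s or 0 -- bimodal
print(avg_bps(1000))      # 1 s window: ~20 Mb/s -- neither number tells the whole story
```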
Don’t Discount UDP
Although much Internet traffic moves on TCP, there is a considerable (and perhaps increasing) use of UDP.
UDP has traditionally been used for small exchanges that do not need connection setup. These are things like DNS. (It should be noted that DNS traffic often makes up a surprisingly significant portion of the traffic carried over the last-mile, and DNS delay has a tendency to multiply and create a user sense of a sluggish network even if the actual network is quite fast and well provisioned.)
There are newer protocols, such as QUIC, that are TCP-like but operate on UDP packets rather than directly on IPv4 or IPv6. These protocols are likely to form an increasing portion of network traffic in the future.
Channel Contention Resolution: Satellite and Shared Radio
Some types of media have contention times in which a device may need to engage in some sort of bidding mechanism in order to obtain a slot in which data may be transmitted.
This was first explored with the Aloha networks and then with CSMA on coax-based Ethernet.
Contention allocation of bandwidth still exists in many radio-based media ranging from cellular data to satellite access. Sometimes the contention resolution time is short – milliseconds or less – and sometimes it is long (particularly with satellites, and especially geosynchronous satellites.)
This contention time, if long, will affect bandwidth both in terms of calculating a number and in terms of user perception.
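A rough sketch of the arithmetic, with numbers I have invented purely for illustration (and a model that pessimistically assumes one contention cycle per transmitted frame):

```python
# Sketch: how a per-transmission channel-access (contention) delay reduces
# the rate a user actually sees.  The numbers are invented for illustration,
# and the model pessimistically assumes one contention cycle per frame.

def effective_bps(frame_bytes: int, link_bps: float, access_delay_s: float) -> float:
    """Effective rate when each frame must first win access to the channel."""
    transmit_s = frame_bytes * 8 / link_bps
    return frame_bytes * 8 / (access_delay_s + transmit_s)

# A 1500-byte frame on a nominal 50 Mb/s shared link:
print(f"{effective_bps(1500, 50e6, 0.001) / 1e6:.1f} Mb/s with ~1 ms access delay")
print(f"{effective_bps(1500, 50e6, 0.250) / 1e3:.0f} kb/s with a geosynchronous-scale "
      "250 ms access delay")
```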
How Is Bandwidth Measured?
The general method of computing network bandwidth is to stuff a lot of traffic onto a link and measure how much data comes out the other end over a given period of time.
Two tools are commonly used for this: “ping” and “iperf” (the latter is actually a family of similar tools.) Both have problems.
Ping uses ICMP ECHO request and reply packets. Many providers and many intermediate devices on a network path may rate limit or even block ICMP ECHO packets. And the device that is responding may not be giving high priority to quickly making a reply. These will distort any measurements that are made.
Iperf tends to report only counts of the bits carried in the data portion of UDP packets and TCP flows. This non-counting of the layers of wrapper bits can significantly change the results of a calculation of network bandwidth.
I’ve written some notes on these topics:
Does IPERF Tell White Lies? - https://www.iwl.com/idocs/does-iperf-tell-white-lies
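As one hedged example of the kind of adjustment I have in mind, a payload-only goodput number can be scaled by an assumed per-packet overhead to estimate the on-the-wire rate. The datagram size and overhead figures below are my assumptions for illustration, not a description of how iperf itself behaves:

```python
# Sketch: roughly converting a payload-only UDP goodput report into an
# estimate of bits on the wire.  The datagram size and per-layer overheads
# are assumptions for illustration.

def wire_estimate_bps(reported_goodput_bps: float,
                      datagram_bytes: int = 1470,
                      overhead_bytes: int = 8 + 20 + 18 + 20) -> float:
    """Scale payload-only goodput by the per-packet overhead ratio
    (UDP + IPv4 + Ethernet header/FCS + preamble/inter-frame gap)."""
    return reported_goodput_bps * (datagram_bytes + overhead_bytes) / datagram_bytes

print(f"{wire_estimate_bps(95e6) / 1e6:.1f} Mb/s on the wire for 95 Mb/s reported")
# For small packets (voice, gaming) the correction factor is far larger.
```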
Part 2: Latency
Latency is the time it takes for a packet to get from hither to yon. But there are a lot of details.
Packets are emitted from a source and travel across a path strewn with snares that can delay, duplicate, drop, change, or re-sequence them.
Simple latency testing largely ignores most of those snares.
A perfect network would deliver packets A, B, C, D … in exactly that order and with exactly the same interpacket timing. But that is often not what happens in reality.
One of the worst of those snares is jitter, more recently called packet delay variation. Many applications, particularly conversational voice and video, and gaming are quite sensitive to both raw latency and to jitter/packet delay variation.
In addition, packet delay variation can result in effects such as “dam bursting”, in which a sequence of packets that originally were nicely spaced are delivered with very tight spacing, creating what amounts to a water-hammer impact that raises the possibility of buffer overruns and packet loss at the receiver.
Another of those snares, one that tends to occur when there are parallel elements (“bonded links”) along the packet path, is re-sequencing, or out-of-order delivery. Many applications do not handle this well, if at all.
The NTIA proposal does not measure most of these effects despite these potentially having a substantial impact on the usability of a network link for a given purpose.
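To make the suggestion concrete, here is a minimal sketch (using invented sequence numbers and delays) of how delay variation and out-of-order delivery could be summarized from a received packet stream:

```python
# Sketch: deriving delay variation and reordering from a received stream.
# The (sequence number, one-way delay in ms) samples below are invented.

from statistics import mean, pstdev

received = [(1, 20.0), (2, 21.0), (4, 55.0), (3, 22.5), (5, 20.5)]

delays = [d for _, d in received]
# Packet delay variation, summarized here as deviation from the mean delay.
# (RFC 3393 and related work allow several formulations; this is one simple one.)
print(f"mean delay {mean(delays):.1f} ms, delay variation (std dev) {pstdev(delays):.1f} ms")

# Out-of-order delivery: count packets whose sequence number is lower than
# one already seen.
highest = 0
reordered = 0
for seq, _ in received:
    if seq < highest:
        reordered += 1
    highest = max(highest, seq)
print(f"{reordered} of {len(received)} packets arrived out of order")
```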
What Are The Points Between Which Latency Is Being Measured?
When evaluating network packet latency the first question is between what two points is that latency being measured?
That may seem a simple-minded question, but it is not.
For example, due to bufferbloat outgoing packets may be queued for several seconds on an outgoing Wi-Fi interface. Does the latency measure include or exclude this?
Similarly, the NTIA proposal appears to contemplate latency (and bandwidth) measurements using shared servers that may be rather distant from the path taken by the customer’s usual traffic, and whose responses may vary with load or with unrelated traffic near those servers.
For proper measurements, such “bounce traffic off of me” servers ought to be as close as possible (both topologically and physically) to the provider end of the customer’s “last mile”. These servers ought to be highly over-provisioned to minimize the impact of simultaneous use by multiple measurements, and they should not bear other, unrelated burdens that could affect measurements.
As an aside: As a hook for potential future methods of measurement, these servers should be time synchronized using low stratum time pulses, such as directly from a GPS feed (best) or (not best) from a very close stratum 1 NTP time server.
A Formal Point of Customer-Provider Demarcation
The NTIA proposal ought to clarify exactly what are the end points between which latency is being measured.
And the NTIA should go a step further:
The NTIA proposal ought to define a physical demarcation point between a customer and a provider.
This is something that was done well for telephone networks. We have an opportunity to do even better.
Some of our research at IWL has gone into the question of what such a demarcation point would look like; could it be active, containing testing agents (that could provide customer-view diagnostics to help catch developing problems early)?
It is my own and IWL’s view that such a demarcation point ought to be defined. This will be more easily done for customers who attach via a physical wire or cable than for those who attach via a less physically discernible medium, such as shared radio (a Wi-Fi or 5G system, for example.)
Such a demarcation point should have physical form, and it should contain active software to facilitate ongoing monitoring and testing from the customer perspective, even when the customer access link is degraded or has failed. It is IWL’s view that the entire area of network monitoring, diagnosis, and repair has been seriously neglected. I have written on this topic:
Round Trip Versus One Way
Today’s Internet is filled with path elements that are asymmetrical. Not only do some kinds of media have inherently different bandwidths in either direction, but there is also physical routing asymmetry – packets flowing from a client to a server may take a very different path than packets flowing from a server to a client. And these asymmetries may change without notice.
Most latency measuring tools measure round-trip. There are tools that attempt to distinguish between the hither-to-yon latency and the yon-to-hither latency.
Simple “ping” is not one of those tools.
NTIA ought to endeavour to obtain and use latency measures that distinguish between traffic flowing towards the customer and traffic flowing away from the customer.
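A small sketch of why round-trip measurements are not enough, using invented one-way delays; note that measuring each direction separately presumes synchronized clocks at both ends (hence my earlier suggestion of GPS or a nearby stratum 1 NTP source for the test servers):

```python
# Sketch: why round-trip time alone cannot reveal path asymmetry.
# The one-way delays below are invented to illustrate the point.

forward_ms = 5.0      # customer -> server direction
reverse_ms = 45.0     # server -> customer direction

rtt_ms = forward_ms + reverse_ms
print(f"round-trip time: {rtt_ms:.0f} ms, naive one-way estimate: {rtt_ms / 2:.0f} ms")
# The naive estimate (25 ms) is substantially wrong in both directions.

# Measuring a single direction requires timestamps taken by clocks that
# agree at both ends:
send_ts_ms = 1_000.000                    # taken at the customer end
recv_ts_ms = send_ts_ms + forward_ms      # taken at the server end
print(f"measured customer-to-server delay: {recv_ts_ms - send_ts_ms:.0f} ms "
      "(only meaningful if the two clocks are synchronized)")
```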
Part 3: Other Things Affecting Performance That Ought To Be Measured
-
NTIA ought to better distinguish between IPv4 and IPv6 based measurements. Not only is there a difference in bit-count overhead on packets, but there are also possibly other factors, such as different routes between IPv4 and IPv6. IPv4 Network Address Translators (NATs) can suffer from things like port number exhaustion that can trigger connection failure even when the underlying network is running well. IPv6 doesn’t require NATs and generally won’t suffer from this kind of partial (and quite frustrating to the customer) service blockage.
-
DNS response time is often a critical factor in the user’s perception of network quality. (This is especially so given that modern web page fetches often instigate a hundred or more DNS lookups.) The DNS response time components of bandwidth and latency measurements ought to be reported in a way that allows the customer to view underlying link quality both with and without those DNS components.
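As a hedged illustration of separating the DNS component, the following sketch times name resolution apart from the subsequent TCP connection (the host name is only an example):

```python
# Sketch: timing DNS resolution separately from the rest of a connection,
# so link quality can be viewed with and without the DNS component.
# The host name below is only an example.

import socket
import time

host, port = "example.com", 443

t0 = time.monotonic()
info = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
t1 = time.monotonic()

ip, resolved_port = info[0][4][0], info[0][4][1]
with socket.create_connection((ip, resolved_port), timeout=5):
    t2 = time.monotonic()

print(f"DNS resolution: {(t1 - t0) * 1000:.1f} ms")
print(f"TCP connect:    {(t2 - t1) * 1000:.1f} ms")
```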
-
Some types of media can have long channel access delays, particularly radio and satellite links. As with DNS response times, measures of bandwidth and latency ought to isolate the channel access time components and report those separately.
-
The usability of a last mile network path depends on more than mere latency and bandwidth.
NTIA ought to aspire to measure and report other important characteristics, such as the occurrence and shape of bursts of packet loss, out-of-order delivery, duplication, and variations in latency and bandwidth. For example, a raw count of packet loss over a measurement period may be interesting, but rather more interesting would be a statistical statement of the rate of occurrence of packet loss and of the shape of burst losses (for instance, uniform versus a Gaussian distribution with a mean and standard deviation, etc.)
Similarly, quality of a last mile may vary with time of day and time of week. Someone who is depending on continuous levels of network service for medical monitoring may be rather interested in whether network quality can be depended upon 24x7x365 or whether it is subject to time of day or other variations.
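A minimal sketch of the kind of statistical statement I am suggesting, using an invented loss pattern:

```python
# Sketch: characterizing how packets are lost, not just how many.
# The loss pattern below is invented; 1 = lost, 0 = delivered.

from collections import Counter
from statistics import mean

losses = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Overall loss rate -- the number most tools stop at.
print(f"loss rate: {mean(losses):.1%}")

# Burst-length distribution -- how the losses clump together.
bursts, run = [], 0
for lost in losses:
    if lost:
        run += 1
    elif run:
        bursts.append(run)
        run = 0
if run:
    bursts.append(run)

print(f"burst lengths: {dict(Counter(bursts))} (mean burst {mean(bursts):.1f} packets)")
```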
-
It can be surprising how much “other” traffic is crossing a customer’s network link. Often unrelated broadcasts (such as ARP requests) can add a fair amount of useless traffic to a customer link. This traffic is often nearly invisible to users, but it can burn bandwidth otherwise of use to the customer and increase latency.
NTIA’s tools should measure this useless (to the customer) noise traffic and report it to the customer and provider.
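One possible way to sample such noise traffic is sketched below. It assumes the third-party scapy package and sufficient privileges to capture packets; it is an illustration, not a proposed NTIA tool:

```python
# Sketch: sampling broadcast/ARP "noise" traffic on the customer link.
# This assumes the third-party scapy package (pip install scapy) and the
# privileges needed to capture packets; it is an illustration only.

from scapy.all import sniff

SAMPLE_SECONDS = 30

frames = sniff(filter="arp or ether broadcast", timeout=SAMPLE_SECONDS, store=True)
total_bytes = sum(len(f) for f in frames)

print(f"{len(frames)} broadcast/ARP frames, {total_bytes} bytes in "
      f"{SAMPLE_SECONDS} s ({total_bytes * 8 / SAMPLE_SECONDS:.0f} bits/s of noise)")
```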
-
Bonded paths – where multiple parallel wires (or equivalent) are bonded together to appear as a single connection – are not uncommon. It is an open question how, and if, the pieces of these bonded connections should be measured separately or as an aggregate.
-
Wi-Fi introduces uncertainties. Wi-Fi is not only a shared medium (and increasingly a mesh medium), but there are some subtle ways it can stutter. For example, Wi-Fi uses multiple radio bands, including some in the 2.4 and 5 gigahertz ranges (and above). Bluetooth uses the same 2.4 gigahertz frequencies. Many user devices have a single radio that can end up rapidly bouncing between Wi-Fi service and Bluetooth service, often causing packet loss and delay.
-
The NTIA proposal should identify and consider low level effects that are not always easy to perceive. These include media access delays, smart buffering or protocol offloading onto Network Interface Controllers (NICs), Maximum Transmission Units (MTU), traffic and noise bursts, intermediate active queue management, and deeply hidden traffic such as Ethernet Pause frames.
NTIA’s proposals ought to include these effects when Wi-Fi (or another radio-based medium) is part of the well-demarcated path being measured. NTIA ought to publish a note warning users of the potential issues (and some relatively easy cures) when Wi-Fi is in use on the customer side of the demarcation.
Part 4: Other Recommendations
-
Customers ought to have easy access to several months of measurement data (including indications of link degradation or failure), so that customers (or experts acting on their behalf) can evaluate the quality of competing offerings against the customer’s own use patterns. This data ought to be available to both existing customers and potential customers and not subject to non-disclosure or other anti-competitive legal constraints.
-
NTIA ought to sponsor development of a suite of test tool definitions (and perhaps reference implementations) that can provide customers with a synoptic view of last-mile quality. This suite ought to allow the user to designate different kinds of usage patterns – medical devices versus web browsing versus streaming video versus video/voice conferencing versus gaming – so that the customer can get a better sense of how a given offering satisfies that customer’s particular needs. A rough sketch of what such usage-pattern definitions might look like follows below.
This suite ought to be open ended so that it may grow over time to measure aspects of network quality-for-a-purpose beyond those covered by existing tools.
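A hypothetical sketch of such usage-pattern definitions; the profile names and threshold values are my own assumptions, chosen only for illustration, not proposed standards:

```python
# Hypothetical sketch of "usage profile" definitions that such a test suite
# might accept.  The profile names and threshold values are assumptions
# chosen for illustration, not proposed standards.

from dataclasses import dataclass

@dataclass
class UsageProfile:
    name: str
    min_down_mbps: float
    min_up_mbps: float
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float

PROFILES = [
    UsageProfile("medical monitoring",  1,  1, 150,  30, 0.1),
    UsageProfile("video conferencing",  4,  3, 100,  30, 1.0),
    UsageProfile("interactive gaming",  5,  1,  50,  10, 0.5),
    UsageProfile("streaming video",    25,  1, 250, 100, 1.0),
]

def satisfies(profile: UsageProfile, measured: dict) -> bool:
    """Do a set of measured values meet a profile's thresholds?"""
    return (measured["down_mbps"]  >= profile.min_down_mbps and
            measured["up_mbps"]    >= profile.min_up_mbps and
            measured["latency_ms"] <= profile.max_latency_ms and
            measured["jitter_ms"]  <= profile.max_jitter_ms and
            measured["loss_pct"]   <= profile.max_loss_pct)

# Example: evaluate one set of measurements against every profile.
measured = {"down_mbps": 30, "up_mbps": 5, "latency_ms": 40,
            "jitter_ms": 8, "loss_pct": 0.2}
for p in PROFILES:
    print(f"{p.name}: {'suitable' if satisfies(p, measured) else 'not suitable'}")
```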
-
I reiterate – NTIA ought to work to define a clear point of demarcation between network customer and network provider. That demarcation point should have physical form and be able to act as an active component that measures and diagnoses the network access path from the customer’s perspective. (This demarcation point may vary in physical form from one media type to another.) The cost of this demarcation point, and its maintenance, should be borne by the network provider.