When performance goes down, the stakes go up. Happy customers turn into angry, Hulk-like monsters. The clock is ticking and the pressure is on.
Over the last 10 years, we’ve honed our skills by working with some of the world’s largest service providers and their customers, monitoring the performance of over 50 million homes worldwide. With 27 tests and 65 metrics, we can show you what you need to do to resolve your issues, whether they’re Thanos-sized, or so small you’d need an Ant (Man) to identify them. Here are some real scenarios of when we saved the world.
A. End-to-end testing of real applications
A disgruntled Netflix streamer called us as a last resort. Let’s call him Tony.
Tony was miserable. Despite receiving the full 100Mbps he'd been promised, Tony found that Netflix was constantly buffering. Entire evenings’ worth of entertainment ruined and a streaming subscription gone to waste meant that Tony felt pretty fed up.
We took a quick look at Tony’s download speeds in SamKnows One, our cloud-based analytics system (they were fine), and then compared them to those of his peers on the same product with the same ISP. Again, download speeds were doing well. Perhaps Tony’s poor performance was an anomaly?
Until we looked at his latency, which was sky high. In fact, Tony's Netflix traffic was going all the way to a server in the USA, rather than a much closer one in Frankfurt.
Netflix traffic was directed to Washington, rather than a closer server in Frankfurt.
What’s more, this problem extended far beyond Tony. In fact, Steve, James, Natasha, T'Challa, Clint, and Wanda were all experiencing the exact same issue. We contacted his ISP straight away, who were able to fix the problem with a simple IP address update. Tony returned to streaming his videos buffer-free, and his ISP avoided hundreds of support calls by fixing the problem before more people noticed.
You can learn more about our investigation here.
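For the curious, the peer comparison in Tony's case can be sketched in a few lines of Python. This is a minimal illustration, not how SamKnows One works internally: the function name, the threshold, and the latency figures are all assumptions made up for the example. It flags a user whose latency sits far above the median of peers on the same product, using the median absolute deviation (MAD) as a robust measure of spread.

```python
from statistics import median


def is_latency_outlier(user_ms, peer_ms, threshold=5.0):
    """Return True if user_ms sits far above the peer median.

    Uses a robust z-score: distance from the peer median divided by the
    median absolute deviation (MAD). The threshold of 5 is an arbitrary,
    illustrative choice.
    """
    med = median(peer_ms)
    mad = median([abs(x - med) for x in peer_ms])
    if mad == 0:
        # All peers are (nearly) identical; anything above them stands out.
        return user_ms > med
    return (user_ms - med) / mad > threshold


# Hypothetical round-trip times in milliseconds (not real measurement data).
peers = [18.0, 21.0, 19.5, 22.0, 20.0, 23.5, 19.0]

print(is_latency_outlier(110.0, peers))  # a user routed across the Atlantic
print(is_latency_outlier(21.0, peers))   # a user served from a nearby server
```

A check like this only says that something is off. Working out where the traffic is actually going still takes a closer look at the route.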
B. Supporting an important product launch
When an ISP (let's call it S.H.I.E.L.D.) launched their new on-demand video service, which was to be delivered by set-top boxes via multicast (with the home CPE acting as a multicast proxy), they were keen to ensure the release went without a hitch. So, in true Iron Man style, we invented a custom test that perfectly fit their needs. Lucky we did, because no sooner had the product launched than they hit a problem. Customers called up to complain that videos were frequently stalling and the picture was pixelated.
Running the new custom test from 3,000 home routers, we saw that most metrics were performing well. A little more digging revealed an issue that was affecting a single model of CPE on S.H.I.E.L.D.'s network. The difference in performance was huge: jitter for the affected CPE measured ~35ms, whereas other CPEs consistently delivered <1ms.
With our help, S.H.I.E.L.D. traced a bug to the CPEs’ wireless interface driver that was generating a high number of interrupts periodically, stalling the multicast proxy running in the CPE, and generating high jitter. S.H.I.E.L.D.'s team were on it straight away - fixing the bug and restoring streaming to perfect order.
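The per-model comparison that exposed the faulty CPE can be sketched roughly as follows. This is an illustration under stated assumptions, not the custom test itself. Jitter is taken here as the mean absolute difference between consecutive latency samples (one common simplification), and the model names and sample values are invented.

```python
from collections import defaultdict
from statistics import mean


def jitter_ms(latencies):
    """Mean absolute difference between consecutive latency samples (ms)."""
    if len(latencies) < 2:
        return 0.0
    return mean([abs(b - a) for a, b in zip(latencies, latencies[1:])])


def jitter_by_model(samples):
    """Average the per-device jitter for each CPE model.

    samples: iterable of (cpe_model, [latency_ms, ...]) pairs, one per device.
    """
    per_model = defaultdict(list)
    for model, latencies in samples:
        per_model[model].append(jitter_ms(latencies))
    return {model: mean(values) for model, values in per_model.items()}


# Hypothetical devices and latency samples, for illustration only.
samples = [
    ("model-a", [10.0, 10.5, 10.0, 10.5]),
    ("model-a", [12.0, 12.2, 12.0]),
    ("model-b", [10.0, 45.0, 12.0, 44.0]),
]

for model, value in sorted(jitter_by_model(samples).items()):
    print(f"{model}: {value:.1f} ms")
```

Grouping by hardware model is what turns thousands of individually unremarkable results into one obvious outlier.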
And along the way… we also noticed that Netflix traffic was crossing many more intermediary networks than necessary. It was looking like Netflix was heading for some serious congestion. We alerted S.H.I.E.L.D. and all was resolved within 24 hours.
C. Quickly understanding customer problems
Sometimes problems seem almost impossible to find. But an ISP knew something unusual was going on when it noticed angry complaints cropping up on its forums. Only a minority of people were reporting issues but they were very vocal. The potential loss of sales and damage to the ISP’s brand were serious, and as time went on, more customers were joining in and reporting the same symptoms.
The ISP went straight to SamKnows One. Filtering the results to just a few of the individuals who were reporting problems, they were able to spot anomalies in the data. A CDF plot made it clear that something had caused a major step change in performance.
Splitting the results further, we could see that the problem only affected devices connected by Ethernet to one particular model of CPE supplied to customers. Wireless customers were fine. By matching the time that the issue occurred to the CPE firmware log, we could see that a firmware update had started to roll out. As soon as we realised, the ISP stopped the update and rolled it back to solve the issue.
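To show what a CDF comparison looks like in practice, here is a minimal sketch of an empirical CDF in Python. The speeds are invented and the before/after split is an assumption for the example. In a real investigation the split point would come from the data, such as the firmware rollout time in the log.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F


# Hypothetical download speeds in Mbps for one group of affected devices.
before_update = [94, 95, 96, 93, 95, 94]
after_update = [61, 58, 63, 60, 59, 62]

f_before = ecdf(before_update)
f_after = ecdf(after_update)

# Share of results at or below 80 Mbps in each period.
print(f_before(80), f_after(80))
```

When the two curves barely overlap, as they would for these made-up numbers, performance has shifted as a whole rather than just become noisier.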
As the internet gets faster, download speeds are rapidly losing their ability to reflect customer experience. With an increasingly complex global internet infrastructure, the potential causes of problems are more numerous, less obvious, and harder to find than five years ago. By measuring all the different elements that affect your customers’ internet experience, you can see exactly how they interact with one another and find that “needle in a (giant) haystack” before your customers even notice. That’s why we encourage everyone to measure all the components that interact with one another to give you responsive, stable, and reliable internet performance. By doing this over time and under the same network conditions, you can lower your support costs and dramatically reduce resolution times.
To fight foes no single superhero can withstand (and for more information about how we can measure your end-to-end network performance), please contact our sales team.