In the past couple of years I've had quite a bit of exposure to customers with large WANs in various industries (many non-IT centric) - with xDSL and DMVPN/FlexVPN playing a big role alongside simpler things like fibre-based L3VPN and Internet access.
Depending on customer requirements, WAN solutions can range from simple to CCIE lab worthy (and I've seen both) - but what prompted writing the list below was a bit of interaction with one of the vendors of the so-called new wave of SD-WAN solutions. I'm going to stay away from SD-anything because these are often vague, marketing-loaded terms, but that doesn't stop me from asking: what would an improved WAN offering look like? By improved I mean that it solves some of the challenges we're facing today (technical, user experience, cost, deployment etc.) or makes a big impact on customer experience.
So here's what I came up with - this is all vendor agnostic mind you (no grumbling in this post)!
Per-application dynamic re-routing and prioritisation
Given multiple WAN circuits at a location, separate business-critical applications from regular traffic, perhaps over different links.
Tracking application metrics for performance / end-user experience has lots of room for improvement - for example, reacting to abnormal levels of TCP retransmits by switching traffic to another circuit.
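To make the retransmit example a bit more concrete, here is a minimal sketch of the decision logic. Everything in it is my own assumption for illustration: the `CircuitStats` type, the `pick_circuit` function and the 2% threshold are hypothetical and not taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class CircuitStats:
    """Hypothetical per-circuit counters for one application's traffic."""
    name: str
    tcp_segments_sent: int
    tcp_retransmits: int

    @property
    def retransmit_ratio(self) -> float:
        # An idle circuit has no evidence of loss, so treat it as clean.
        if self.tcp_segments_sent == 0:
            return 0.0
        return self.tcp_retransmits / self.tcp_segments_sent


def pick_circuit(current, alternatives, threshold=0.02):
    """Stay on the current circuit unless its retransmit ratio exceeds
    the (assumed) threshold and some alternative is doing better."""
    if current.retransmit_ratio <= threshold:
        return current
    best = min(alternatives, key=lambda c: c.retransmit_ratio, default=current)
    if best.retransmit_ratio < current.retransmit_ratio:
        return best
    return current


mpls = CircuitStats("mpls", tcp_segments_sent=10_000, tcp_retransmits=500)
dsl = CircuitStats("dsl", tcp_segments_sent=10_000, tcp_retransmits=50)
print(pick_circuit(mpls, [dsl]).name)
```

A real implementation would also need some hysteresis so that traffic doesn't flap between circuits on every measurement interval.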
Traffic optimisation
Caching, compression, helping TCP in lossy conditions are a few examples that come to mind.
Load-balancing
Active use of all available links over mixed last-mile connectivity (such as MPLS, wired Internet, wireless P2P private/Internet, mobile Internet or private APN, symmetric fibre, xDSL). This is a tricky one, especially when circuits are very different (think fibre vs mobile) in throughput/latency/jitter/loss metrics.
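One simple way to use unequal links at the same time is weighted per-flow hashing: every packet of a flow sticks to one link (avoiding reordering), and bigger links get proportionally more flows. The sketch below is illustrative only. The `pick_link` name, the 5-tuple layout and the use of throughput in Mbit/s as the weight are my assumptions, and it deliberately ignores the latency/jitter/loss differences that make this problem hard in practice.

```python
import hashlib


def pick_link(flow, links):
    """Deterministically map a flow to a link, proportionally to weight.

    flow  -- a tuple such as (src_ip, dst_ip, src_port, dst_port, proto)
    links -- dict of link name -> non-negative integer weight
             (assumed here to be throughput in Mbit/s)
    """
    total = sum(links.values())
    if total <= 0:
        raise ValueError("at least one link needs a positive weight")
    # Hash the flow so the same flow always lands on the same link.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # Walk the links in a stable order and find the bucket the point falls in.
    for name, weight in sorted(links.items()):
        if point < weight:
            return name
        point -= weight
    raise AssertionError("unreachable: point is always below total")


links = {"fibre": 100, "lte": 20}
flow = ("10.0.0.1", "192.0.2.10", 51000, 443, "tcp")
print(pick_link(flow, links))
```

With the weights above, roughly five out of six flows should end up on the fibre link, but a single large flow can still skew the actual byte distribution.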
Deployment
A repeatable staging and deployment model (as close to zero-touch at the branch as possible). Ensuring that deployed devices run the correct OS version (vendors ship them from the factory with all sorts of amazing OS images on them) and have the templated, tested and locked-down config on them (easy troubleshooting, smooth transfer to ops).
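As a rough idea of what the staging check could look like, the sketch below compares a device's reported OS version and running config against a known-good baseline. The `Golden` type, the function names and the whitespace normalisation are all hypothetical; how you actually pull the version and config off a device depends entirely on the platform.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Golden:
    """Hypothetical baseline for one device role."""
    os_version: str
    config_sha256: str


def config_digest(text):
    # Ignore blank lines and trailing whitespace so cosmetic
    # differences don't show up as config drift.
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()


def compliance_issues(os_version, running_config, golden):
    """Return a list of human-readable problems (empty means compliant)."""
    issues = []
    if os_version != golden.os_version:
        issues.append(f"OS is {os_version}, expected {golden.os_version}")
    if config_digest(running_config) != golden.config_sha256:
        issues.append("running config differs from the template")
    return issues


template = "hostname branch-01\ninterface wan0\n description uplink\n"
golden = Golden(os_version="1.2.3", config_sha256=config_digest(template))
print(compliance_issues("1.0.0", template, golden))
```

Running a check like this both at staging and periodically after deployment keeps the "locked down" part honest.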
Performance analysis and troubleshooting
The more smarts you include from the points above, the less direct visibility you have as an operator into how packets are moved around - it becomes critical to have the ability to trace paths, capture packets and view performance metrics with a good and usable UI (graphical and CLI).
A decent API for integration with other systems, when you actually have better tools than what the vendor is giving you.
And, as always, thanks for reading.
Did I miss anything? Probably, so I'd love to hear your thoughts in the comments below.