In a recent discussion with fellow network engineers about encryption in a DC network, I made the observation that in some cases it might be better to simply enforce end-to-end encryption directly between applications rather than in the underlying infrastructure (MACsec, IPsec, etc.).
Looking at MACsec for example, where the crypto is done by the switch ASIC, the general opinion was that it must be faster than doing it on a server CPU. But having no real data to back that up, I decided to dig a bit deeper.
I started the search with the most likely implementation of application-bound encryption: SSL (or TLS, if you want to be picky). And what better protocol than HTTP to use such encryption with, given the many studies of HTTP vs. HTTPS performance out there?
There are two potential encryption-related bottlenecks in such a session: the handshake (aka establishing the secure communication channel) and the encryption/decryption of the application data itself.
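If you want to see the two phases on a live connection, curl can split the timings out for you; a minimal sketch (the URL is just a placeholder, any HTTPS endpoint will do):

λ ~ curl -so /dev/null https://example.com/ -w 'tcp: %{time_connect}s  tls: %{time_appconnect}s  total: %{time_total}s\n'

Here time_appconnect minus time_connect approximates the handshake cost, and time_total minus time_appconnect is mostly the (encrypted) data transfer.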
Most discussions and comparisons I've found (mostly on Stack Overflow) are centered around the handshake:
- it adds 2 RTTs due to the extra exchanges that need to take place (only 1 RTT with TLS False Start) - see the benchmark sketch after this list
- many short-lived sessions will make this delay overshadow any other performance metric
- hardware optimizations help - like AVX2 instructions in Xeon processors giving 26-255% boosts to key exchange performance
- other elements affect the perceived performance hit: static vs. dynamic content, caching behaviour
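To put a number on the handshake cost yourself, openssl ships a small benchmark for exactly this; a sketch (the host is a placeholder), where -new forces a full handshake for every connection and -reuse measures resumed sessions, each reporting connections per second over the given time window:

λ ~ openssl s_time -connect www.example.com:443 -new -time 10
λ ~ openssl s_time -connect www.example.com:443 -reuse -time 10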
Which is all well and good, but not very relevant to our question: in a long-lived session, is the overhead of encrypting the data stream on a general-purpose CPU a problem?
Let's encrypt some stuff
This AES-NI SSL Performance study shows single-threaded performance for CPUs with the AES-NI instruction set - and quite a few of them can push enough data to fill a 10Gbps interface by pooling the raw output of a bunch of cores.
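If you want to reproduce the multi-core part of that on your own hardware, openssl speed can fork several workers and report their combined throughput; a sketch with 4 parallel jobs:

λ ~ openssl speed -multi 4 aes-256-cbc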
I did the same test on my laptop (i7-5600U CPU @ 2.60GHz): 1 core (out of 4 with Hyper-Threading) could push 99MBps (Bytes!) of AES-256-CBC encrypted data to my 1Gbps (125MBps) NIC.
λ ~ openssl speed aes-256-cbc
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes
aes-256 cbc      91251.57k    97223.70k    98791.17k    98241.54k    99019.43k
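One caveat worth knowing: called this way, openssl speed typically benchmarks its built-in software AES; to make it use the AES-NI code path you go through the EVP interface, which on AES-NI capable CPUs tends to report noticeably higher numbers. So the figures above are, if anything, a pessimistic baseline:

λ ~ openssl speed -evp aes-256-cbc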
If I were to use the numbers above, it would take 1.0099 seconds for 100MBytes to go through openssl encryption and 0.8 seconds for them to be transmitted over the 1Gbps NIC (ignoring overhead, packet encapsulation, etc.). So a single-threaded, single-client network application would be stuck waiting on the encryption process.
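For the record, the arithmetic behind those two figures (using the 8192-byte rate from above, and 125MBps for the 1Gbps NIC):

λ ~ echo 'scale=4; 100000 / 99019.43' | bc    # 100MBytes / encryption rate, in seconds
1.0099
λ ~ echo 'scale=4; 100 / 125' | bc            # 100MBytes / NIC rate, in seconds
.8000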
The way I see it, unless it's a server under constant heavy load (think multiple 10Gbps interfaces and single-threaded elephant flows), the NIC should not have to wait for encrypted data, and the introduced delay will feel like adding another (high-speed) hop to the RTT.
One last point: while network-based encryption requires crypto capacity for all the traffic passing through it (multiple servers at the same time), pushing some of it to the application level distributes the load to the edge - and when it comes to crypto, server CPU capacity is cheaper than specialized networking hardware.
This is probably the least scientific deduction I've made on this blog (won't become a habit, I swear), so please let me know if I'm right, but especially if I'm horribly wrong, privately or in the comments below!
Other references:
- NGINX SSL Performance
- SSL Performance Myth
- Networking 101, Transport Layer Security (TLS)
- Is TLS Fast Yet?
- Overclocking SSL (according to this when Gmail switched to HTTPS by default in 2010 they needed no additional hardware and SSL/TLS accounted for "less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead")
- SSL handshake latency and HTTPS optimizations
And, as always, thanks for reading.