Navigating TLS/SSL Inspection with Check Point: Insights and Challenges
The write-up that took my career to new heights
This post is from July 2019 (versions R77.30 and R80.30). Since then, Check Point has issued hundreds of upgrades to its products, dramatically improving their capabilities.
My Check Point days are in the past, at least for the moment.
But…
This write-up and research hold a special place in my heart. It played a pivotal role in my career:
It was featured in Tim Hall’s book, “Check Point Firewall Performance Optimization - Third Edition”
It allowed me to present at my first international conference on one of the main stages: Check Point CPX 2020 - New Orleans
And so on.
Keep reading below.
Hello everyone,
I'm looking to share my experience and concerns regarding the current state of SSL Inspection in general, and with Check Point in particular. I'm also interested in hearing your approaches to this matter.
Important: This post focuses strictly on outbound SSL Inspection; my experience with inbound is limited to Check Point.
I'll start with a vendor-agnostic statement: One of the things that bothers me most about this subject is that many people claim to have this technology implemented in their organizations without any issues. However, upon meeting those organizations, I found one of the following:
They have at most 10 users.
Their SSL Inspection technology is shady: SSL Inspection is enabled, but there is a working and HUGE fail-open at the bottom of the engine. If something may cause issues, the connection is accepted, the log shows that the packet was "inspected," and everyone is happy.
They think they have SSL Inspection but don’t actually have it enabled; instead, they have a feature similar to categorizing HTTPS sites.
My Personal Experience
We all know the importance of inspecting HTTPS traffic in our network for visibility and security. However, at times, I feel like this way of thinking is my InfoSec persona speaking—one that only gives "woo woo" advice to the business regarding security and doesn’t even know how to install antivirus software. Why? Because properly implementing this solution is just painful. I also know that most customers don't want to dive into this matter and prefer to rely on their endpoint solutions for this layer of security. Even if it's not the same, I respect that.
Among our customers, we have one that is particularly tech-savvy, with 900+ users. They like to enable every feature in their NGFW, and one of their requirements was full outbound SSL Inspection on the Check Point firewall. We started with this customer years ago with R77.30 on the gateways, so you can imagine that I've been through all kinds of fun experiences with SSL Inspection.
We have fully tested SSL Inspection in the following versions: R77.30, R80.10, and R80.30.
During this journey, we went through a lot of information. There are many awesome posts here on CheckMates, plus SKs and SRs with the TAC; you name it.
Issues We Faced
Here is a summarized list of issues we encountered:
Heavy performance issues in R77.30 (fixed in R80.10+)
Many pages fail to load or load intermittently.
Some pages work, but specific sections (like login pages) fail.
The Sophos Antivirus solution fails to install or update; there are some posts about this issue.
AWS Connectors failing (solved in R80.30).
The Issue with Check Point and Outbound SSL Inspection
I really like Check Point firewalls, but sometimes I feel that they take control away from the administrator. One of the first things we did was disable every option that drops connections for not following the RFC to the letter, and allow connections to servers with untrusted certificates. We tried everything, yet we still faced many issues.
Bypassing the connection? Good luck with that. Check Point firewalls always inspect the first packet, and because of this, many connections fail.
Probe bypass to mitigate the previous issue? Sure, but be prepared to face other issues due to SNI verification.
Fail-open in probe bypass? This was a huge surprise after it was changed in Take 189, but we still had a lot of issues even after enabling this flag.
WSTLSD debug? Many times, but good luck not taking down the firewall with it and be prepared to wait a long time for the TAC to inspect it. It's not their fault; it's just really hard to troubleshoot these issues.
The only way we found to properly bypass connections was to exclude them COMPLETELY from the SSL policy. For example, let's say you have two network segments and you only want to inspect traffic in one of them:
What Most People Do
In this example, all traffic from 10.0.0.0 will be inspected. However, you will probably have some issues in the 192.168.0.0 network as well since the bypass action enforces inspecting the first packet of the SSL handshake.
The Only Way to Do Nothing with the Connections
In our research, we found that the only way to properly bypass a connection was to exclude it completely from the policy. Obviously, this approach is not scalable and somewhat utopian in a big network.
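To make the difference between the two layouts concrete, here is a toy Python model of the behavior described above. It is not Check Point's engine: the Rule class, the function names, and the /24 prefix lengths are assumptions made purely for illustration. The only thing it encodes is what we observed: any rule that matches, even a bypass rule, means the gateway looks at the first packet of the handshake, while traffic that matches no rule at all is left alone.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    source: str  # CIDR; the /24 prefix lengths below are assumed for illustration
    action: str  # "inspect" or "bypass"


def first_match(rules: List[Rule], client_ip: str) -> Optional[Rule]:
    """Return the first rule whose source network contains the client, if any."""
    addr = ip_address(client_ip)
    for rule in rules:
        if addr in ip_network(rule.source):
            return rule
    return None


def handshake_touched(rules: List[Rule], client_ip: str) -> bool:
    """Model of the observed behavior: a matched rule (inspect OR bypass)
    means the first handshake packet is examined; no match means it is not."""
    return first_match(rules, client_ip) is not None


# "What most people do": an inspect rule plus an explicit bypass rule.
common_policy = [Rule("10.0.0.0/24", "inspect"), Rule("192.168.0.0/24", "bypass")]

# "The only way to do nothing": the second segment is simply absent from the policy.
narrow_policy = [Rule("10.0.0.0/24", "inspect")]

print(handshake_touched(common_policy, "192.168.0.10"))  # True
print(handshake_touched(narrow_policy, "192.168.0.10"))  # False
```

In this model the bypass rule still "touches" the 192.168.0.0 clients, which is where we saw the breakage; only the narrow policy leaves them completely alone.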
The Most Stable Scenario We Reached
We reached a state of stability in R80.10, on a Jumbo Hotfix (JHF) take earlier than Take 189, by enabling the following flags and features:
appi_urlf_ssl_cn_allow_not_rfc_ssl_protocols=1 (Don’t know where I got this; also, there is no documentation about it)
enhanced_ssl_inspection=1 (Probe bypass)
bypass_on_enhanced_ssl_inspection=1 (Fail-open probe bypass)
Almost all features that drop packets turned off in SSL Inspection.
HTTPS categorization turned on.
Sophos antivirus worked fine, and issues with web pages were minimized.
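Because this stable state depends on three kernel flags that are easy to lose after a reboot or an upgrade, a small drift check can help. The sketch below is a minimal example, not an official tool. It assumes you have collected the current values as text, for instance by running fw ctl get int once per flag on the gateway, and that each line has the form "name = value"; that output format is an assumption, and the parser also accepts the "name=value" form used in the list above. The function names are hypothetical.

```python
import re

# Flags and values taken from the list above.
EXPECTED_FLAGS = {
    "appi_urlf_ssl_cn_allow_not_rfc_ssl_protocols": 1,
    "enhanced_ssl_inspection": 1,
    "bypass_on_enhanced_ssl_inspection": 1,
}

# Matches "name = 1" as well as "name=1"; any other line is ignored.
FLAG_LINE = re.compile(r"^\s*(\w+)\s*=\s*(-?\d+)\s*$")


def parse_flags(text):
    """Turn collected 'name = value' lines into a {name: int} dictionary."""
    flags = {}
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            flags[match.group(1)] = int(match.group(2))
    return flags


def drifted(actual, expected=EXPECTED_FLAGS):
    """Return {name: (wanted, found)} for every flag that is missing or different."""
    return {
        name: (wanted, actual.get(name))
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Sample text standing in for what was collected from the gateway.
sample = """enhanced_ssl_inspection = 1
bypass_on_enhanced_ssl_inspection = 0
"""
print(drifted(parse_flags(sample)))
```

With the sample text, the report lists the undocumented appi_urlf flag as missing and the fail-open flag as set to 0 instead of 1; an empty dictionary means all three flags match.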
The Journey to R80.30
We decided to migrate one of the cluster members to R80.30 to test the new SSL Inspection engine and solve some issues we had with UserCheck. After deployment, we had issues with Proxy ARP, and some inspection settings that had not caused problems before started to cause them in R80.30.
We sorted them all out, and the first impression was great:
AWS Connectors worked flawlessly without enabling any of the previously stated kernel flags.
Sophos Antivirus could get updated without enabling any of the previously stated kernel flags.
All the services detailed in the preliminary testing document worked great.
The next day, a waterfall of user complaints started to appear:
Sophos Antivirus could not be installed: Updates worked fine, but installation failed. After looking into the logs, no traffic was dropped (logs and output from fw ctl zdebug). The only log was a Detect regarding untrusted certificates, which we configured to accept in the SSL settings. We tried the flags, setting up an FQDN object just for *.sophos.com in the SSL Inspection policy, and it still failed.
The main billing service of the company stopped working. Again, no logs or possible leads as to why. We even looked at PCAPs, and everything seemed fine from the firewall’s perspective. As soon as we routed this traffic through pfSense, everything worked flawlessly.
Another invoice service stopped working: Again, no leads whatsoever. After we routed this traffic through pfSense, everything started to work.
Web pages that did not load properly or had some functions affected.
We tried everything, including taking captures with SecureXL turned off, but not even the bypass flags solved these issues in R80.30. We had seen the same issues in R80.10, where turning on the different flags fixed them, but that did not work in this new version.
At this point, there were many issues impacting production. We blocked one hole, and another 10 appeared. It was just impossible to properly troubleshoot each issue, so we had no other option but to revert to our most stable version.
Future Plans
There is no way to deploy SSL Inspection without issues; the problem is that these issues will probably heavily affect your production environment. There is no way you can test all your organization’s use cases, and there is no way to properly assure functionality in a lab environment.
Our main concern now is the remote possibility that we will have to stick with R80.10 for life. We know we will have to upgrade sooner or later, which is why we are now implementing a parallel Check Point perimeter gateway, similar to our fallback pfSense. This new gateway will run R80.30 with the same features. Think of it as a hybrid testing/production environment.
The main idea is to route certain subnets to the R80.30 gateway and study their behavior and troubleshoot without all the user complaints.
Concerns Regarding the Current State of SSL Inspection
Check Point firewalls don't provide a proper solution to bypass desired SSL traffic, making it hard to deploy this solution in a large environment.
Lab tests are not representative; the only way to test is in production.
Troubleshooting is really difficult: Many times there are no leads and everything seems fine on the firewall, forcing you to take packet captures at different points of the network and run debugs.
Check Point’s current approach to SSL Inspection hurts the brand’s image. I hear it all the time: “I have a friend who inspects SSL traffic with YYY and has no issues.” Most people don’t know that this is more a general problem with SSL/TLS as a technology than a Check Point fault. However, other vendors offer this functionality with fallback mechanisms that work without the user noticing. It’s less secure, but at the end of the day, in most cases the main metric is functionality, not security.
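On the troubleshooting point above: when the firewall logs give no leads, one cheap client-side check is to look at who issued the certificate the client actually receives. The sketch below is a minimal example under stated assumptions, not a Check Point tool. It assumes the gateway re-signs inspected connections with its own CA, so an issuer name equal to your inspection CA suggests the connection was inspected. The function names and the "Example Inspection CA" name are hypothetical; the optional cafile parameter is for the case where Python does not already trust your inspection CA.

```python
import socket
import ssl


def issuer_common_name(cert):
    """Extract the issuer CN from the dict returned by SSLSocket.getpeercert()."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def classify(cert, inspection_ca_cn):
    """Label a certificate by comparing its issuer CN with the inspection CA name."""
    if issuer_common_name(cert) == inspection_ca_cn:
        return "inspected"
    return "not inspected"


def fetch_cert(host, port=443, cafile=None, timeout=5.0):
    """Complete a TLS handshake from this client and return the server certificate."""
    ctx = ssl.create_default_context()  # trusts the system CA store
    if cafile:
        # Additionally trust the inspection CA (PEM file), so both inspected
        # and non-inspected connections can pass verification.
        ctx.load_verify_locations(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Offline example using the same structure getpeercert() returns.
sample_cert = {
    "issuer": (
        (("countryName", "US"),),
        (("organizationName", "Example Org"),),
        (("commonName", "Example Inspection CA"),),
    )
}
print(classify(sample_cert, "Example Inspection CA"))  # inspected
```

Run from a client behind the gateway, something like classify(fetch_cert("www.example.com"), "Example Inspection CA") gives a quick answer for one site. If the handshake itself fails, the error raised by fetch_cert is a lead in its own right.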
Hope this post helps you implement this feature in a harmless way.
Link to the original post: CheckMates - Outbound SSL Inspection: A War Story