How to approach fixing the TLS trust model

TLS is an exciting protocol and its wide deployment makes it even more interesting to work on. It has been said many times that the success of online commerce is due to the success of SSL/TLS and the fact that people felt safe in submitting their credit card information over the Internet. These days a lot of people have been speaking openly about how broken the TLS trust model is because of its reliance on Internet PKI and the Certificate Authorities infrastructure and there is some truth to that. We have seen two cases already where CAs trusted by all browsers have issued fraudulent certificates for high profile sites. Those incidents revealed two key problems in the existing TLS infrastructure today:

Any CA can issue a certificate for any web site on the Internet, which I call “certificate binding” problem
Revocation checking as deployed is ineffective

To counteract these deficiencies, multiple proposals have either existed or emerged. The most notable two are using DNSSEC for certificate binding and Convergence, which is based on independent system of notaries. While both have merits, I believe both have sufficient problems that will prevent them from being deployed widely, which is required for any successful change.

DNSSEC

Using DNSSEC for storing information about the server cert is actually very appealing. The admin is in control of which cert is deployed and controls the DNS zone for the site, so it makes sense to be able to use the DNS zone information to control trust. There is even a working group inside IETF to work on such proposal. The problems with DNSSEC are multiple though, here I list just a few:

It is not as widely deployed yet, but that is being fixed as we speak
Client stub resolvers don’t actually verify the DNSSEC signature, rather rely on the DNS server to do the verification. This opens the client to attack, if a man-in-the-middle is between the client and its DNS server.
There is non-zero amount of corporate networks, which do not allow DNS resolution of public Internet addresses. In such environments, the clients rely on the proxy to do the public Internet DNS resolution. This will break DNSSEC based approach, as clients don’t have access to the DNS records.
DNSSEC is yet another trust hierarchy, which is not much different than the current PKI on the web, just a different instance.

Convergence
Moxie Marlinspike has the right idea about trust agility and his proposal, which he calls Convergence, has a very good foundation. Where I believe it falls short is the fact that many corporate networks block outgoing traffic to a big portion of the Internet. Unless all notaries are white-listed for communication, traffic to those will be blocked, which will prevent Convergence from working properly. Also, the local caching creates a problem with timely revocation - if a certificate is found to be compromised, then until the cache expires, it will still be treated as a valid one.

My take

I actually don’t want to introduce any new methods of doing certificate validation. My goal is to point out a solution pattern that can be used to make any scheme actually deployable and satisfying most (if not all) cases. There are few basic properties any scheme should have:

All information needed for doing trust verification should be available if connectivity to the server is available
Certificate should be bound to the site, such that there is 1-to-1 mapping between the cert and the site.
A fresh proof of validity must be supplied

There is already existing and deployed, although rather rarely, part of TLS called OCSP stapling. It does something very simple - the TLS server performs the OCSP request, receives the response, and then supplies that response as part of the TLS handshake, the last part being the most crucial. The inclusion of the OCSP response as a TLS message removes all of the network problems that the currently proposed solutions face. As long as the client can get a TLS connection to the server, trust validation data can be retrieved. This brings property 1 to the table. In addition, OCSP responses are short lived, which satisfies property 3 as well. So the only missing piece is the 1-to-1 property.

So, there are two ways the problem can be approached - either bring certificate binding to OCSP somehow, or use any other method to provide certificate binding. The latter can actually be achieved rather easily with minimal changes to clients and servers. RFC 4366, Section 3.6 describes the Certificate Status Request extension, which is the underlying protocol messaging of how OCSP stapling is implemented. The definition of the request message is:

      struct {
          CertificateStatusType status_type;
          select (status_type) {
              case ocsp: OCSPStatusRequest;
          } request;
      } CertificateStatusRequest;

      enum { ocsp(1), (255) } CertificateStatusType;

The structure is extensible allowing for any other type of certificate status to be requested, as long as it is defined. I can easily see this message defining DNSSEC and Convergence as values of CertificateStatusType, then define the appropriate format of the request sent by the client. Conveniently, the response from the server is also very much extensible:

      struct {
          CertificateStatusType status_type;
          select (status_type) {
              case ocsp: OCSPResponse;
          } response;
      } CertificateStatus;

      opaque OCSPResponse<1..2^24-1>;

Currently, the only value defined is for OCSP response, which is treated as opaque value as far as TLS is concerned. Nothing prevents whatever information the above proposals return from being transmitted as opaque data to the client.

Just like Moxie explored in his presentation, using the server to do the work of retrieving trust verification data preserves the privacy of the client. It does put some extra burden on the server to have proper connectivity, but that is much more manageable and totally under the control of the administrator.

It is true that there will have to be change implemented by both clients and servers which that will take time. I fully acknowledge that fact. I do believe though, that using the Certificate Status Request is the most logical piece of infrastructure to use to avoid all possible network related problems and provide an inline, fresh, binding trust verification data from the server to the client.

One thing I have not yet answered to myself is - how do we make any new model to fail safe. Having hard failure and denying the user access has been problem forever, but if we keep on failing unsafe, we will continue chasing the same problem into the future.

So in conclusion, for any solution to fixing the TLS trust model must satisfy the following:

Provide timely/fresh revocation information
Work in all network connectivity scenarios
Preserve the client privacy

Ideally, it will also fail safe :)

netsekure rng

random noise generator

How to approach fixing the TLS trust model

Comments