netsekure rng

random noise generator

Chromium Internals - Security Principal in Chromium

I have seen many people versed in technology and security make incorrect statements about how Chromium’s multi-process architecture works. The most common misconception is that each tab gets a different process. In reality, that is only partially true. Chromium supports a few different modes of operation, and depending on the policy in effect, process allocation is done differently.

I decided to write up an explanation of the default process model and how it actually works. The goal is for it to be comprehensible to as many people as possible, without requiring a degree in Computer Science. However, basic familiarity with the web platform (HTML/JS) is expected. Before getting there, some concepts need to be defined, so this is the first post in a series that will explain some of Chromium’s internals and demystify some parts of the HTML spec.

I have found the easiest mental model of the Chromium architecture to be that of an operating system - a kernel running at a high privilege level and a number of less privileged usermode application processes. In addition, the usermode processes are isolated from each other in terms of address space and execution context.

OS Model

The equivalent of the kernel is the main process, which we call the “browser process”. It runs with the privileges of the underlying OS user account and handles all operations that require regular user permissions - communication over the network, displaying UI, rendering, processing user input, writing files to disk, etc. The equivalent of the usermode processes are the various types of processes that Chromium’s security model supports. The most common ones are:

  • Renderer process - used for parsing and rendering web content using the Blink rendering engine
  • GPU process - used for communicating with the GPU driver of the underlying operating system
  • Utility process - used for performing untrusted operations, such as parsing untrusted data
  • Plugin process - used for running plugins

They all run in a sandboxed environment and are as locked down as possible for the functionality they perform.

Chrome Model

In modern operating system design, the principle of least privilege is key and separation between different user accounts is fundamental. The user account is the basic unit of separation, and from here on I will refer to this concept as the “security principal”. Each operating system has a different way of representing security principals - for example, UIDs in Unix and SIDs in Windows. On the web, the security principal is the origin - the combination of the scheme, host, and port of the URL the document has originated from. Access control on the web is governed by the Same Origin Policy (SOP), which allows documents that belong to the same origin to communicate directly with each other and access each other synchronously. Two documents that do not belong to the same origin cannot access each other directly and can only communicate asynchronously, usually through the postMessage API. Overall, the same origin policy has worked very well for the web, but it also has some quirks, which make it unsuitable to treat origins as the security principal for Chromium.

The first reason comes from the HTML specification itself. It allows documents to “relax” their origins for the purpose of evaluating the same origin policy. Since the origin contains the full domain of the host serving the document, it can be a subdomain, for example “www.mail.example.com”. In most cases, however, the parent domain has full control over all of its subdomains, yet when documents that belong to separate subdomains want to communicate directly, they are not allowed to due to the restrictions of the same origin policy. To allow this scenario to work, documents are allowed to change their domain for the purposes of evaluating SOP. In the case above, “www.mail.example.com” can relax its domain up to “example.com”, which would allow any document on “example.com” itself to communicate with it. This is achieved through the “domain” property of the document object. It does come with restrictions though.
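
To make this concrete, here is a small sketch of what the relaxation looks like from script. The host names are placeholders picked for illustration, not taken from any real deployment:

    // Runs in a page loaded from the hypothetical host www.mail.example.com.
    // A frame from example.com is cross-origin until both sides opt in.
    console.log(document.domain); // "www.mail.example.com"

    // Relax the domain used for same-origin checks. This is allowed because
    // "example.com" is an ancestor of the current host above the public suffix.
    document.domain = "example.com";

    // Setting it to a bare public suffix is rejected by the browser:
    // document.domain = "com"; // throws a SecurityError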

In order to understand the restrictions on what document.domain can be set to, one needs to know about the Public Suffix List and how it fits into the security model of the web. Top-level domains like “com”, “net”, “uk”, etc., are treated specially and no content can (or should) be hosted on those. Each subdomain of a top-level domain can be registered by a different entity and therefore must be treated as completely separate. There are cases, however, where a domain is not a top-level domain but still acts as one. An example would be “co.uk”, which serves as a parent domain for commercial entities in the UK to register their domains under. Because those cases effectively play the role of a top-level domain, but are not one, the Public Suffix List exists as a comprehensive source for browsers and other software to use.

Now that we know about the PSL, let’s get back to document.domain. A document cannot change its domain to be anything completely generic or very encompassing, such as “.”. Browsers allow documents to relax their domain up the DNS hierarchy. To use the example from above, “www.mail.example.com” can set its domain to “mail.example.com” or “example.com”. However, since “.com” is a top-level domain, allowing the document to set its domain to “.com” would lead to security problems: it would allow the document to potentially access documents from any other “.com” domain. Therefore browsers disallow setting the domain to any value in the Public Suffix List and enforce that it must be a valid domain under one of the entries in the PSL. This concept is often referred to as “eTLD+1” - effective top-level domain (a.k.a. an entry in the PSL) plus one level of domains under it. I will use this naming for brevity from here on.
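
A minimal sketch of that rule in TypeScript, assuming a tiny hard-coded stand-in for the real Public Suffix List (real browsers consult the full list and handle many more edge cases):

    // Tiny stand-in for the Public Suffix List; the real list is much larger.
    const PUBLIC_SUFFIXES = new Set(["com", "net", "uk", "co.uk"]);

    // A document may only set document.domain to a suffix of its current host
    // that is not itself a public suffix (i.e. at least an eTLD+1).
    function canRelaxDomainTo(currentHost: string, proposed: string): boolean {
      if (PUBLIC_SUFFIXES.has(proposed)) return false;   // never a bare eTLD
      if (currentHost === proposed) return true;         // no-op is allowed
      return currentHost.endsWith("." + proposed);       // must be an ancestor
    }

    console.log(canRelaxDomainTo("www.mail.example.com", "example.com")); // true
    console.log(canRelaxDomainTo("www.mail.example.com", "com"));         // false
    console.log(canRelaxDomainTo("shop.example.co.uk", "co.uk"));         // false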

It is this behavior defined by the HTML spec - allowing documents to change their origins - that gives us one of the reasons we cannot use the origin as a security principal in our model. It can change at runtime, and security decisions made at an earlier point in time might no longer be valid. The only consistent part that can be taken from the origin is the eTLD+1.

The next oddity of the web is the concept of cookies. It is quite possibly the single most used feature of the web today, but it has its fair share of strange behaviors and brings numerous security problems with it. The problems stem from the fact that cookies don’t really play very well with origins. Recall that the origin is the tuple (scheme, host, port). The spec, however, is pretty clear that “Cookies do not provide isolation by port”. But that isn’t all - the spec goes on in the next paragraph to say “Cookies do not provide isolation by scheme”. This part has been patched up as the web has evolved, though, with the introduction of the “Secure” attribute on cookies. It marks cookies as available only to hosts running over HTTPS, and since HTTP is the other most used protocol on the web, the scheme of an origin is somewhat better isolated; port numbers, however, are completely ignored where cookies are concerned. So it is effectively impossible to use the origin as the security principal for performing access control on cookie storage.
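
A short sketch of that mismatch - the tuple that keys same-origin checks versus what (roughly) keys a cookie. The types and values are illustrative only:

    // The origin tuple used by the Same Origin Policy.
    type Origin = { scheme: "http" | "https"; host: string; port: number };

    // Roughly what identifies a cookie: no port at all, and scheme only
    // indirectly via the Secure attribute.
    type CookieKey = { name: string; domain: string; path: string; secureOnly: boolean };

    const a: Origin = { scheme: "https", host: "example.com", port: 443 };
    const b: Origin = { scheme: "https", host: "example.com", port: 8443 };

    // Distinct origins under SOP (different ports)...
    console.log(a.port !== b.port); // true

    // ...but both see the same cookie, since port is not part of the key.
    const session: CookieKey = { name: "session", domain: "example.com", path: "/", secureOnly: true };
    console.log(`"${session.name}" is shared across ports on ${session.domain}`);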

Finally, there is enough background to understand the security principal used by Chromium - the site. It is defined as the combination of the scheme and the eTLD+1 part of the host. Subdomains and port numbers are ignored. For example, the effective site for “https://mail.example.com” will be “https://example.com”. This allows us to perform access control in a web compatible way while still providing a granular level of isolation.
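
Putting it together, here is a rough sketch of deriving the site from a URL, reusing the same tiny stand-in for the Public Suffix List as above; Chromium’s actual implementation is considerably more involved.

    // Tiny stand-in for the Public Suffix List, as in the earlier sketch.
    const SUFFIXES = new Set(["com", "net", "uk", "co.uk"]);

    // Derive the site principal: scheme plus eTLD+1, ignoring subdomains and port.
    function siteFor(rawUrl: string): string {
      const url = new URL(rawUrl);
      const labels = url.hostname.split(".");
      for (let i = 0; i < labels.length; i++) {
        const candidate = labels.slice(i).join(".");
        if (SUFFIXES.has(candidate)) {
          // Keep one label below the matched public suffix (the eTLD+1).
          const etldPlusOne = labels.slice(Math.max(i - 1, 0)).join(".");
          return `${url.protocol}//${etldPlusOne}`;
        }
      }
      return `${url.protocol}//${url.hostname}`; // no suffix matched: fall back to host
    }

    console.log(siteFor("https://mail.example.com:8443/inbox")); // "https://example.com"
    console.log(siteFor("https://shop.example.co.uk/"));         // "https://example.co.uk"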

How to approach forking Chromium

One really nice thing about Chromium is that its source code is open and released under the BSD license. This allows people to reuse code, extend the browser, or fully fork the project. Each of those is probably worthy of a blog post on its own, but I will focus only on the last one.

Taking Chromium and forking it is a fairly easy process - just clone the repository. Make all the changes you would like - add missing features, include enhancements, create a totally new UI - it is only limited by one’s imagination. Building the binary from the source code is a little laborious, though not too hard. It does take beefy hardware and some time. Once it is built, publishing it is deceptively easy. However, what comes next?

Software in today’s world is not static. As a colleague of mine likes to say, it is almost like a living organism and continuously evolves. There is no shipping it once and moving on, as was the norm in the ’90s. The web is in a constant release mode and its model of development has trickled down to client-side software - be it desktop or mobile apps. Chromium has adopted this model from its initial release and is updating on a very short cycle - currently averaging six weeks between stable releases and two weeks between intermediate stable updates. It is this constant change that makes forking it a bit more challenging. However, there are a few steps one can take to ensure a smoother ride.


With a constantly changing codebase, having a continuous build system is a must for a project as big as Chromium and is very useful even for much smaller projects. Setting one up from the get-go will be tremendously useful if there is more than one developer working on the code. Its value is even higher if the project needs to build on more than one platform.

What is even more important - and I would argue a must - is using a continuous integration system, running tests on each commit (or thereabouts) to ensure there are no breaking changes. It is a requirement for any software project that needs to be in a position to release a new version at any point in time.

The system used in the Chromium project - buildbot - is actually open source and can be adapted to most projects.

Making changes

The most important action one can take when forking Chromium is to study the design of the browser before diving in and making any changes. There are multiple components and layers involved, which interact through well defined interfaces. Understanding the architecture and the patterns used will pay off tremendously in the long run.

Chromium has two main component layers - content and chrome. The former is what implements the barebones of a browser engine - networking stack, rendering engine, browser kernel, multiprocess support, navigation and session history, etc. The chrome layer is built on top of content to implement the browser UI, extensions system, and everything else visible to the user that is not web content.

Each layer communicates with the upper ones through two main patterns - observer and delegate interfaces. Using those interfaces should be the preferred way of extending the browser and building on top of it. Whenever this is not possible, changes to the core are needed. I would strongly suggest preferring to upstream those, if possible of course. It will make maintaining the fork much easier by reducing the burden of keeping up with changes, and it also shares the improvements with the whole community!
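
To illustrate the shape of those two patterns, here is a generic TypeScript sketch - not Chromium’s actual C++ interfaces, and the names are made up: observers are notified of events, while a single delegate is consulted for decisions.

    // Observers are notified about things that already happened.
    interface NavigationObserver {
      onNavigationFinished(url: string): void;
    }

    // The delegate is asked to make decisions on behalf of the embedding layer.
    interface BrowserDelegate {
      shouldAllowNavigation(url: string): boolean;
    }

    class ContentsStub {
      private observers: NavigationObserver[] = [];
      constructor(private delegate: BrowserDelegate) {}

      addObserver(observer: NavigationObserver): void {
        this.observers.push(observer);
      }

      navigate(url: string): void {
        if (!this.delegate.shouldAllowNavigation(url)) return;    // delegate decides
        this.observers.forEach(o => o.onNavigationFinished(url)); // observers react
      }
    }

In this analogy, the content layer is the notifying object and the chrome layer supplies the observer and delegate implementations.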

Finally, do yourself a favor that will keep you sane in the long run - write tests for all the features you add or changes you make. It is the only way to ensure that the regression and bug rate stays manageable long term.

Keep it moving

The Chromium codebase changes constantly and gets around 100 commits each day. The sane way to keep up with the rate of change is to rebase (or merge) your code on tip-of-tree (ToT) daily or at most weekly. Letting more time lapse makes resolving conflicts a lot harder.

Updating the install base is key to long-term success. The update client used by Chrome on Windows, called Omaha, is also open source. The server-side code is not available, though, since it depends heavily on how Google’s internal infrastructure is set up. However, the protocol used to communicate between the client and the server is publicly documented.

Development for Chromium relies quite a bit on mailing lists. Subscribing to the main development and discussion lists is very helpful. They are the place where major changes are announced, discussion on development happens, and questions about Chromium development are answered. The security team has a dedicated list for discussions as well.

Keep it secure

Security is one of the core tenets of Chromium. Keeping up with security fixes can be a challenging task, which is best solved by keeping your code always rebased on tip-of-tree. If this is not possible, it is best to subscribe to the announcement list the security team uses to keep external projects based on Chromium up-to-date with the security bugfixes happening in the project.


The web is moving more and more to a world without plugins. For me, this is a very exciting time, as plugins usually tend to weaken the browser security. There are two plugins bundled with Chromium to produce Chrome - Adobe Flash Player and a PDF viewer. The latter is now an open source project of its own - PDFium. It can be built and packaged with Chromium, though the same care should be taken as with the browser itself - keep it up-to-date.

Overall, maintaining a fork of Chromium isn’t trivial, but it isn’t impossible either. There are a bunch of examples, including the successful migration of the Opera browser from their own rendering engine to building on top of the Chromium content module.

Last, but not least - feel free to reach out to ask questions or seek advice.

Be humble

The topic of this blog post has been long on my mind, but I did not have a good example to use. Finally I found one.

Software security is a very complex field and an asymmetric problem space. Arguments over whether defense or offense is harder have been fought for a long time and will likely never stop. I think most of us can agree on these two statements:

  • offense needs to find only a handful of problems and work hard to turn them into a compromise
  • defense needs to architect software to be resilient and work hard to ideally (though not practically) not introduce any problems

Each side requires unique skills, and it is extremely rare for people to be really good at both. What really irks me is that lots of people in the security industry tend to bash the other side. It is easy: one understands one’s own problem space very well and knows how hard it is to be an expert in it, while the other side feels like it is not that hard. I mean, how hard can it be, right? Wrong!

In this post I will pick the side of the defender, since this is where I spend most of my time. The example I will use is the recent events with the Aviator browser, because it is near and dear to my heart. One thing I want to make clear from the get-go - I totally respect their efforts and applaud them for trying. Forking Chromium is not a small feat and not for the faint of heart. The goals for Aviator are admirable and we definitely need people to experiment with bold and breaking changes. It is through trial and error that we learn, even in proper engineering disciplines :). The security industry could use more of that.


It is no surprise people on the offensive side bash software developers for “stupid” mistakes, since the grass is always greener on the other side. The problem is that many trivialize the work required to fix those mistakes. Some are indeed easy. Fixing a stack-based buffer overflow is not too hard. In other cases, it is harder due to code complexity or just fundamental architecture of the code.

What humbles me personally is having tried the attack side. It is not too bad if you want to exploit a simple example problem. Once you try to exploit a modern browser, it is a completely different game. I admire all the exploit writers for it and am constantly amazed by their work. Same goes for a lot of the offensive research going on.

I have secretly wished in the past for some of the offensive folks to try to develop and ship a product. When WhiteHat Security released the Aviator browser, I was very much intrigued to see how it would develop. It is not a secret that Jeremiah Grossman and Robert Hansen have given lots of talks on how the web is broken and how browser vendors do not want to fix certain classes of issues. They have never been kind in their remarks to browser vendors, but now they have become one. I watched with interest to see how they had mitigated the issues they have been discussing. Heck, I wanted to see clickjacking protection implemented in Chromium, since it is the authors of Aviator who found this attack vector, and I have personally thought about that problem space in the past.

Chris Palmer and I have played around with the idea of “Paranoid Mode” in Chromium, and as a proof of concept we wrote Stannum (source) to see how far we could push it through the extensions APIs. It is much safer to add features to Chromium using extensions than by writing C++ code in the browser itself [1]. So when Aviator was announced and released initially, I reached out to WhiteHat Security to discuss whether the features they had implemented in C++ could be implemented through the extensions API. My interest was primarily motivated by learning what they had done and what the limitations of the extensions subsystem are. Unfortunately, the discussion did not go far :(.

Where do I believe they could have done better? You might have guessed it - being humble. The marketing for Aviator is very bold - “the most secure and private Web browser available”. This is a very daring claim to make and a hard promise to uphold, and anyone who has been in security should know better. Securing a complex piece of software, such as a browser, is a fairly hard task and requires lots of diligence. It takes quite a bit of effort just to stay on top of all the bugs being discovered and features committed, let alone develop defenses and mitigations.

Releasing the source for Aviator was a great step by WhiteHat. It gives us a great example to learn from. Looking at the changes made, it is clear that most of the code was written by developers who are new to C++. When making such bold statements, I would have expected more mature code. Skilled C++ developers who understand browsers are rare, but that is a problem that can be solved. It takes a lot of time, effort, and desire for someone to learn the language and, most importantly, understand the architecture of the browser. Unfortunately, I did not see any evidence that whoever wrote the Aviator-specific code studied the source code or attempted to understand how Chromium is written and integrate the changes well.

What really matters at the end of the day, though, is not the current state of a codebase. After all, every piece of software has bugs. I believe there is one key factor which can determine long term success or failure:


Security vulnerabilities are a fact of life in every large enough codebase. Even in the project I work on we have introduced code that allowed geohot to pull off his total ChromeOS pwnage! We owned up to it, the bug was fixed and we looked around to ensure we did not miss other similar instances.

However, what I was most disappointed by was the reaction from WhiteHat when a critical vulnerability was found in the Aviator specific code:

“Yup, Patches welcome, it’s open source.” [2]

Our industry would go further if we follow a few simple steps:

  • Do not trivialize the work of the opposite side; it is more complex than it appears on the surface.
  • When working on a complex software or problem, study it first
  • Share ideas and collaborate
  • Own up to your mistakes
  • Be humble

[1] Even Blink is starting to implement rendering engine features in JavaScript.
[2] Never mind that there is no explanation of how to build Aviator so that one could actually verify that the fix works.

30 days with isolated apps in Chrome

Update (2014-09-24): It was decided that the isolated apps experimental feature has some usability problems and will not be shipping in Chrome. As such, this functionality either no longer exists or is most likely broken and should not be used. I’m leaving the post for historical reference.

I have been using separate browsers for a while now to isolate generic web browsing from “high value” browsing, such as banking or administration of this blog. The reason I’ve been doing this is that a compromise during generic web browsing is going to be isolated to the browser being used and the “high value” browser will remain secure (barring compromise of the underlying OS).

Recently I’ve decided to give the experimental Chrome feature - “isolated apps” - a try, especially since I’ve recently started working on Chrome and will likely contribute to taking this feature to completion. Chrome already has a multi-process model in which it uses different renderer processes, which, if compromised, should limit the damage that can be done to the overall browser. One limitation that exists is that renderer processes have access to all of the cookies and other storage mechanisms in the browser (from here on I will only say cookies, though I mean to include other storage types as well). If an attacker can use a bug in WebKit to get code execution in the renderer process, this access allows requesting your highly sensitive cookies and compromising those accounts. What “isolated apps” helps with is isolating storage for web applications from the generic web browsing storage, which solves the problem of a compromised renderer stealing all your cookies. In essence, it simulates running the web application in its own browser, without the need to branch out of your current browser. The aim of this blog post is not to describe how this will work, but how to take advantage of this feature. For the full details, read the paper by Charlie Reis and Adam Barth (among others), which underlies the “isolated apps” work.

In the spirit of my “30 days with …” experiments, I created manifests for the financial sites I use and for my blog. I wanted to see if I would hit any obvious breaking cases or degraded user experience with those “high value” sites. A sample manifest file looks like this:

  "name": "netsekure blog",
  "version": "1",
  "app": {
    "urls": [ "*://" ],
    "launch": {
      "web_url": ""
    "isolation": [ "storage" ]
  "permissions": [ "experimental" ]

The way to read the file is as follows:

  • The “urls” directive is an expression defining the extent encompassed by the web application.
  • The “web_url” is the launch page for the web app, which provides a good known way to get to the application.
  • The “isolation” directive is instructing Chrome to isolate the storage for this web app from the generic browser storage.

Once the manifest is authored, you can place it in any directory on your local machine, but ensure the directory has no other files. To actually take advantage of this, you need to do a couple of things:

  • Enable experimental APIs either through chrome://flags or through the command line with --enable-experimental-extension-apis.
  • Load the manifest file as an extension. Go to the Chrome Settings page for Extensions, enable “Developer Mode”, and click on “Load unpacked extension”, then navigate to the directory where the manifest file resides and load it.

Once you have gone through the above steps, when you open a new tab, it will have an icon of the isolated web application you have authored. You can use the icon to launch the web app, which will use the URL from the manifest and will run in a separate process with isolated storage.

Now that there is an isolated app installed in Chrome, how can one be assured that this indeed works? There are a couple of things I did to confirm. First, when a Chrome web app is opened, the Chrome Task Manager shows it with a different prefix. Generic web pages start with “Tab: ” followed by the title of the currently displayed page. The prefix for the apps is “App: ”, which indicates that the browser treats this tab as a web application.

In addition to seeing my blog being treated differently, I wanted to be sure that cookies are not shared with the generic browser storage, so I made sure to delete all cookies for my own domain in the “Cookies and Other Data” settings panel. As expected, but still to my surprise, the site continued functioning, since deleting the cookies only affected the general browser storage and my isolated app cookies were not cleared. This intrigued me as to where those cookies are being stored. It turns out, since this is still just an experimental feature, there is no UI to show the storage for the isolated app yet. If you want to prove this to yourself, just like I wanted to, you have to use a tool to let you peek into a SQLite database, which stores those cookies in a file very cleverly named - Cookies. The Cookies db and the cache are located in the directory for your current profile in a subdirectory “Isolated Apps” followed by the unique ID of the app, as generated by Chrome. You can find the ID on the Extensions page, if you expand to see the details for the web app you’ve “installed”. In my case on Windows, the full directory is “%localappdata%\Google\Chrome\User Data\Default\Isolated Apps\dgipdfobpcceghbjkflhepelgjkkflae”. Here is an example of the cookies I had when I went and logged into my blog:

As you can see, there are only two cookies, which were set by WordPress and no other cookies are present.
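
If you want to script that check rather than use a GUI SQLite tool, a sketch like the following works. The profile path and the column names (host_key, name, path) reflect the Chrome cookie database layout of that era and should be treated as assumptions, and the better-sqlite3 module is just one convenient way to open the file.

    import Database from "better-sqlite3";

    // Path to the isolated app's cookie store; adjust the user profile and
    // app ID for your own machine (this one mirrors the example above).
    const cookiesDb =
      "C:/Users/me/AppData/Local/Google/Chrome/User Data/Default/" +
      "Isolated Apps/dgipdfobpcceghbjkflhepelgjkkflae/Cookies";

    const db = new Database(cookiesDb, { readonly: true });

    // Column names assumed from the Chrome "Cookies" schema of that era.
    const rows = db.prepare("SELECT host_key, name, path FROM cookies").all() as
      { host_key: string; name: string; path: string }[];

    for (const row of rows) {
      console.log(`${row.host_key}${row.path}  ${row.name}`);
    }
    db.close();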

Now, after using isolated apps for 30 days, I haven’t found anything that was broken by this type of isolation. The sites I’ve included in my testing, besides my blog, are the financial sites I use*. The goal now is to get this to a more usable state, where you don’t need to be a Chrome expert to use it ;).

* Can’t wait for all the phishing emails now to start arriving ;)

Pass-The-Hash vs cookie stealing

I saw a few talks at the BlueHat conference at Microsoft and the funniest of all was Joe McCray’s (@j0emccray) “You Spent All That Money And You Still Got Owned????”. At some point, he touched on Pass-The-Hash attacks and asked why those can’t be prevented. That struck me as an interesting question and an analogy popped into my head:

“pass-the-hash attacks are functionally equivalent to cookie stealing attacks”

If you think about the pass-the-hash attack, it requires administrator privileges, which means you can get LocalSystem-level privileges, at which point you own the operating system. Then you extract the user’s hash out of memory or from the SAM database and inject it into the attacker’s machine. Then you rely on single sign-on built on top of NTLM/Kerberos to authenticate to remote resources.

What if we assume the following mapping: OS -> Browser, LocalSystem code execution -> Browser code execution, User’s hash -> User’s cookie, Single Sign On -> HTTP session with cookies?

It can be easily observed that the pass-the-hash attack is equivalent to an attacker having code execution in the context of the browser, stealing the user’s cookies, injecting them into the attacker’s browser, and accessing remote resources. Actually, in the web world, one doesn’t even need code execution in the browser to steal the user’s cookies - it can be done through purely web based attacks.

Is it possible to defend against an attacker using your cookies? It is extremely hard, because to the remote server, your cookie is *you*. From that perspective, a Windows domain is no different than a web HTTP domain, so remote resources have no way of telling apart the real you and someone holding your token, be it a password hash or a cookie. I haven’t gone through the thought experiment of mapping best practices for securing against cookie stealing attacks to see if those map nicely onto best practices for defending against pass-the-hash attacks, so I’ll leave that as an exercise for the reader.

How to approach fixing the TLS trust model

TLS is an exciting protocol and its wide deployment makes it even more interesting to work on. It has been said many times that the success of online commerce is due to the success of SSL/TLS and the fact that people felt safe in submitting their credit card information over the Internet. These days a lot of people have been speaking openly about how broken the TLS trust model is because of its reliance on Internet PKI and the Certificate Authorities infrastructure and there is some truth to that. We have seen two cases already where CAs trusted by all browsers have issued fraudulent certificates for high profile sites. Those incidents revealed two key problems in the existing TLS infrastructure today:

  • Any CA can issue a certificate for any web site on the Internet, which I call the “certificate binding” problem
  • Revocation checking as deployed is ineffective

To counteract these deficiencies, multiple proposals either already existed or have newly emerged. The most notable two are using DNSSEC for certificate binding and Convergence, which is based on an independent system of notaries. While both have merits, I believe both have sufficient problems to prevent them from being deployed widely, which is required for any successful change.

Using DNSSEC for storing information about the server certificate is actually very appealing. The admin is in control of which cert is deployed and controls the DNS zone for the site, so it makes sense to be able to use the DNS zone information to control trust. There is even a working group inside the IETF working on such a proposal. The problems with DNSSEC are multiple, though; here I list just a few:
  • It is not as widely deployed yet, but that is being fixed as we speak
  • Client stub resolvers don’t actually verify the DNSSEC signature, rather rely on the DNS server to do the verification. This opens the client to attack, if a man-in-the-middle is between the client and its DNS server.
  • There is a non-zero number of corporate networks which do not allow DNS resolution of public Internet addresses. In such environments, the clients rely on the proxy to do the public Internet DNS resolution. This will break a DNSSEC-based approach, as clients don’t have access to the DNS records.
  • DNSSEC is yet another trust hierarchy, which is not much different than the current PKI on the web, just a different instance.

Moxie Marlinspike has the right idea about trust agility and his proposal, which he calls Convergence, has a very good foundation. Where I believe it falls short is the fact that many corporate networks block outgoing traffic to a big portion of the Internet. Unless all notaries are white-listed for communication, traffic to those will be blocked, which will prevent Convergence from working properly. Also, the local caching creates a problem with timely revocation - if a certificate is found to be compromised, then until the cache expires, it will still be treated as a valid one.

My take
I actually don’t want to introduce any new methods of doing certificate validation. My goal is to point out a solution pattern that can be used to make any scheme actually deployable while satisfying most (if not all) cases. There are a few basic properties any scheme should have:
  1. All information needed for doing trust verification should be available if connectivity to the server is available
  2. The certificate should be bound to the site, such that there is a 1-to-1 mapping between the cert and the site.
  3. A fresh proof of validity must be supplied

There is an already existing, although rarely deployed, part of TLS called OCSP stapling. It does something very simple - the TLS server performs the OCSP request, receives the response, and then supplies that response as part of the TLS handshake, the last part being the most crucial. The inclusion of the OCSP response as a TLS message removes all of the network problems that the currently proposed solutions face. As long as the client can get a TLS connection to the server, trust validation data can be retrieved. This brings property 1 to the table. In addition, OCSP responses are short lived, which satisfies property 3 as well. So the only missing piece is the 1-to-1 property.
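
As a quick client-side way to see stapling in action, here is a sketch in TypeScript using Node’s tls module and its requestOCSP option (as I understand the Node API; the host name is a placeholder):

    import * as tls from "node:tls";

    const host = "example.com"; // placeholder host

    // requestOCSP adds the Certificate Status Request extension to the
    // ClientHello; if the server staples a response, 'OCSPResponse' fires
    // before the connection is considered secure.
    const socket = tls.connect({ host, port: 443, servername: host, requestOCSP: true }, () => {
      console.log("TLS handshake completed with", host);
      socket.end();
    });

    socket.on("OCSPResponse", (response: Buffer) => {
      console.log(`Stapled OCSP response received: ${response.length} DER bytes`);
    });

    socket.on("error", (err) => console.error("TLS error:", err.message));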

So, there are two ways the problem can be approached - either bring certificate binding to OCSP somehow, or use any other method to provide certificate binding. The latter can actually be achieved rather easily with minimal changes to clients and servers. RFC 4366, Section 3.6 describes the Certificate Status Request extension, which is the underlying protocol messaging of how OCSP stapling is implemented. The definition of the request message is:

      struct {
          CertificateStatusType status_type;
          select (status_type) {
              case ocsp: OCSPStatusRequest;
          } request;
      } CertificateStatusRequest;

      enum { ocsp(1), (255) } CertificateStatusType;

The structure is extensible, allowing for any other type of certificate status to be requested, as long as it is defined. I can easily see this message defining DNSSEC and Convergence as values of CertificateStatusType and then defining the appropriate format of the request sent by the client. Conveniently, the response from the server is also very much extensible:

      struct {
          CertificateStatusType status_type;
          select (status_type) {
              case ocsp: OCSPResponse;
          } response;
      } CertificateStatus;

      opaque OCSPResponse<1..2^24-1>;

Currently, the only value defined is for OCSP response, which is treated as opaque value as far as TLS is concerned. Nothing prevents whatever information the above proposals return from being transmitted as opaque data to the client.
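
To make the idea concrete, here is a hypothetical sketch of that extension point modeled in TypeScript. The extra status_type values and their numbers are invented for illustration - nothing like them is defined or assigned anywhere.

    // OCSP(1) is the only value defined today; the others are hypothetical.
    enum CertificateStatusType {
      Ocsp = 1,
      Dnssec = 2,       // hypothetical: DNSSEC chain proving the cert binding
      Convergence = 3,  // hypothetical: notary attestations
    }

    interface CertificateStatus {
      statusType: CertificateStatusType;
      // Opaque to the TLS layer; the client interprets it based on statusType.
      response: Uint8Array;
    }

    // The server staples whichever proof type it has that the client understands.
    function pickStatus(
      clientSupports: CertificateStatusType[],
      available: CertificateStatus[],
    ): CertificateStatus | undefined {
      return available.find(status => clientSupports.includes(status.statusType));
    }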

Just like Moxie explored in his presentation, using the server to do the work of retrieving trust verification data preserves the privacy of the client. It does put some extra burden on the server to have proper connectivity, but that is much more manageable and totally under the control of the administrator.

It is true that changes will have to be implemented by both clients and servers, and that will take time. I fully acknowledge that fact. I do believe, though, that the Certificate Status Request is the most logical piece of infrastructure to use to avoid all possible network related problems and provide inline, fresh, binding trust verification data from the server to the client.

One thing I have not yet answered for myself is how to make any new model fail safe. Hard failure that denies the user access has been a problem forever, but if we keep failing unsafe, we will continue chasing the same problem into the future.

So in conclusion, any solution to fixing the TLS trust model must satisfy the following:
  • Provide timely/fresh revocation information
  • Work in all network connectivity scenarios
  • Preserve the client privacy
Ideally, it will also fail safe :)

TLS Client Authentication and Trusted Issuers List

One of the common questions I’ve seen asked lately is related to TLS client authentication, which likely means more people are interested in stronger client authentication. The problem people are hitting is described in KB 933430, where the message the server sends to the client to request client authentication is being trimmed. Let’s look at why this occurs and what the possible solutions are, but first some background.

When a TLS server is configured to ask for client authentication, it sends the TLS CertificateRequest message as part of the handshake. The TLS 1.2 RFC defines the message as follows:

      struct {
          ClientCertificateType certificate_types<1..2^8-1>;
          SignatureAndHashAlgorithm supported_signature_algorithms<2..2^16-2>;
          DistinguishedName certificate_authorities<0..2^16-1>;
      } CertificateRequest;

where supported_signature_algorithms is an addition in the 1.2 version of the TLS protocol. The certificate_authorities part of the message is further defined as:

opaque DistinguishedName<1..2^16-1>;

When the server sends this message, it optionally fills in the certificate_authorities part with a list of distinguished names of acceptable CAs on the server. The main reason for this list is for the server to help the client narrow down the set of acceptable certificates to choose from. For example, if the server only accepts certificates issued by the company’s private CA, there is no need for the client to send a certificate issued by a public CA, as the server won’t trust it. Nothing in the RFC prevents the client from sending any certificate, but it is in the best interest of the client to send an appropriate one.

On Windows, the way TLS is implemented, the server picks all the certificates that are present in the “Local Computer” “Trusted Root Certification Authorities” store (or, in short, the local machine root store). With Windows Server 2008 and later, the default list of trusted authorities is very small, as I’ve described in a previous post, so including those distinguished names in the message does not pose a problem. However, if the server has most of the trusted roots installed or has additional root certificates, it is possible for the combined length of the distinguished names to exceed the limit of the TLS record size, which is 2^14 (16,384) bytes. The TLS standard supports this, as it breaks messages up into records and a single message can span multiple records - this is called record fragmentation. Windows does not implement this part of the RFC, though, so it cannot send messages that are bigger than what a single TLS record can hold. In most cases this works just fine, but in this particular instance it is a problem.
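
A back-of-the-envelope sketch of the failure mode: since the handshake message will not be fragmented across records here, the encoded certificate_authorities list has to fit in a single 2^14-byte record. The name sizes below are made-up averages, purely for illustration.

    const TLS_RECORD_LIMIT = 16384; // 2^14 bytes

    // Hypothetical list of DER-encoded distinguished-name lengths (~100 bytes each).
    const issuerNameLengths: number[] = Array.from({ length: 300 }, () => 100);

    // Each DistinguishedName in the message is preceded by a 2-byte length field.
    const listBytes = issuerNameLengths.reduce((sum, len) => sum + 2 + len, 0);

    console.log(`${listBytes} bytes`, listBytes > TLS_RECORD_LIMIT ? "- overflows a single record" : "- fits");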

Now, what can be done to solve this problem? There are two solutions - either decrease the list of root certificates in the message or do not send the list at all (allowed by the RFC). The former approach is possible, but it is more error prone and I wouldn’t recommend it for most people. I would argue that the latter approach is the preferred one, but I always get backlash when I propose it. If one thinks about the original purpose of this message and the way Windows has implemented it, it is easier to understand why this is better. Remember:

  • the server wants to “help” the client make an informed decision on which certificate to send as its identity
  • Windows sends the contents of the local machine root store as the list

If we take those two factors together, the end result is that the server is not helping the client *at all* to make a good decision. A few sources - my own research, published SSL surveys, and the EFF SSL Observatory - point out that 10-15 root CAs issue the majority of the certificates seen on the web, therefore if we send 16k worth of distinguished names, the probability that any certificate the client has is *not* issued by someone in the list is close to zero. Therefore, in this configuration, the list the server presents does not effectively filter down the set of certificates on the client. It is almost equivalent to sending an empty list to the client and asking it to choose randomly.

In short, if you are hitting this problem, you are better off using Method 3 described in KB 933430 - setting SendTrustedIssuerList to 0, which disables sending the list of CAs - than any other method.