Chromium Internals - Lifetime of a navigation

Feb 2nd, 2017 10:44 pm

One of the main pieces of functionality in a browser is navigation. It is the process through which the user gets to load documents. Let us trace the life of a navigation from the time a URL is typed in the URL bar until the web page is completely loaded. In this post I will be using the word “browser” to describe the program the user sees and not just the browser process, which is the privileged one in Chromium’s security model.

The first step is to execute the beforeunload event handler if a document is already loaded. It allows the page to prompt the user whether they want to leave the current one. It is useful in cases such as forms, where the result has not been submitted, so the form data is not lost when moving to a new document. The user can cancel the navigation and no more work will be performed.

If there is no beforeunload handler registered or the user agreed to proceed, the next step is the browser making a network request to the specified URL to retrieve the contents of the document to be rendered. Chromium’s implementation uses the term “provisional load” to describe the state it is in at the start of the network request. Assuming no network level error is encountered (e.g. DNS resolution error, socket connection timeout, etc.), the server responds with data and the response headers come first. Once the headers are parsed, they give enough information to determine what needs to be done next.

The HTTP response code allows the browser to know whether one of these conditions has occurred:

A successful response follows (2xx)
A redirect has been encountered (response 3xx)
An HTTP level error has occurred (response 4xx, 5xx)

There are two cases where a navigation can complete without resulting in a new document being rendered. The first one is HTTP response codes 204 and 205, which tell the browser that the response was successful, but there is no content that follows, therefore the current document must remain active. The other case is when the server responds with a header indicating that the response must be treated as a download. All the data read by the browser is then saved to the local filesystem based on the browser configuration.

The server can also send a redirect, upon which the browser makes another request based on the HTTP response code and the additional headers. It continues following redirects until either an error or success is encountered.

Once there are no more redirects, if the response is not a 204/205 or a download, the browser reads a small chunk of the actual response data that the server has sent. By default this is used to perform MIME type sniffing, to determine what type of response the server has sent. This behavior can be suppressed by sending an “X-Content-Type-Options: nosniff” header as part of the response headers. At this point the browser is ready to switch to rendering the new document. In Chromium’s implementation, the term used for this point in time is “commit”. Basically the browser has committed to rendering the new document and removing the old one.

However, before the commit is performed, the old document needs to be notified that it is going away, so the browser executes the unload event handler of the old document, if one is registered. Once that is complete, the old document is no longer active, the new document is committed, and in strict terms, the navigation is complete.

The astute reader will realize that even though I said navigation is complete, the user actually doesn’t see anything at this point. Even though most people use the word navigation to describe the act of moving from one page to another, I think of that process as consisting of two phases. So far I have described the navigation phase and once the navigation has been committed, the browser moves into the loading phase. It consists of reading the remaining response data from the server, parsing it, rendering the document so it is visible to the user, executing any script accompanying it, as well as loading any subresources specified by the document. The main reason for splitting it into those two phases is how errors are handled.

This brings us back to the case where the server responds with an error code. When this happens, the browser still commits a new document, but that document is an error page it either generates based on the HTTP response code or reads as the response data from the server. On the other hand, if a successful navigation has committed a real document from the server and has moved to the loading phase, it is still possible to encounter an error, for example a network connection can be terminated or time out. In that case the browser displays as much of the new document as it has parsed.
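
The decisions described so far can be summarized in a small sketch. It is a simplified model and not Chromium code: the Outcome enum and the classify_response function are hypothetical names, and the "Content-Disposition: attachment" check is an assumption about which header marks a download, since the text above does not name it. Real browsers handle many more cases than this.

```python
from enum import Enum


class Outcome(Enum):
    FOLLOW_REDIRECT = "follow redirect"
    STAY_ON_CURRENT_DOCUMENT = "stay on current document"
    DOWNLOAD = "download"
    COMMIT_DOCUMENT = "commit new document"
    COMMIT_ERROR_PAGE = "commit error page"


def classify_response(status, headers):
    """Decide what a navigation does once the response headers are parsed."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # 3xx with a Location header: make another request.
    if 300 <= status < 400 and "location" in normalized:
        return Outcome.FOLLOW_REDIRECT
    # 204/205: success, but the current document stays active.
    if status in (204, 205):
        return Outcome.STAY_ON_CURRENT_DOCUMENT
    # Assumption: a download is signalled by Content-Disposition: attachment.
    disposition = normalized.get("content-disposition", "")
    if disposition.strip().lower().startswith("attachment"):
        return Outcome.DOWNLOAD
    # 4xx/5xx: a document is still committed, but it is an error page.
    if status >= 400:
        return Outcome.COMMIT_ERROR_PAGE
    return Outcome.COMMIT_DOCUMENT
```

For example, classify_response(302, {"Location": "/next"}) yields Outcome.FOLLOW_REDIRECT, while a 404 still ends in a commit, just of an error page.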

Chromium exposes the various stages of navigation and document loading through methods on the WebContentsObserver interface.

Navigation

DidStartNavigation - invoked at the point after executing the beforeunload event handler and before making the initial network request.
DidRedirectNavigation - invoked every time a server redirect is encountered.
ReadyToCommitNavigation - invoked at the time the browser has determined that it will commit the navigation.
DidFinishNavigation - invoked once the navigation has committed. It can be either an error page if the server responded with an error code or the browser has switched to the loading phase for the new document on successful response.
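
As a rough illustration, the order of these callbacks for a navigation that commits can be modelled as below. The function is a hypothetical toy written in Python, not part of Chromium's C++ API; only the callback names come from the list above, and navigations that end without a commit (204/205 responses, downloads) are deliberately left out.

```python
def navigation_callbacks(redirect_count=0):
    """Return the observer callbacks, in order, for a navigation that commits."""
    calls = ["DidStartNavigation"]
    # One DidRedirectNavigation per server redirect encountered.
    calls += ["DidRedirectNavigation"] * redirect_count
    calls += ["ReadyToCommitNavigation", "DidFinishNavigation"]
    return calls
```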

Document loading

DidStartLoading - invoked when a navigation is about to start, after executing the beforeunload handler.
DocumentLoadedInFrame - invoked when the document itself has completed loading, however it does not mean that all subresources have completed loading.
DidFinishLoad - invoked when the document and all of its subresources have been loaded.
DidStopLoading - invoked when the document, all of its subresources, all subframes and their subresources have completed loading.
DidFailLoad - invoked when the document load failed, for example due to network connection termination before reading all of the response data.

Hopefully this post gives a good introduction to navigations in the browser and should be a good base to build on for future posts.

Chromium Internals - Process Model

Mar 30th, 2016 10:06 pm

Chromium was designed from the very start as a multiprocess browser. Most people think it has one process for each tab and while that is somewhat close to the truth, the real picture is a bit more complicated. It supports a few different modes of operation which differ in how web pages are assigned to processes. Those are called “process models”. It is highly recommended to read the previous posts introducing some basic concepts used by Chromium, which I will use to explain how the different process models work.

Chromium uses the operating system process as a unit of isolation. It uses Blink to render web documents, which it runs in restricted renderer processes. The sandbox does not allow renderer processes to communicate with each other directly; the only way to achieve that is to use the browser process as an intermediary. This design allows us to isolate web pages from each other and potentially have a different level of privileges for each process.

Before delving into the actual models the browser supports, there are a couple more details to cover - cross-process navigation and SiteInstance caveats.

Cross-process navigation

A tab in the browser UI gets a visual representation of the web page from the renderer process and draws it as its content. When navigating from one page to another, the browser makes a network request and gives the response to Blink for rendering. Often, the same instance of the rendering engine, running in the same process, is used. However, in many cases, navigations can result in a new renderer process being created and a brand new instance of Blink being instantiated. The response is handed off to the new renderer process and the tab is then associated with the new process.

The ability to perform cross-process navigations is a core part of Chromium’s design. It incurs the cost of starting a new process, however it also improves performance, as a new process starts with a clean memory space, free of fragmentation. Abandoning the old process can be quick (process kill) and also helps mitigate memory leaks, as that process exits and its memory is released back to the operating system. Most importantly, though, changing processes under the hood is a key building block of the security model.

Chromium’s security model also allows for different privilege levels for content being rendered. In general any content coming from the web is considered lowest privilege level. Chromium internal pages, such as chrome://settings, require more privileges, as they need to read or modify settings or data available only in the browser process. The security model does not allow pages from different privilege levels to use the same process, so a cross-process navigation is enforced when crossing privilege level.

SiteInstance caveats

Previous posts in this series described SiteInstance and said that in the example setup, all of a.com, b.com, c.com, d.com were SiteInstances. This is the ideal model to use and is the goal of the “Site Isolation” project, but currently Chromium does not reflect this in reality. Here is a list of caveats that apply to the default Chromium configuration at the time of this post:

SiteInstance is assigned only to the top-level frame by default. Subframes share the same one as the main frame.
SiteInstance does not always reflect the URL of the current document. Once the SiteInstance URL is set, it doesn’t change, even though the frame can navigate across many different Sites. However, SiteInstance can change when navigating cross-site.[1]

Chromium avoids process swaps on cross-site renderer-initiated navigations (e.g. link clicks) because those would be likely to break script calls on windows that expect to communicate with each other (and thus break compatibility with the web). In contrast, it tends to use process swaps on cross-site browser-initiated navigations (e.g. typing URL in the omnibox) because the user is making an effort to leave the site, so it’s not as bad to break the script calls.

Process models

To help illustrate the difference between the different process models, I have included screenshots of the Chromium Task Manager showing the processes and what URLs they are rendering. The setup I have used is the following:

Tab navigated to http://tests.netsekure.org/main-b-c.html. The main document includes two iframes - one to https://google.com and one to https://github.com. It also contains a button that opens a new tab through window.open() call. It causes the new tab to be in the same BrowsingInstance as the tab that opened it.
Newly opened tab from the initial tab, which is navigated to http://tests.netsekure.com/main-sub.c.html. Notice that it is different top level domain from the initial tab (.org vs .com). It includes an iframe to https://pages.github.com.
User opened tab navigated to http://tests.netsekure.com/empty.html
User opened tab navigated to http://tests.netsekure.com/to_slow.html

At the time of this post, the current set of process models is:

Single process
Process per tab
Process per Site
Process per Site instance (default)
Site per process (experimental)

Single process

This is a mode in which Chromium does not use multiple processes. Rather, it combines all of its parts into a single process. It is also a mode in which there is no sandboxing, as the browser needs access to both the network and the filesystem. It exists mainly for testing and it should never be used!

--single-process

Process per tab

This process model is the simplest one to understand and is what most people intuitively think is the mode of operation of the browser. Each tab gets a dedicated sandboxed process that runs the Blink rendering engine. Navigations do not usually change processes. Note however that since the security model does not allow for content with different privileges to live in the same process, it does actually change processes on privilege change. An example would be navigation from https://dev.chromium.org to chrome://settings.

--process-per-tab

Process per Site

In this process model, each Site gets mapped to a single process. When multiple tabs are navigated to the same Site, they will all share the same process. Navigations can change processes.

It is not the default model, since running multiple tabs with heavy web pages, such as Google Docs, leads to low performance - too much contention on the main thread, memory fragmentation, etc.

--process-per-site

Process per Site instance

This is the default process model for Chromium. Each SiteInstance is mapped to a process by default. Multiple tabs navigated to the same Site end up in separate SiteInstances, therefore they reside in separate processes. Navigations also can change processes. All the SiteInstance caveats apply and not the idealized version of SiteInstance.

Default mode

Site per process

This is an experimental process model for developing the “Site Isolation” project. It comes closer to the desired design for Chromium, where there is a SiteInstance for each frame. Additionally, it is using the idealized definition of SiteInstance, where only URLs from the same SiteInstance can be loaded in the same process. Navigations can change processes in any frame on a page, whereas all other process models support changing processes only on the top frame.

--site-per-process
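
As a very rough illustration of how the first four models differ, the toy below computes which process would host the main frame of a tab. It is a sketch under strong assumptions and not Chromium's algorithm: process_key and count_processes are hypothetical names, every tab is assumed to be user-opened (so each has its own BrowsingInstance), and subframes, privilege changes, navigations and process limits are all ignored. The sites are made up for the example.

```python
def process_key(model, tab_id, browsing_instance, site):
    """Return a label identifying the process that hosts a tab's main frame."""
    if model == "single-process":
        return ("browser",)
    if model == "process-per-tab":
        return ("tab", tab_id)
    if model == "process-per-site":
        return ("site", site)
    if model == "process-per-site-instance":
        # A SiteInstance is modelled as a (BrowsingInstance, Site) pair.
        return ("site-instance", browsing_instance, site)
    raise ValueError(f"unknown process model: {model}")


def count_processes(model, tabs):
    """tabs: iterable of (tab_id, browsing_instance, site) tuples."""
    return len({process_key(model, *tab) for tab in tabs})


# Three user-opened tabs, each in its own BrowsingInstance (hypothetical sites).
TABS = [
    (1, "bi-1", "https://a.example"),
    (2, "bi-2", "https://a.example"),
    (3, "bi-3", "https://b.example"),
]
```

With these three tabs, process per Site needs two renderer processes because tabs 1 and 2 share a Site, while process per tab and process per Site instance both need three.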

I hope these posts have helped demystify a bit how Chromium decides which process to use for a specific tab and URL. If there are other clarifications I can make, feel free to ping me over on Twitter and I would be happy to.

Chromium Internals - Documents, Windows, Browsing Contexts

Dec 6th, 2015 5:19 pm

In a previous post, I covered the basic security principal that Chromium uses for its security model. The goal of this post is to outline a few details that are vital to understand the limitations imposed on the process model. It will look at somewhat obvious parts of the web platform framed in HTML spec speak.

When a browser is navigated to a URL, it makes a network request to the server specified for the document identified in the URL. The response is a document *, which is then parsed and rendered in a window. Those should be familiar, since they correspond to the identically named objects in JavaScript. This holds true for iframes as well, which have their own window objects, which host the respective documents. The HTML spec uses different naming for window - “browsing context”, while it keeps document as the same concept. There are a few types defined by the standard:

top-level browsing context - the main window for a page
nested browsing context - window embedded in a different window, for example through <iframe> tag
auxiliary browsing context - a top-level browsing context “related” to another browsing context, or put in simpler speak - any window created through the window.open() API, or a link with a target attribute.

I will use frame to refer generically to any browsing context - be it a page or an iframe, as they are basically the same concept with two different names based on the role they play.

There are two concepts the HTML spec defines that are important to understand. The first one is “reachable browsing context”. This is somewhat intuitive, as all frames that are part of a web page are reachable to each other. In JavaScript this is exposed through the window.parent and window.frames properties. In addition, related browsing contexts are reachable too, by using the return value of window.open() and the window.opener property. For example, if we have a page with two iframes, which opens a new window with an iframe, then all of the frames are reachable.

[Figure: Two web pages with frames]

The set of reachable frames - all of them in the above case - form the other concept the standard defines - “unit of related browsing contexts”. It is important because documents that want to communicate with other documents are allowed to do so only if they are part of the same unit of related browsing contexts. Internally, the Chromium source code uses the BrowsingInstance class to represent this concept. For the sake of brevity, I’ll use this name from here on.

When two documents want to communicate with each other, they need to have a reference to the window object of the target document. Any frame in a BrowsingInstance can get a reference to any other frame in the same BrowsingInstance since they are all reachable by definition.

How documents can interact with each other is governed by the same origin policy. When documents are from the same origin or can relax their origin to a common one, they are allowed to access each other directly. Cross-origin documents on the other hand are not allowed such access. So a BrowsingInstance can be split in sets of frames and grouped by the origin they are from. But recall that we can’t easily use the origin as a security principal in Chromium. This is why we use the concept of SiteInstance - the set of frames in a BrowsingInstance which host documents from the same Site. It is vital to remember that the Chromium browser process makes all of its process model and isolation decisions based on SiteInstance and not based on origins.

The HTML spec requires all same origin documents, which are part of the same unit of related browsing contexts, to run on the same event loop - or in other words the same thread of execution within a process. This means that all frames which are part of the same SiteInstance must execute on the same thread, however different SiteInstances can run on different ones. In the example above, the two pages are in the same BrowsingInstance because they are related through the window.open() call. The different SiteInstances should be for a.com, b.com, c.com, d.com.

Overall it all boils down to the following rules that Chromium needs to abide by:

All frames within a BrowsingInstance can reference each other.
All frames within a SiteInstance can access each other directly and must run on the same event loop.
Frames from different SiteInstances can run on separate event loops.
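
The grouping behind these rules can be sketched in a few lines. This is an illustrative model only and not the Chromium implementation: group_into_site_instances is a hypothetical helper, and the assignment of frames to a.com, b.com, c.com and d.com is an assumption, since the text only names the four Sites.

```python
from collections import defaultdict


def group_into_site_instances(frames):
    """Group frames by (BrowsingInstance, Site).

    frames: iterable of (frame_name, browsing_instance_id, site) tuples.
    """
    groups = defaultdict(list)
    for name, browsing_instance, site in frames:
        groups[(browsing_instance, site)].append(name)
    return dict(groups)


# Hypothetical layout: a page with two iframes opens a second window
# that has one iframe, so all five frames share one BrowsingInstance.
FRAMES = [
    ("window-1 main", 1, "a.com"),
    ("window-1 iframe-1", 1, "b.com"),
    ("window-1 iframe-2", 1, "c.com"),
    ("window-2 main", 1, "a.com"),
    ("window-2 iframe", 1, "d.com"),
]

SITE_INSTANCES = group_into_site_instances(FRAMES)
```

Each key in SITE_INSTANCES stands for one SiteInstance: the frames listed under it must share an event loop, while frames under different keys may run on separate ones.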

Phew! Now there is enough background to start delving into the details of Chromium’s implementation of these concepts from the HTML spec and its process allocation model.

* Unless the result is a file to be downloaded or handled by external application.

Chromium Internals - Security Principal in Chromium

Nov 23rd, 2015 9:32 pm

I have seen many people versed in technology and security make incorrect statements about how Chromium’s multi-process architecture works. The most common misconception is that each tab gets a different process. In reality, it is somewhat true, but not quite. Chromium supports a few different modes of operation and depending on the policy in effect, process allocation is done differently.

I decided to write up an explanation of the default process model and how it actually works. The goal is for it to be comprehensible to as many people as possible, not requiring a degree in Computer Science. However, basic familiarity with the web platform (HTML/JS) is expected. In order to get to it, there are some concepts that need to be defined, so this is the first post in a series, which will explain some of Chromium’s internals and demystify some parts of the HTML spec.

I have found the easiest mental model of the Chromium architecture to be that of an operating system - a kernel running in high privilege level and a number of less privileged usermode application processes. In addition, the usermode processes are isolated from each other in terms of address space and execution context.

[Figure: OS Model]

The equivalent of the kernel is the main process, which we call the “browser process”. It runs with the privileges of the underlying OS user account and handles all operations that require regular user permissions - communication over the network, displaying UI, rendering, processing user input, writing files to disk, etc. The equivalent of the usermode processes are the various types of processes that Chromium’s security model supports. The most common ones are:

Renderer process - used for parsing and rendering web content using the Blink rendering engine
GPU process - used for communicating with the GPU driver of the underlying operating system
Utility process - used for performing untrusted operations, such as parsing untrusted data
Plugin process - used for running plugins

They all run in a sandboxed environment and are as locked down as possible for the functionality they perform.

[Figure: Chrome Model]

In modern operating system design, the principle of least privilege is key and separation between different user accounts is fundamental. The user account is a basic unit of separation and I will refer to this concept from here on as “security principal”. Each operating system has a different way of representing security principals, for example UIDs in Unix and SIDs in Windows, etc. On the web, the security principal is the origin - the combination of the scheme, host, and port of the URL the document has originated from. Access control on the web is governed by the Same Origin Policy (SOP), which allows documents that belong to the same origin to communicate directly with each other and access each other synchronously. Two documents that do not belong to the same origin cannot access each other directly and can only communicate asynchronously, usually through the postMessage API. Overall, the same origin policy has worked very well for the web, but it also has some quirks, which make it unsuitable to treat origins as the security principal for Chromium.
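
To make the definition concrete, here is a minimal sketch of computing an origin and comparing two of them. The function names are my own, and the sketch assumes only http and https URLs, filling in their well-known default ports (80 and 443) when the URL does not specify one.

```python
from urllib.parse import urlsplit

# Assumption: only http and https are considered in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) tuple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fall back to the scheme's default port when none is given.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True when both URLs have the same scheme, host, and port."""
    return origin(url_a) == origin(url_b)
```

Under this definition https://example.com/a and https://example.com:443/b are same origin, while https://foo.example.com and https://example.com are not, because the hosts differ.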

The first reason comes from the HTML specification itself. It allows documents to “relax” their origins for the purpose of evaluating the same origin policy. Since the origin contains the full domain of the host serving the document, it can be a subdomain, for example “foo.bar.example.com”. In most cases, however, the example.com domain has full control over all of its subdomains and when documents that belong in separate subdomains want to communicate directly, they are not allowed due to the restrictions of same origin policy. To allow this scenario to work, though, documents are allowed to change their domain for the purposes of evaluating SOP. In the case above, “foo.bar.example.com” can relax its domain up to example.com, which would allow any document on example.com itself to communicate with it. This is achieved through the “domain” property of the document object. It does come with restrictions though.

In order to understand the restrictions of what document.domain can be set to, one needs to know about the Public Suffix List and how it fits in the security model of the web. Top-level domains like “com”, “net”, “uk”, etc., are treated specially and no content can (should) be hosted on those. Each subdomain of a top-level domain can be registered by a different entity and therefore must be treated as completely separate. There are cases, however, where a domain isn’t a top-level domain, but still acts as one. An example would be “co.uk”, which serves as a parent domain for commercial entities in the UK to register their domains. Because those cases are effectively in the role of a top-level domain, but are not one, the public suffix list exists as a comprehensive source for browsers and other software to use.

Now that we know about PSL, let’s get back to document.domain. A document cannot change its domain to be anything completely generic or very encompassing, such as “.”. Browsers allow documents to relax their domain up the DNS hierarchy. To use the example from above, “foo.bar.example.com” can set its domain to “bar.example.com” or “example.com”. However, since “.com” is a top-level domain, allowing the document to set its domain to “.com” will lead to security problems. It will allow the document to potentially access documents from any other “.com” domain. Therefore browsers disallow setting the domain to any value in the Public Suffix List and enforce that it must be a valid format of a domain under one of the entries in the PSL. This concept is often referred to as “eTLD+1” - effective top-level domain (a.k.a. entry in the PSL) + one level of domains under it. I will use this naming for brevity from here on.
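
The rule can be sketched as a small check. This is an approximation for illustration and not the algorithm from the HTML spec: can_relax_domain is a hypothetical name, and PUBLIC_SUFFIXES is a tiny hand-picked subset standing in for the real Public Suffix List.

```python
# Assumption: a tiny illustrative subset of the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "net", "uk", "co.uk"}


def can_relax_domain(current_host, new_domain):
    """Check whether a document on current_host may set its domain to new_domain."""
    current = current_host.lower().rstrip(".")
    candidate = new_domain.lower().strip(".")
    # Reject empty values such as "." and anything that is a public suffix.
    if not candidate or candidate in PUBLIC_SUFFIXES:
        return False
    # The new value must be the host itself or a parent of it in the DNS hierarchy.
    return current == candidate or current.endswith("." + candidate)
```

With this check “foo.bar.example.com” may relax to “bar.example.com” or “example.com”, but not to “.com”, “.”, or an unrelated domain.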

It is this behavior defined by the HTML spec allowing documents to change their origins that gives us one of the reasons we cannot use the origin as a security principal in our model. It can change at runtime and security decisions made at an earlier point in time might no longer be valid. The consistent part that can be taken from the origin is only the eTLD+1 part.

The next oddity of the web is the concept of cookies. It is quite possibly the single most used feature of the web today, but it has its fair share of strange behaviors and brings numerous security problems with it. The problems stem from the fact that cookies don’t really play very well with origins. Recall that origin is the tuple (scheme, host, port), right? The spec however is pretty clear that “Cookies do not provide isolation by port”. But that isn’t all, the spec goes to the next paragraph and says “Cookies do not provide isolation by scheme”. This part has been patched up as the web has evolved though and the notion of “Secure” attribute on cookies was introduced. It marks cookies as available only to hosts running over HTTPS and since HTTP is the other most used protocol on the web, the scheme of an origin is somewhat better isolated, while port numbers are completely ignored when cookies are concerned. So basically it is impossible to use the origin as a security principal for performing access control against cookie storage.

Finally there is enough background to understand the security principal used by Chromium - site. It is defined as the combination of scheme and the eTLD+1 part of the host. Subdomains and port numbers are ignored. In the case of https://foo.bar.example.com:2341 the effective site for it will be https://example.com. This allows us to perform access control in a web compatible way while still providing a granular level of isolation.
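
Here is a minimal sketch of that definition. It is not Chromium's implementation: the function names are hypothetical, and PUBLIC_SUFFIXES is a tiny hand-picked subset standing in for the real Public Suffix List.

```python
from urllib.parse import urlsplit

# Assumption: a tiny illustrative subset of the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "net", "org", "uk", "co.uk"}


def etld_plus_one(host):
    """Return the eTLD+1 of a host, or None if the host is itself a public suffix."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None
            # Keep the public suffix plus one more label to its left.
            return ".".join(labels[i - 1:])
    return None


def site_for_url(url):
    """Return scheme + eTLD+1 for a URL; subdomains and port are dropped."""
    parts = urlsplit(url)
    registrable = etld_plus_one(parts.hostname or "")
    if registrable is None:
        return None
    return f"{parts.scheme}://{registrable}"
```

Calling site_for_url("https://foo.bar.example.com:2341") returns "https://example.com", matching the example above.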

How to approach forking Chromium

Jan 18th, 2015 7:40 pm

One really nice thing about Chromium is that its source code is open and released under the BSD license. This allows people to reuse code, extend the browser, or fully fork the project. Each of those is probably worthy of a blog post on its own, but I will focus only on the last one.

Taking Chromium and forking it is a fairly easy process, just clone the repository. Make all the changes you would like to do - add missing features, include enhancements, create a totally new UI - it is only limited by one’s imagination. Building the binary from the source code is a little bit laborious, though not too hard. It does take beefy hardware and some time. Once it is built, publishing it is deceptively easy. However, what comes next?

Software in today’s world is not static. As a colleague of mine likes to say - it is almost like a living organism and continuously evolves. There is no shipping it as was the norm in the ‘90s. The web is in a constant release mode and its model of development has trickled to client side software - be it desktop or mobile apps. Chromium has adopted this model from its initial release and is updating on a very short cycle - currently averaging six weeks between stable releases and two weeks between intermediate stable updates. It is this constant change that makes forking it a bit more challenging. However, there are a few steps that one can take to ensure a smoother ride.

Infrastructure

With a constantly changing codebase, having a continuous build system is a must for a project as big as Chromium and is very useful even for much smaller projects. Setting one up from the get go will be tremendously useful if there is more than one developer working on the code. Its value is even higher if the project needs to build on more than one platform.

What is more important, and I would argue a must, is using a continuous integration system: running tests on each commit (or thereabout) to ensure there are no breaking changes. It is a requirement for any software project that needs to be in a position to release a new version at any point in time.

The system used in the Chromium project - buildbot - is actually open source and can be adapted to most projects.

Making changes

The most important action one can take when forking Chromium is to study the design of the browser before diving in and making any changes. There are multiple components and layers involved, which interact through well defined interfaces. Understanding the architecture and the patterns used will pay off tremendously in the long run.

Chromium has two main component layers - content and chrome. The former is what implements the barebones of a browser engine - networking stack, rendering engine, browser kernel, multiprocess support, navigation and session history, etc. The chrome layer is built on top of content to implement the browser UI, extensions system, and everything else visible to the user that is not web content.

Each layer communicates with the upper ones through two main patterns - observer and delegate interfaces. Using those interfaces should be the preferred way of extending the browser and building on top of it. Whenever this is not possible, changes to the core are needed. I would strongly suggest preferring to upstream those, if possible of course. It will make maintaining the fork much easier by reducing the burden of keeping up with changes and also shares the improvements with the whole community!

Finally, do yourself a favor to keep you sane in the long run - write tests for all the features you are adding or changes made. It is the only way to ensure that long term the regression and bug rate is manageable. It will save your sanity!

Keep it moving

The Chromium codebase changes constantly and gets around 100 commits each day. The sane way to keep up with the rate of change is to rebase (or merge) your code on tip-of-tree (ToT) daily or at most weekly. Letting more time lapse makes resolving conflicts a lot harder.

Updating the install base is key to long term success. The update client used in Chrome on Windows, called Omaha, is also open source. The server side code is not available, though, since it depends heavily on how Google’s internal infrastructure is set up. However the protocol used to communicate between the client and the server is publicly documented.

Development for Chromium relies quite a bit on mailing lists. Subscribing to the two main ones - chromium-dev@chromium.org and blink-dev@chromium.org - is very helpful. It is the place where major changes are announced, discussion on development happens, and questions about Chromium development are answered. The security team has a dedicated list for discussions - security-dev@chromium.org.

Keep it secure

Security is one of the core tenets of Chromium. Keeping up with security fixes can be a challenging task, which is best solved by keeping your code always rebased on tip-of-tree. If this is not possible, it is best to subscribe to the security-notify@chromium.org list. It is the communication mechanism the security team uses to keep external projects based on Chromium up-to-date with all the security bugfixes happening in the project.

Plugins

The web is moving more and more to a world without plugins. For me, this is a very exciting time, as plugins usually tend to weaken the browser security. There are two plugins bundled with Chromium to produce Chrome - Adobe Flash Player and a PDF viewer. The latter is now an open source project of its own - PDFium. It can be built and packaged with Chromium, though the same care should be taken as with the browser itself - keep it up-to-date.

–

Overall, maintaining a fork of Chromium isn’t trivial, but it isn’t impossible either. There are a bunch of examples, including the successful migration of the Opera browser from their own rendering engine to building on top of the Chromium content module.

Last, but not least - feel free to reach out with questions or to ask for advice.

Be humble

Jan 10th, 2015 11:41 am

The topic of this blog post has long been on my mind, but I did not have a good example to use. Finally I found one.

Software security is a very complex field and an asymmetric problem space. Arguments whether defense or offense is harder have been fought for a long time and they will likely never stop. I think most of us can agree on these two statements:

offense needs to find only a handful of problems and work hard to turn them into a compromise
defense needs to architect software to be resilient and work hard to ideally (though not practically) not introduce any problems

Each side requires unique skills and it is extremely rare for people to be really good at both. What really irks me is that lots of people in the security industry tend to bash the other side. It is easy: one understands their own problem space very well and knows how hard it is to be an expert in it. Also, it feels that the other side is not that hard. I mean, how hard can it be, right? Wrong!

In this post I will pick the side of the defender, since this is where I spend most of my time. The example I will use comes from the recent events with the Aviator browser, because it is near and dear to my heart. One thing I want to make clear from the get go - I totally respect their efforts and applaud them for trying. Forking Chromium is not a small feat and not for the faint of heart. The goals for Aviator are admirable and we definitely need people to experiment with bold and breaking changes. It is through trial and error that we learn, even in proper engineering disciplines :). What could we use more of in the security industry?

Humbleness!

It is no surprise people on the offensive side bash software developers for “stupid” mistakes, since the grass is always greener on the other side. The problem is that many trivialize the work required to fix those mistakes. Some are indeed easy. Fixing a stack-based buffer overflow is not too hard. In other cases, it is harder due to code complexity or just fundamental architecture of the code.

What humbles me personally is having tried the attack side. It is not too bad if you want to exploit a simple example problem. Once you try to exploit a modern browser, it is a completely different game. I admire all the exploit writers for it and am constantly amazed by their work. Same goes for a lot of the offensive research going on.

I have secretly wished in the past for some of the offensive folks to try to develop and ship a product. When WhiteHat Security released the Aviator browser, I was very much intrigued to see how it would develop. It is not a secret that Jeremiah Grossman and Robert Hansen have given lots of talks on how the web is broken and how browser vendors do not want to fix certain classes of issues. They have never been kind in their remarks to browser vendors, but now they have become one. I watched with interest to see how they have mitigated the issues they have been discussing. Heck, I wanted to see clickjacking protection implemented in Chromium, since it is the authors of Aviator that found this attack vector and I have personally thought about that problem space in the past.

Chris Palmer and I have played around with the idea of “Paranoid Mode” in Chromium and as a proof of concept we have written Stannum (source) to see how far we can push it through the extensions APIs. It is much safer to add features to Chromium using extensions than writing C++ code in the browser itself¹. So when Aviator was announced and released initially, I reached out to WhiteHat Security to discuss whether the features they have implemented in C++ could be implemented through the extensions API. My interest was primarily motivated by learning what they have done and what are the limitations of the extensions subsystem. Unfortunately, the discussion did not go far :(.

Where do I believe they could have done better? You might have guessed it right - being humble. The marketing for Aviator is very bold - “the most secure and private Web browser available”. This is a very daring claim to make and a hard promise to uphold, and anyone who has been in security should know better. Securing a complex piece of software, such as a browser, is a fairly hard task and requires lots of diligence. It takes quite a bit of effort just to stay on top of all the bugs being discovered and features committed, let alone develop defenses and mitigations.

Releasing the source for Aviator was a great step by WhiteHat. It gives us a great example to learn from. Looking at the changes made, it is clear that most of the code was written by developers who are new to C++. When making such bold statements, I would have expected more mature code. Skilled C++ developers that understand browsers are rare, but it is a problem that can be solved. It takes a lot of time, effort and desire for someone to learn to use the language and, most importantly, to understand the architecture of the browser. Unfortunately, I did not see any evidence that whoever wrote the Aviator specific code did any studying of the source code or attempted to understand how Chromium is written and integrate the changes well.

What really matters at the end of the day, though, is not the current state of a codebase. After all, every piece of software has bugs. I believe there is one key factor which can determine long term success or failure:

Attitude!

Security vulnerabilities are a fact of life in every large enough codebase. Even in the project I work on we have introduced code that allowed geohot to pull off his total ChromeOS pwnage! We owned up to it, the bug was fixed and we looked around to ensure we did not miss other similar instances.

However, what I was most disappointed by was the reaction from WhiteHat when a critical vulnerability was found in the Aviator specific code:

“Yup, Patches welcome, it’s open source.” ²

Our industry would go further if we followed a few simple steps:

Do not trivialize the work of the opposite side; it is more complex than it appears on the surface
When working on a complex piece of software or a complex problem, study it first
Share ideas and collaborate
Own up to your mistakes
Be humble

¹ Even Blink is starting to implement rendering engine features in JavaScript.
² Never mind that there is no explanation on how to build Aviator, so one can actually verify that the fix works.

30 days with isolated apps in Chrome

Mar 30th, 2013 12:00 pm

Update (2014-09-24): It was decided that the isolated apps experimental feature has some usability problems and will not be shipping in Chrome. As such, this functionality either no longer exists or is most likely broken and should not be used. I’m leaving the post for historical reference.

I have been using separate browsers for a while now to isolate generic web browsing from “high value” browsing, such as banking or administration of this blog. The reason I’ve been doing this is that a compromise during generic web browsing is going to be isolated to the browser being used and the “high value” browser will remain secure (barring compromise of the underlying OS).

Recently I’ve decided to give the experimental Chrome feature - “isolated apps” - a try, especially since I’ve recently started working on Chrome and will likely contribute to taking this feature to completion. Chrome already has a multi-process model in which it uses different renderer processes, which, if compromised, should limit the damage that can be done to the overall browser. One of the limitations that exists is that renderer processes have access to all of the cookies and other storage mechanisms in the browser (from here on I will only use cookies, though I mean to include other storage types as well). If an attacker can use a bug in WebKit to get code execution in the renderer process, then this limitation allows requesting your highly sensitive cookies and compromising those accounts. What “isolated apps” does is isolate the storage for web applications from the generic web browsing storage, which helps solve the problem of a compromised renderer stealing all your cookies. In essence, it simulates running the web application in its own browser, without the need to branch out of your current browser. The aim of this blog post is not to describe how this will work, but how to take advantage of this feature. For the full details, read the paper by Charlie Reis and Adam Barth (among others), which underlies the “isolated apps” work.

In the spirit of my “30 days with …” experiments, I created manifests for the financial sites I use and for my blog. I wanted to see if I would hit any obvious breaking cases or degraded user experience with those “high value” sites. A sample manifest file looks like this:

{
  "name": "netsekure blog",
  "version": "1",
  "app": {
    "urls": [ "*://netsekure.org/" ],
    "launch": {
      "web_url": "https://netsekure.org/wp-admin/"
    },
    "isolation": [ "storage" ]
  },
  "permissions": [ "experimental" ]
}

The way to read the file is as follows:

The “urls” directive is an expression defining the extent encompassed by the web application.
The “web_url” is the launch page for the web app, which provides a good known way to get to the application.
The “isolation” directive is instructing Chrome to isolate the storage for this web app from the generic browser storage.

Once the manifest is authored, you can place it in any directory on your local machine, but ensure the directory has no other files. To actually take advantage of this, you need to do a couple of things:

Enable experimental APIs either through chrome://flags or through the command line with --enable-experimental-extension-apis.
Load the manifest file as an extension. Go to the Chrome Settings page for Extensions, enable “Developer Mode”, and click on “Load unpacked extension”, then navigate to the directory where the manifest file resides and load it.
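The manifest and the "directory with no other files" requirement can be sketched in a few lines of Python. This is only an illustration, not part of the feature itself: the helper names `build_manifest` and `write_manifest` are made up for this example, and the only thing taken from Chrome is that an unpacked extension's manifest lives in a file named manifest.json.

```python
import json
import tempfile
from pathlib import Path


def build_manifest(name, extent_urls, launch_url):
    """Return a manifest dict with the same shape as the sample above."""
    return {
        "name": name,
        "version": "1",
        "app": {
            # Extent of the web application.
            "urls": list(extent_urls),
            # Known good entry point into the application.
            "launch": {"web_url": launch_url},
            # Ask for storage to be isolated from the generic browser storage.
            "isolation": ["storage"],
        },
        "permissions": ["experimental"],
    }


def write_manifest(manifest, directory):
    """Write manifest.json into `directory`, refusing to reuse a non-empty one."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    if any(directory.iterdir()):
        raise ValueError(f"{directory} already contains files")
    path = directory / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    manifest = build_manifest(
        "netsekure blog",
        ["*://netsekure.org/"],
        "https://netsekure.org/wp-admin/",
    )
    # A fresh temporary directory is used here purely for demonstration.
    target = Path(tempfile.mkdtemp()) / "netsekure-blog"
    print(write_manifest(manifest, target))
```

The printed path is the directory entry you would then point “Load unpacked extension” at (its parent directory, to be precise).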

Once you have gone through the above steps, when you open a new tab, it will have an icon of the isolated web application you have authored. You can use the icon to launch the web app, which will use the URL from the manifest and will run in a separate process with isolated storage.

Now that there is an isolated app installed in Chrome, how can one be assured that this indeed works? There are a couple of things I did to confirm. First, when a Chrome web app is opened, the Chrome Task Manager shows it with a different prefix. Generic web pages start with “Tab: ” followed by the title of the currently displayed page. The prefix for the apps is “App: ”, which indicates that the browser treats this tab as a web application.

In addition to seeing my blog being treated differently, I wanted to be sure that cookies are not shared with the generic browser storage, so I made sure to delete all cookies for my own domain in the “Cookies and Other Data” settings panel. As expected, but still to my surprise, the site continued functioning, since deleting the cookies only affected the general browser storage and my isolated app cookies were not cleared. This intrigued me as to where those cookies are being stored. It turns out, since this is still just an experimental feature, there is no UI to show the storage for the isolated app yet. If you want to prove this to yourself, just like I wanted to, you have to use a tool to let you peek into a SQLite database, which stores those cookies in a file very cleverly named - Cookies. The Cookies db and the cache are located in the directory for your current profile in a subdirectory “Isolated Apps” followed by the unique ID of the app, as generated by Chrome. You can find the ID on the Extensions page, if you expand to see the details for the web app you’ve “installed”. In my case on Windows, the full directory is “%localappdata%\Google\Chrome\User Data\Default\Isolated Apps\dgipdfobpcceghbjkflhepelgjkkflae”. Here is an example of the cookies I had when I went and logged into my blog:

As you can see, there are only two cookies, which were set by WordPress and no other cookies are present.
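If you would rather not install a separate SQLite viewer, a minimal Python sketch with the standard sqlite3 module can list what is in the Cookies file. It rests on an assumption the post does not spell out: that the database has a table named `cookies` with `host_key`, `name` and `path` columns. That layout may differ between Chrome versions, so treat the query as a starting point. The function name `list_cookies` is made up for this example.

```python
import sqlite3
import sys
from pathlib import Path


def list_cookies(db_path):
    """Return (host, name, path) rows from a Chrome-style Cookies database.

    Assumes a `cookies` table with `host_key`, `name` and `path` columns.
    """
    # Open read-only so the browser's file is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        query = "SELECT host_key, name, path FROM cookies ORDER BY host_key, name"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the full path to the Cookies file of the isolated app.
    for host, name, path in list_cookies(sys.argv[1]):
        print(host, name, path)
```

The database may be locked while the browser is running, so pointing the script at a copy of the file is probably the safer choice.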

Now, after using isolated apps for 30 days, I haven’t found anything that was broken by this type of isolation. The sites I’ve included in my testing, besides my blog, are bankofamerica.com, americanexpress.com, and fidelity.com*. The goal now is to get this to a more usable state, where you don’t need to be a Chrome expert to use it ;).

* Can’t wait for all the phishing emails now to start arriving ;)
