Axod's Hack

Axod's thoughts on programming, Affiliate Marketing, startups, web2.0, and anything else I think of.

Friday, 11 December 2009

WebSocket - some numbers

I just implemented WebSocket into the Mibbit server, and thought I'd get some real numbers on performance. Having recently updated the Mibbit server to use deflate compression on XHR responses where it provides a net gain, I wanted to see how the two compare.

Note that one of my main focuses is on bandwidth usage. We use a fair amount of bandwidth, and anything we can do to optimize this is a good thing. It also usually means a speed-up for users, which is extremely important.

First of all, a recap/explanation of how the conventional Comet works here.
We have two connections, which are set to keep-alive. One is for sending from browser to server, the other one is for server to browser.
The server->browser one is opened by the browser, doing an XHR POST, and held open by the server until data is ready, or until a timeout. Then a new request is again sent from the browser. This means that as soon as data is ready, it's delivered to the browser.

There are a couple of downsides to the above method.
1. Keep-alive isn't failsafe. Sometimes browsers/proxies/etc ignore you, or for other reasons decide to open new connections. Creating a new TCP/IP connection is expensive, and may mean that lag is introduced. For the vast majority of cases though, no new TCP/IP connections are created, and you just have your 2 connections to the Mibbit server to handle all communications.
2. Every HTTP request sent from the browser includes headers, whether you want them or not. These are not small headers. Expect 2k+ per request. You can remove *some* of these headers (see previous post), but most of them you're stuck with.
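To get a feel for the second point, the size of a request's header block can be estimated by adding up the request line and each header line. This is only a rough sketch: `headerBytes` and the sample headers are made up for illustration, and the sample is far shorter than what a real browser sends.

```javascript
// Estimate the bytes an HTTP/1.1 request spends on its request line and headers.
// Hypothetical helper for illustration; not taken from the Mibbit code.
function headerBytes(method, path, headers) {
  const lines = [`${method} ${path} HTTP/1.1`];
  for (const [name, value] of Object.entries(headers)) {
    lines.push(`${name}: ${value}`);
  }
  // Every line ends with CRLF, and an empty line terminates the header block.
  const block = lines.join("\r\n") + "\r\n\r\n";
  return new TextEncoder().encode(block).length;
}

// Made-up sample; a real browser adds many more headers (cookies, accept lists, ...).
const sampleHeaders = {
  "Host": "example.com",
  "User-Agent": "ExampleBrowser/1.0",
  "Content-Length": "12",
};
console.log(headerBytes("POST", "/recv", sampleHeaders));
```

Multiplying that figure by the number of requests in a session gives an idea of how much traffic is pure header overhead.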

Now, enter WebSocket. This basically gives you a bi-directional socket with the server after a small HTTP handshake. The advantages are that there's no HTTP headers from then on, and there shouldn't be any lag due to keep-alive issues.
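From the browser side, using a WebSocket looks roughly like the sketch below. `openChannel` is a hypothetical wrapper, not Mibbit's code. The `WebSocketImpl` parameter is an assumption added so the sketch can be exercised outside a browser; by default it uses the browser's own `WebSocket`.

```javascript
// Open a bi-directional channel and hand each incoming message to onMessage.
// WebSocketImpl defaults to the browser's WebSocket; it is a parameter only so
// that a stand-in can be supplied when no browser is available.
function openChannel(url, onMessage, WebSocketImpl = globalThis.WebSocket) {
  const socket = new WebSocketImpl(url);
  const pending = []; // messages queued while the handshake is still in progress
  let open = false;

  socket.onopen = () => {
    open = true;
    // Flush anything that was queued before the handshake finished.
    while (pending.length > 0) socket.send(pending.shift());
  };
  socket.onmessage = (event) => onMessage(event.data);

  return {
    send(text) {
      if (open) socket.send(text);
      else pending.push(text);
    },
    close() {
      socket.close();
    },
  };
}
```

After the handshake, each `send` and each incoming message travels without any HTTP headers attached, which is where the saving comes from.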

For the initial test, I opened a Mibbit Widget pointed at a channel on irc.mibbit.net, said a couple of things, and did a whois lookup. A reasonably small scale test, but a useful one involving packets sent both ways, and some large packets (MOTD, topic).

First let's see the results for standard XHR:

data recvd: 1222
data sent: 7220
overhead In: 4456
overhead Out: 1229
Total data: 14127

OK, so we have 14k of data, and 5.6k of that is 'overhead' - HTTP headers/request/responses.

Let's see how WebSocket improves on that:

data recvd: 1350
data sent: 7307
overhead In: 118
overhead Out: 176
Total data: 8951

Wow. That's a big improvement. We've cut the overhead down to just 294 bytes (Basically the initial handshake).

Given the above data, it's clear that using WebSocket is a big win both in terms of bandwidth usage, and (although I haven't measured yet) lag. Anecdotally, the WebSocket version did seem a lot 'snappier', so I'd expect the lag to be reduced.

However, we haven't looked into one other area - compression. With HTTP, we can compress responses from the server, and the browser will decompress them fast, and pass them on to JS. There is no mechanism for this with WebSockets (yet). If you want compression with WebSockets, you're likely going to have to do it yourself in JavaScript, which may burn precious browser CPU cycles.

So, finally, here are the numbers for XHR+deflate:

data recvd: 1222
data sent: 1868 (Compressed)
overhead In: 4456
overhead Out: 1229
Total data: 8775

So, this just beats WebSocket for bandwidth usage. It would depend heavily on the type of data you're sending as to how good your compression is, and how the numbers compare with WebSocket.
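The "only where it provides a net gain" rule mentioned at the top of this post can be sketched as follows. This is a Node.js illustration using the standard `zlib` module, not the Mibbit server's actual code; `maybeDeflate` and the sample strings are made up.

```javascript
const zlib = require("zlib");

// Compress a response body with deflate only when that makes it smaller.
// Returns the bytes to send and the Content-Encoding to use (null for none).
function maybeDeflate(text) {
  const raw = Buffer.from(text, "utf8");
  const packed = zlib.deflateSync(raw);
  if (packed.length < raw.length) {
    return { body: packed, encoding: "deflate" };
  }
  return { body: raw, encoding: null };
}

// Repetitive text (a made-up MOTD here) compresses well; a tiny message does not,
// because the deflate framing costs more than it saves.
const motd = "Welcome to the server. ".repeat(40);
for (const text of [motd, "hi"]) {
  const result = maybeDeflate(text);
  console.log(`${text.length} chars -> ${result.body.length} bytes, encoding: ${result.encoding}`);
}
```

Short chat lines often fall into the "not worth compressing" case, which is why the gain depends so much on the mix of traffic.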

Just to recap,

XHR: 14,127 bytes
WebSocket: 8,951 bytes
XHR+deflate: 8,775 bytes
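The totals can be rechecked from the per-run figures above. The snippet below just redoes the arithmetic with the numbers from this post; the object layout and the `summarize` name are my own.

```javascript
// Byte counts copied from the three test runs above.
const runs = {
  "XHR":         { recvd: 1222, sent: 7220, overheadIn: 4456, overheadOut: 1229 },
  "WebSocket":   { recvd: 1350, sent: 7307, overheadIn: 118,  overheadOut: 176 },
  "XHR+deflate": { recvd: 1222, sent: 1868, overheadIn: 4456, overheadOut: 1229 },
};

// Total traffic is payload in both directions plus protocol overhead in both directions.
function summarize(run) {
  const overhead = run.overheadIn + run.overheadOut;
  const total = run.recvd + run.sent + overhead;
  return { total, overhead, overheadShare: overhead / total };
}

for (const [name, run] of Object.entries(runs)) {
  const s = summarize(run);
  console.log(`${name}: ${s.total} bytes total, ${(s.overheadShare * 100).toFixed(1)}% overhead`);
}
```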

Note that this is a reasonably small scale test, but I do believe the numbers will scale pretty much like this. In general, for our type of traffic, the HTTP headers in XHR double the traffic. Again, for our type of traffic, the compression pretty much halves the traffic. So XHR+deflate vs WebSocket is pretty close.

We'll be rolling out WebSocket support in Mibbit in the next few days, and will be able to get some more definitive data on how the two compare. We have quite a large Chrome userbase, hopefully some of whom are on the dev builds which support WebSocket.
It's certainly a great upgrade to the web, and hopefully compression support will come in due course.

Implementing WebSocket wasn't actually too bad at all. There were a couple of hoops to jump through, but the protocol seems reasonably sane. Sadly the protocol doc is completely insane, and tries to describe what you should do using plain English instead of just giving you the data you need, e.g. "take the value \b\ and bitwise and it with 0x7f and put the result in a variable \b2\"

We have a very early alpha Mibbit Widget set up on http://wbe02.mibbit.com/?debug=true&channel=%23websocket with support for WebSocket if available, else XHR+deflate, or XHR in the worst case. To see if it's using WebSocket, click on the debug tab, and you should see a message saying WebSocket created. Alternatively if you use the developer timeline, you can see if there's any XHR going on or not.
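The fallback order described here could be expressed along these lines. The post doesn't show how the widget actually decides, so treat this as a guess at the shape: `detectEnv`, `chooseTransport`, and the `deflateOk` flag are all hypothetical names.

```javascript
// Probe what the environment offers. `deflateOk` stands in for "the server will
// compress XHR responses for this client"; how that is known is an assumption.
function detectEnv(globalObj, deflateOk) {
  return {
    hasWebSocket: typeof globalObj.WebSocket === "function",
    deflateOk: Boolean(deflateOk),
  };
}

// Pick the best transport available: WebSocket, else XHR+deflate, else plain XHR.
function chooseTransport(env) {
  if (env.hasWebSocket) return "websocket";
  if (env.deflateOk) return "xhr+deflate";
  return "xhr";
}

// Example with a stand-in global object that has no WebSocket support.
console.log(chooseTransport(detectEnv({}, true)));
```

In a browser the first argument would be `window`, so the check picks up WebSocket automatically on builds that support it.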

Tuesday, 16 June 2009

Revenue / Browser

Here's some interesting stats from Mibbit... I checked out the average revenue generated per 1,000 visits on the main site. The data covered about 800k visits, so reasonably statistically valid I think.

First off, here's the visit breakdown:

Firefox: 58.75%
IE: 26.11%
Chrome: 6.46%
Opera: 3.95%
Safari: 3.63%
Mozilla: 0.67%

Now here's the average revenue generated per 1,000 visits for each browser. Calculated for example as Safari_revenue * 1000 / Safari_visits:

Safari: $2.392
Firefox: $1.599
Mozilla: $1.476
Chrome: $1.053
IE: $1.050
Opera: $0.388
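The per-browser figure is a plain ratio, and two browsers can be compared by dividing one figure by the other. In the sketch below the revenue and visit inputs are hypothetical (the raw numbers aren't published here); only the two per-1,000 figures are taken from the list above.

```javascript
// Revenue per 1,000 visits, as defined above: revenue * 1000 / visits.
function revenuePerThousand(revenue, visits) {
  if (visits <= 0) throw new RangeError("visits must be positive");
  return (revenue * 1000) / visits;
}

// Hypothetical inputs, NOT the real Mibbit figures: $50 earned over 25,000 visits.
console.log(revenuePerThousand(50, 25000).toFixed(3));

// Comparing two browsers using the published per-1,000 figures from the list above.
const perThousand = { Safari: 2.392, Opera: 0.388 };
const operaVsSafari = perThousand.Opera / perThousand.Safari;
console.log(`Opera earns ${Math.round(operaVsSafari * 100)}% of what Safari does per visit`);
```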

This sort of went against some of my assumptions. I imagined that IE would be the top revenue generator, as you sort of imagine IE users as being less tech savvy, more 'used to' clicking on adverts etc.

The other interesting point to note is that you should never believe the extremely vocal minority who tell you that all Firefox users have AdBlockPlus installed. They don't. As you can see, Firefox users are the 2nd best revenue generators.

I didn't know where to place Safari before I did the calculations, but it does make some sense. Apple users are more used to spending money (they likely value their time more than their money), so perhaps this is why they generate more revenue.

The shocker was Opera. An Opera user generates just 16% of the revenue of an average Safari user! That's really poor. Someone mentioned something about built in content blocking in Opera, but I couldn't find it in a default install.

So should I start pushing people away from Opera, and toward Firefox+Safari? Well, no, they're probably more likely using the browser *because* of their advertising behavior, not the other way around.... Still, food for thought.

So what about OSes?

Macintosh: $2.156
Linux: $2.076
Windows: $1.285

So once again, we have Mac users ready to spend money, click on ads, etc. The surprise is that Linux users generate quite a bit more revenue than Windows users. Counter to what I had assumed previously.

Note that these stats exclude iPhone/iPod/Opera Mobile, which aren't really big enough to draw many conclusions from - also people don't click often on ads on mobiles.

The stats were generated using Google Analytics tied to AdSense, which works really well for things like this.


In summary then:
  • Apple users are good at generating revenue - they buy stuff
  • Linux and Firefox users are also good - don't listen to the overly vocal AdBlockPlus user that likes to tell you how everyone using Firefox doesn't see any ads anyway
  • IE/Windows is solid enough
  • Opera is terrible
  • Google Analytics rocks
If you have any thoughts on why Opera should be so bad, please post a comment. Perhaps it's to do with the 'turbo mode'? AFAIK this puts everything through their Opera Mini web proxies, so perhaps that blocks ads?

Update:

As mentioned by some commenters, this may have more to do with different locations than browsers, coupled to the fact that some browsers have definite geographical biases. For example, Opera usage in these stats for the US is 2%, whilst Opera usage for eastern Europe is 8%. In short, Opera may have a geographical bias toward less-easily-monetized countries (at least using AdSense).

Friday, 26 October 2007

Communications for ajax apps

What are the options?
Communication with the server could be done with a Java applet, or with Flash. However, both of these are a little kludgy and assume the user has those plugins. It can also be done with AJAX, which often makes for a more streamlined solution.

XMLHttpRequest
Unless you've been hiding under a rock for the last couple of years, you'll have heard of AJAX - the latest in a long line of buzz words. AJAX covers a number of technologies - Javascript, DOM manipulation, XML, and XMLHttpRequest.

The last one of these is a powerful beast. It allows Javascript code to call back to the server to request more data. This is used for all sorts of new interactive webapps that mean the end user doesn't have to wait for page reloads every time they do something.

Essentially using the XMLHttpRequest means that you can break out of the request/response paradigm of the web, and write webapps that function much more like desktop applications.

The basic usage of an XMLHttpRequest is to perform an HTTP operation with the server (GET/POST/etc), so the client sends a request to the server, and receives a response back.

But how much further can we push the XMLHttpRequest object? Doing a request/response is a bit limited. Why the hell didn't they just allow us to open a raw socket to the server and have JavaScript functions send() recv() close() etc? I don't want http. I don't want xml. Give me a raw socket and I'm happy. Fact is they didn't do that, so we'll have to work around it.

First let's take an example. We're writing a simple chat application.
This requires data to be sent in both directions. When someone else says something, the server needs to 'send' that message to the client. When the client says something, it has to 'tell' the server about it. This doesn't instantly fit into the HTTP protocol request/response idea.


Polling
The first method people would probably use is 'polling'. With this you would periodically ask the server if it has any new messages for you. You could either send messages you have in these poll requests, or in a separate ajax call if you want them to be instant.

This is ok, but an utter waste of resources and bandwidth.

Lazy polling
A better method is to use lazy polling. Here, you ask the server if it has any new messages for you, and the server holds your connection open until either it does, or a timeout expires. Then you immediately ask it again.
Advantage: You get the messages as soon as you should. Unlike polling, where you may get a message late, with lazy polling, you pretty much get messages as soon as the server has them for you. This makes your webapp far more responsive.
Advantage: Far fewer requests, so fewer HTTP headers etc. Instead of polling the server, maybe sending a request every second, you can just send 1 request, and wait until a timeout say 60 seconds later.
Disadvantage: Your webserver has a connection held open to every client using the app. Depends how well your webserver is configured/programmed to handle this one. Connections are cheap, so it's not really an issue.
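The client side of lazy polling boils down to a loop like the one sketched here. `poll` is a caller-supplied function (an assumption of this sketch) that makes one held-open request and resolves with whatever messages the server returned, or an empty array on timeout. The other names and the retry delay are also made up.

```javascript
// Lazy polling loop: ask, wait for the server to answer or time out, ask again.
// `poll` performs one held-open request and resolves with an array of messages
// (empty if the server timed out with nothing to say).
async function lazyPoll(poll, onMessage, shouldStop, retryDelayMs = 1000) {
  while (!shouldStop()) {
    let messages;
    try {
      messages = await poll();
    } catch (err) {
      // Request failed: pause briefly rather than hammering the server, then retry.
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
      continue;
    }
    for (const message of messages) onMessage(message);
    // Fall through to the top of the loop and immediately ask again.
  }
}
```

In a browser, `poll` would wrap an XMLHttpRequest to the 'recv' URL and resolve once the response arrives.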

But how do you send data to the server? Well, it turns out that most common web browsers allow you 2 concurrent requests to any one domain at the same time. So that means we can have one connection doing lazy polling for receives, and when we need to send a message (or request an image or other), we can use a separate 'send' ajax request. Note that for further efficiency this request can also respond with messages from the server.
For example, if you have your lazy polling request going, then you send a chat message on your other 'send' request, and the server wants to reply with something, or perhaps coincidentally someone else has just said something at exactly the same time, it makes more sense for the server to send these messages down the 'send' request which is open, rather than send them down the 'recv' lazy polling request. If it did that, the client would have to start a new lazy poll request which would waste a bit of bandwidth.
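On the server, that preference could be modelled with a small per-client queue, sketched below. This is an illustration of the idea rather than any real server's code: `ClientSession` and its methods are hypothetical, the `respond` callbacks stand in for "write the HTTP response", and poll timeouts are left out.

```javascript
// Per-client state on the server. All names here are hypothetical.
class ClientSession {
  constructor() {
    this.outbox = [];      // messages waiting to go to the browser
    this.heldPoll = null;  // respond callback of the held-open 'recv' request, if any
    this.sending = false;  // true while a 'send' request is being handled
  }

  // A lazy-poll request arrived: answer at once if data is waiting, else hold it.
  onPoll(respond) {
    if (this.outbox.length > 0) respond(this.drain());
    else this.heldPoll = respond;
  }

  // The server has a message for this client.
  push(message) {
    this.outbox.push(message);
    // While a 'send' request is open, leave the message for its response.
    if (!this.sending && this.heldPoll) {
      const respond = this.heldPoll;
      this.heldPoll = null;
      respond(this.drain());
    }
  }

  // A 'send' request arrived: handle it, then piggyback anything queued on its
  // response so the held 'recv' request can stay open.
  onSend(data, handle, respond) {
    this.sending = true;
    try {
      handle(data);
    } finally {
      this.sending = false;
    }
    respond(this.drain());
  }

  // Remove and return everything queued so far.
  drain() {
    return this.outbox.splice(0);
  }
}
```

With this shape, a reply generated while handling a 'send' goes back on that same request, and the held 'recv' request is only used when nothing else is open.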

Keep-Alive
Any more optimisations? Well certainly configure to use keep-alive. Keep-alive allows the client to stay connected to the server and simply send more requests on the open connection.
In this configuration, the webapp maintains 2 connections to the server. 1 is the lazy polling 'recv' connection, used to enable the server to 'push out' messages to the client.
The other connection is used for sending data to the server, receiving messages if applicable, and also to fetch other media/images/etc if needed (Although it may be a good idea to put other media on a separate subdomain to allow it to run in its own connection).

HTTP headers?
I have a gripe with http headers... If I open a keep-alive connection to a server, and send 10 requests down that connection, it doesn't make sense to me to send all headers 10 times. The server already knows my userAgent from the first time I said it.
For the other direction we get a little more say, so we can minimise the headers we send out from the server to the client.
How much are we talking? Well, it's not good... for my IRC webapp, 84% of received traffic is HTTP headers. 42% of sent traffic is HTTP headers.
However, it's a small amount of data in the grand scale of things, and is the best that's possible.


So although the XMLHttpRequest is pretty lame compared to a raw socket, it can be used exactly like a raw socket if you configure it right.
Using this type of 'raw socket' emulation turns out to give pretty good transfer rates for data, as well as 'ping' times.

Welcome to Axod's Hack

So here it is, my new blog. Hopefully I can write something interesting :)