Going to FOSDEM, talking about Play
I will be at FOSDEM in Brussels on Saturday 6 February. I will give a talk about the Play framework in the Free Java session, at 17:15.
I have been so busy that I never blogged about my recent change of employer. I left Yoono 2 months ago to join a company called Zenexity (site in French). It’s really cool because after Flock and Yoono, which were very similar (consumer-oriented, social mashups, Mozilla technologies), I get to work on really different stuff: more server-side and more business-oriented. There is still a strong R&D component, though, and that is what really motivated me to get on board with Zenexity: they’re independent because they earn their own money (i.e. they don’t live on VC money) but still put a lot of effort into R&D projects. Their customer projects are also really state-of-the-art web work.
Specifically, they (I mean “we”) have an Open Source project called the Play! Framework. It’s an MVC framework similar to Django or Ruby on Rails, in Java. Within the Java world, I think it’s pretty disruptive. It contrasts with bloated stacks, and manages to bring simplicity and productivity to Java web development. Also, it speaks the language of the web by making it easy to create RESTful web apps, pretty URLs and web services.
Here is a screencast I did last month for the 1.0 release.
I recently installed the SDK for Android, Google’s OS for mobile phones. I’ve been following it since it was released; I’m pretty attracted by the platform, but when I bought my last phone Android was not really there yet.
One of my requirements is to be able to input Japanese. Currently a Google search about Japanese input will mostly give you links to third-party hacks that get you there, but with a clunky interface. Well, I was happy to see in the SDK’s emulator that Android has support for Japanese input built in! You can select it by going to Menu->Settings->Locale & Text.
Now, that doesn’t tell me whether I will be able to access it on an actual Android phone. Makers and phone operators are known to sometimes rip out parts of the OS or add crap for no good reason. If you have an Android phone, I’d love to hear what maker/model it is and whether you do have alternate input methods included.
Seeing the progress made by Chromium on Linux, I wanted to implement an equivalent of a Firefox extension that I like: automatically generating a QR code to send a link to my phone easily.

QRChrome Extension
So far, here is what I have been able to do:
It seems straightforward, but the browser doesn’t seem to behave exactly as documented, and I didn’t find any way to debug, at least on Linux. I assume it works better on Windows.
Here is what I want to do before the extension can be really useful:
PS: The feed icon is a sample extension from Google. I don’t understand why it’s not in the core.
I recently learned that Kiyoshiro Imawano died last May, of cancer, at age 58. Most people outside Japan don’t know him, but he was a very famous and very controversial pop singer. He was unique, talented, funny and sometimes politically engaged.
He is famous for his punk version of the Japanese national anthem, Kimi ga Yo, which was banned and can no longer be aired on Japanese media. That reminded me of Gainsbourg’s reggae version of La Marseillaise, which was very controversial too.
What I want to share today is a song about North Korea. To a US audience the part about world peace may sound a bit cheesy, but you have to understand how much Japanese people hate North Korea. For most Japanese, thinking about friendship with North Korea is just insanity, and this is truly a courageous song.
As my friend Nicolas suggested, I decided to give some guidelines and pointers about SEO (Search Engine Optimization). Now, full disclaimer: I’m not an SEO expert, so what I am going to write is just very basic rules.
Also, since Google is by far the most popular search engine, I will focus on it. I assume visibility on Google is the most important for most webmasters. Anyway, what is valid for Google is usually valid for other search engines as well.

The reason why Google is so successful is that they managed to be smarter than shady webmasters who tricked search engines. With old search engines you could put 50 times the word “notebook” in a comment to get a decent placement on the search for “notebook”, and you could also hide “Pamela Anderson” in a completely unrelated website to get some visits. Google, with their pagerank and smarter analysis of pages, created better search results for users.
That’s what they are good at, and that’s why they make so much money. So you can bet they’re spending a lot of money and energy keeping this edge over unethical webmasters. I won’t get into details but there are tricks such as cloaking, link farming, page hijacking… You don’t want to use these techniques.
In short, you should play by the rules if you don’t want to get punished. Read Google’s webmaster guidelines. When you are done, you can also use their webmaster tools to check your website and create a sitemap.
Corollary: don’t hire an SEO consultant who would use shady techniques.
It may sound obvious, but you can’t just generate some crappy pages automatically and expect search engines to index them well. Some people do that by shamelessly sucking content from blogs, but it doesn’t work.
Make sure your site is useful for real, human visitors. Also make sure that each page is useful on its own, so that a user can land in the middle of your site and still get what they want.
Search engine spiders don’t see pages like humans do: they only understand plain text. That means:
You can somewhat simulate the experience of a search engine by looking at your website using the Links web browser. It’s text-only, not fancy, but you will learn a lot about your site.
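If you prefer a script to a text browser, here is a minimal sketch of the same idea in Python, using only the standard library. It is a rough approximation and not how any real search engine works: the `TextOnly` class and `text_view` function are names made up for this example. It keeps visible text, image alt text and link targets, and drops scripts and styles.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text and link targets, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []      # pieces of text a text-only reader would see
        self.links = []       # href values of the links found on the page
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "img":
            # An image contributes only its alt text; without one it is invisible.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_view(html):
    """Return (plain text, list of link targets) for an HTML string."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.links
```

Feed it the HTML of one of your pages: if the text that comes out does not describe the page, a spider probably cannot make much of it either.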
The pagerank is a metric introduced by Google to measure the popularity of a page. It’s a number between 0 and 10. The more inbound links you have, the higher this number is. If pages with a high pagerank link to your page, you get more “pagerank juice” and your pagerank is even higher.
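To make the “juice” idea concrete, here is a small sketch of the textbook PageRank iteration in Python. It is a simplification for illustration only, not what Google actually runs: the function name, the damping factor of 0.85 and the toy link graph are assumptions, and the scores come out as fractions that sum to 1 rather than on the 0 to 10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score equally between its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site: everything links to "home".
ranks = pagerank({
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home"],
    "orphan": ["home"],
})
```

In this toy graph "home" ends up with the highest score because every other page links to it, while "orphan", which nobody links to, keeps only the baseline.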
Additionally, Google prefers inbound links from pages that are in the same category as yours. For example, a page about computer programming will benefit from links from other computer programming related pages, but not so much from pages about breeding dogs.
That means that you should develop your network – get in contact with people who have similar pages, make sure they know yours and that they link to you when it makes sense for their users.
The pagerank system also works inside your own website. A given page that gets popular may get a good pagerank, and your whole site must be structured in a way that redistributes that pagerank to other pages. That means not having more pages than necessary; if you do have too many pages (for example content broken into 5 pages when it would make sense as one, or dynamic pages that don’t add much information), your pagerank juice will be divided between those pages instead of channeled to a single page. So be careful when you think about what should be on its own page and what should be integrated into another one.
This section could go on and on, so I’ll just limit myself to the basics.
This is clearly just the basics, but if you manage to get enough inbound links you should be indexed decently. The next thing to do is to find out what keywords you want to optimize for, and monitor them on your favorite search engine. There are also a lot of books about SEO that can help you go deeper into search engine optimization.
A few days ago I was invited to a launch party for a web product in Paris. While the product was nice and polished, it seemed like the developers didn’t understand anything about scalability. They didn’t even understand my question when I asked them if the product could scale.
It’s probably not a big deal for them: they were presenting a CMS, so most of the time it will be installed for a limited user base. I guess most people will be happy to use it on a single server, so it’s probably OK for them not to be able to scale. However I noticed that while scalability is now a fairly solved problem, there are not that many articles explaining how to prepare for scalability on the web. So here I go. I will not try to replace a good book, but just give the very basics.
It’s important to get that out of the way. Scalability is not performance: it’s not about making good use of CPU and bandwidth, and it’s not about having the page being loaded quickly in the user’s browser. It’s about being able to balance the load between several servers. So when the load increases (more users creating accounts, more visitors, more page views) you can add additional servers to balance the load. You don’t just throw in a server, you need to design your software to work on a cluster of servers.
Another point is that you will rarely create a cluster of machines from scratch: when you launch a new website you will have few users and so few machines (one or two), and as your load increases you will increase the number of servers. You will have to scale different parts of your system one after the other.
Most of the time you start with a front-end (PHP, Python, Ruby, Java…) and a data layer (MySQL, PostgreSQL, CouchDB…). As your load increases, the front-end will be the first to break. Of course server-side caching will help, but at some point you will need several front-end servers.
The key for that is to ensure you don’t store any data on the front-end. The problem sometimes arises with sessions: a lot of PHP libraries store session information locally on the server, and that prevents you from balancing the load. The idea is that in a session a user may hit one server for a given page, then another for the next page. If the session is only accessible to the first server, you’re screwed. You want it to be somewhere else. That can be in the data layer or in a special sessions server. If you write a Facebook app you don’t need to care, because Facebook takes care of the session.
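Here is a minimal sketch of that idea in Python. Everything in it is made up for illustration: `SharedSessionStore` uses a plain dict as a stand-in for the data layer or a dedicated sessions server, and `FrontEnd` is a toy front-end that keeps no session state of its own.

```python
import uuid


class SharedSessionStore:
    """Stand-in for a session backend that every front-end can reach.

    A dict keeps the sketch runnable; in a real deployment this would be
    a database table or a dedicated sessions server.
    """

    def __init__(self):
        self._data = {}

    def create(self):
        session_id = uuid.uuid4().hex
        self._data[session_id] = {}
        return session_id

    def load(self, session_id):
        return self._data.get(session_id, {})

    def save(self, session_id, values):
        self._data[session_id] = dict(values)


class FrontEnd:
    """A front-end server that stores nothing about sessions locally."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions

    def handle(self, session_id, page):
        # Load the session from the shared store, update it, write it back.
        session = dict(self.sessions.load(session_id))
        session["visited"] = session.get("visited", []) + [page]
        session["last_server"] = self.name
        self.sessions.save(session_id, session)
        return session


store = SharedSessionStore()
servers = [FrontEnd("front-1", store), FrontEnd("front-2", store)]
sid = store.create()
servers[0].handle(sid, "/login")             # first page served by front-1
result = servers[1].handle(sid, "/profile")  # next page served by front-2
```

The second front-end sees the page visited through the first one, because neither of them owns the session.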
Now we can have as many front-ends as we want, but we still have a single database server.
Most applications will have many more reads than writes. For example in blogging software, each visitor will trigger a read on the database (OK, not each visitor if there is a good cache), but writes only occur when the author writes a new post or someone leaves a comment.
That’s good, because it’s much easier to scale reads than writes. Just make sure that in your code you have different settings for reads and writes. They can point to the same database at launch time, but when the time comes you can separate them. Writes will go to your “main” database, and reads will go to a copy. There are other approaches, but for example MySQL offers replication features. Once set up, the slaves will stay in sync with the master. You can have as many slaves as you need.
OK: several front-ends, several read-only databases, but still one master database for writes. If your application has few writes, a beefy database server may be enough (and some major websites do have just one master database), but if you have a lot of writes (highly social applications like Facebook or Twitter) you may want to continue the scaling process.
Now we want to have several databases that we can write to. Obviously, we have to be careful not to introduce inconsistencies in the process. Having an old version of a blog post on one server and the new version on another one is not great; what if some users see an old version of your post and others see the most recent one?
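A minimal sketch of what “different settings for reads and writes” can look like, in Python. The `Database` class and its method names are invented for this example, and an in-memory SQLite database stands in for the real server so the code runs as-is; with MySQL replication, the read connection would point at a slave instead.

```python
import sqlite3


class Database:
    """Keeps separate connections for writes and for reads."""

    def __init__(self, write_conn, read_conn=None):
        self.write_conn = write_conn
        # At launch there is only one database, so reads reuse the write connection.
        self.read_conn = read_conn if read_conn is not None else write_conn

    def execute_write(self, sql, params=()):
        with self.write_conn:  # commits on success, rolls back on error
            self.write_conn.execute(sql, params)

    def execute_read(self, sql, params=()):
        return self.read_conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
db = Database(write_conn=conn)  # launch time: both settings point to the same place
db.execute_write("CREATE TABLE posts (title TEXT)")
db.execute_write("INSERT INTO posts (title) VALUES (?)", ("Hello",))
titles = db.execute_read("SELECT title FROM posts")
```

As long as the application code always goes through `execute_read` or `execute_write`, moving reads to a copy later only means passing a different `read_conn`.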
There are various strategies to divide data in a safe and consistent way, including:
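As an illustration, one widely used approach is to partition the data by key, so that each record has exactly one home database and there is never a second copy to get out of date. The sketch below is only an assumption of how that could look in Python: `shard_for` and `ShardedStore` are made-up names, and plain dicts stand in for the database servers.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number in a stable, repeatable way."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Writes and reads for a given key always go to the same shard."""

    def __init__(self, shard_count):
        # Each dict stands in for one writable database server.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
store.put("post:42", "first draft")
store.put("post:42", "final version")
```

One caveat with this naive modulo scheme: changing the number of shards moves most keys to a different shard, so the shard count has to be planned ahead or the data migrated.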
Here you go, the basics for building a scalable website. That’s not all you have to do: if your website keeps growing you will face more problems, such as having to scale your network. I’m not talking about outgoing bandwidth but about communication between your servers (front-end and data layers). But if your code is efficient, these simple recommendations will get you to a setup that can handle a fairly big load. I really recommend Building Scalable Websites, from O’Reilly, if you want to know more.
Q: Language X doesn’t scale, but language Y does!
A: Bullshit. It’s not the language that scales, it’s your code. Some languages may not perform as well as others, so you will have to add boxes more often, but the way you scale is still the same.
Q: What about cloud computing? Virtualization? All these fancy buzzwords?
Virtualization means you run on virtual machines rather than on physical ones. The benefit is that you can easily add or remove machines. For example, using Amazon EC2 you can add as many machines as you want in a few minutes, and then remove them in no more time. With a classical hosting company, you need to make a phone call, ask for the machines and you get them in maybe one week. They’ll charge you for the set-up too, and if you no longer want it you still have to pay for a full term. So cloud computing offers are generally more flexible.
Q: Does Google App Engine make it easier to scale?
In short, yes. By not letting you access the machines, Google App Engine constrains you into writing scalable code. You also don’t have to request new machines when you need them or release them when you no longer need them; you just pay for what you use depending on the load of your application.
I am a big fan of Google App Engine, but be careful: since it’s programmed in a particular way, it’s not easy to move your project out of it. You may feel locked in once your project has started.
It’s always a pleasure when a company offers a public bug tracker for users and third party developers. So it’s great to see that Facebook has a public bugzilla.
The problem with having a public bug tracker is that you’re supposed to be responsive and fix bugs; there is nothing worse than real bugs rotting in the tracker because the group behind the product doesn’t really care about those specific bugs.
So far I’ve filed a few bugs for the Facebook API:
I have also filed a few feature requests, but that doesn’t really matter: they obviously have no obligation to implement them. On the other hand, it would be nice if they fixed the bugs even when they’re not security issues.
One interesting thing I’ve noticed is that not only do they have a public bugzilla, but the source code for their API is Open Source. Let’s see: “Facebook Open Platform is a snapshot of the infrastructure that runs Facebook Platform. It includes the API infrastructure, the FBML parser, the FQL parser, and FBJS, as well as implementations of many common methods and tags.”
Cool. That means that if I’m motivated enough, I could fix the bugs and submit a patch! Let’s look at the buggy Stream.get method… Wait a minute – that source code is just a tiny subset of the Facebook API! Oh crap. Looks like I can’t fix it.
I’ve been playing with Google App Engine recently. It’s actually pretty cool, to the point that I’m almost ashamed to have ignored it when it was released. I kind of felt like it would be too restrictive with just Python, just their own database and so on.
But so far, I like it:
And you get all the App Engine specific goodness: easy authentication with Google Accounts, free hosting with huge quotas, and most importantly easy scalability on Google’s infrastructure… Having to call your hosting company to add new servers is a pain in the ass (and in the wallet), having to create and delete instances on Amazon EC2 is much better, but not having to think about it at all is just pure joy.
I just compiled a recent build of Chromium (unbranded Google Chrome) from here. I would love to start playing with extensions, so I tried stuff from there.
Alas, it doesn’t work. Since a bunch of stuff is still missing from the Linux builds, I guess extensions don’t work either.
Did anyone manage to get extensions working on Chromium Linux? Is there a flag or something to add at compilation time?