Going to FOSDEM, talking about Play

January 23rd, 2010 No comments

I will be at FOSDEM in Brussels on Saturday, February 6. I will give a talk about the Play framework in the Free Java session, at 17:15.

I

Categories: Misc

What I’ve been up to: Zenexity, Play

November 30th, 2009 2 comments

Busy as I was, I realized I hadn’t blogged about my recent job change. I left Yoono two months ago to join a company called Zenexity (site in French). It’s really cool because after Flock and Yoono, which were very similar (consumer-oriented, social mashups, Mozilla technologies), I get to work on really different stuff: more server-side and more business-oriented. There is still a strong R&D component, and that is what really motivated me to get on board with Zenexity: they’re independent because they earn their own money (i.e. they don’t live on VC money), yet they still put a lot of effort into R&D projects. Their customer projects are also really state of the art on the web.

Specifically, they (I mean “we”) have an Open Source project called the Play! Framework. It’s an MVC framework for Java, similar to Django or Ruby on Rails. Within the Java world, I think it’s pretty disruptive: it contrasts with bloated stacks and manages to bring simplicity and productivity to Java web development. It also speaks the language of the web by making it easy to create RESTful web apps, pretty URLs and web services.

Here is a screencast I did last month for the 1.0 release.

A web app in 10 minutes using Play! from zenexity on Vimeo.

Categories: hacking, life, tech

Japanese Input now built-in in Android

September 23rd, 2009 2 comments

I recently installed the SDK for Android, Google’s OS for mobile phones. I’ve been following it since its release; I’m pretty attracted to the platform, but when I bought my last phone, Android was not really there yet.

One of my requirements is being able to input Japanese. Currently, a Google search about Japanese input will mostly give you links to third-party hacks that get you there, but with a clunky interface. So I was happy to see in the SDK’s emulator that Android has built-in support for Japanese input! You can select it under Menu->Settings->Locale & Text.

Japanese Input in Android 1.5

Now, that doesn’t tell me whether I will be able to access it on an Android phone. Manufacturers and carriers are known to sometimes strip out parts of the OS or add crap for no good reason. If you have an Android phone, I’d love to hear what make and model it is and whether it includes alternate input methods.

Categories: mobile

Building Chrome/Chromium Extensions

September 16th, 2009 No comments

Seeing the progress made by Chromium on Linux, I wanted to implement an equivalent of a Firefox extension I like: it automatically generates a QR code so I can easily send a link to my phone.

QRChrome Extension

So far, here is what I have been able to do:

  • Add a page action icon in the URL bar
  • Open a new tab with the QR Code for the URL you want

It seems straightforward, but the browser doesn’t seem to behave exactly as documented, and I didn’t find any way to debug, at least on Linux. I assume it works better on Windows.
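The heart of the extension, turning a page URL into a QR code image, is easy to sketch outside the browser. The Python below is only an illustration: it assumes the Google Chart API's QR endpoint (parameters `cht=qr`, `chs` and `chl`), which Google has since deprecated, and the helper name `qr_code_url` is hypothetical. The post doesn't say which QR service the extension actually calls.

```python
from urllib.parse import urlencode

# Assumed service: the (since deprecated) Google Chart API QR code generator.
CHART_API = "https://chart.googleapis.com/chart"


def qr_code_url(page_url: str, size: int = 300) -> str:
    """Build an image URL that renders page_url as a QR code (hypothetical helper)."""
    params = {
        "cht": "qr",              # chart type: QR code
        "chs": f"{size}x{size}",  # image size in pixels
        "chl": page_url,          # data to encode; urlencode escapes it
    }
    return f"{CHART_API}?{urlencode(params)}"


if __name__ == "__main__":
    # The extension would open this URL in a new tab for the current page.
    print(qr_code_url("http://example.com/download?id=42"))
```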

Here is what I want to do before the extension can be really useful:

  • Just like the Mobile Barcoder, show a popup on hover rather than opening a new tab.
  • Add an item to the link context menu, so the user can generate a code from a link, not just from the URL bar. Useful for download links.
  • Package the extension! That may sound stupid, but extensions are packaged from within Chrome itself, and that doesn’t work on Linux. It’s not something you can do from the command line… Well, actually you can package from the command line, but only through the Chrome binary, and that doesn’t work either.

PS: The feed icon is a sample extension from Google. I don’t understand why it’s not in the core.

Kiyoshiro Imawano

August 3rd, 2009 No comments

I recently learned that Kiyoshiro Imawano died last May of cancer, at age 58. Most people outside Japan don’t know him, but he was a very famous and very controversial pop singer. He was unique, talented, funny and sometimes politically engaged.

He is famous for his punk version of the Japanese national anthem, Kimi ga Yo, which was banned and can no longer be aired on Japanese media. That reminded me of Gainsbourg’s reggae version of La Marseillaise, which was also very controversial.

What I want to share today is a song about North Korea. To a US audience the part about world peace may sound a bit cheesy, but you have to understand how much Japanese people hate North Korea. For most Japanese, the thought of friendship with North Korea is sheer insanity, which makes this a truly courageous song.

Categories: Misc

Search Engine Optimization Basics

July 28th, 2009 4 comments

As my friend Nicolas suggested, I decided to give some guidelines and pointers about SEO (Search Engine Optimization). Now, full disclaimer: I’m not an SEO expert, so what follows is just a few very basic rules.

Also, since Google is by far the most popular search engine, I will focus on it. I assume visibility on Google is the most important for most webmasters. Anyway, what is valid for Google is usually valid for other search engines as well.

Rule #1: Don’t Try to Trick Googlespider

The reason Google is so successful is that they managed to outsmart the shady webmasters who tricked search engines. With old search engines you could put the word “notebook” 50 times in a comment to get a decent placement in searches for “notebook”, and you could also hide “Pamela Anderson” in a completely unrelated website to get some visits. Google, with pagerank and smarter analysis of pages, created better search results for users.

That’s what they are good at, and that’s why they make so much money. So you can bet they spend a lot of money and energy keeping this edge over unethical webmasters. I won’t get into details, but there are tricks such as cloaking, link farming and page hijacking… You don’t want to use these techniques.

In short, you should play by the rules if you don’t want to get punished. Read Google’s webmaster guidelines. When you are done, you can also use their webmaster tools to check your website and create a sitemap.

Corollary: don’t hire an SEO consultant who would use shady techniques.
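A sitemap is just an XML file listing your URLs in the format defined at sitemaps.org. As a minimal sketch (the `build_sitemap` helper and the example URLs are hypothetical), it can be generated from a list of pages with Python's standard library:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is a 'YYYY-MM-DD' string, or None to leave the tag out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < for us
        if last_modified:
            ET.SubElement(url, "lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap([
    ("http://www.mysite.com/", "2009-07-28"),
    ("http://www.mysite.com/search?q=seo&page=2", None),
])
print(sitemap)
```

Save the output as a file on your site and submit it through Google's webmaster tools.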

Rule #2: Have Good Content

It may sound obvious, but you can’t just generate some crappy pages automatically and expect search engines to index them well. Some people do that by shamelessly scraping content from blogs, but it doesn’t work.

Make sure your site is useful for real, human visitors. Also make sure that each page is useful on its own, so that a user can land in the middle of your site and get what they want.

Rule #3: Be Readable

Search engine spiders don’t see pages like humans do: they only understand plain text. That means:

  • Avoid Flash, in particular for content. Search engines won’t be able to read it so they can’t index it.
  • Avoid text as images. If you have to do it for titles because you want to use your fancy font, make sure to put the text in the alt attribute.
  • Avoid text generated by JavaScript. Search engine spiders won’t run the JavaScript.

You can roughly simulate what a search engine sees by browsing your website with the Links web browser. It’s text-only and not fancy, but you will learn a lot about your site.
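To get a feel for the rules above without installing Links, here is a rough Python approximation of a text-only view of a page. It is a sketch under simplifying assumptions, not how any real crawler works: it keeps visible text and image `alt` text and throws away anything inside `<script>` or `<style>`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect the text a text-only reader would see (rough approximation)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")  # text in an image only counts via alt
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def spider_view(html: str) -> str:
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<h1><img src="title.png" alt="Welcome to My Site"></h1>'
        '<p>Hello <b>world</b></p>'
        '<script>document.write("Hidden menu")</script>')
print(spider_view(page))  # Welcome to My Site Hello world
```

The menu written by JavaScript never shows up, and the fancy-font title survives only because of its `alt` text.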

Rule #4: Understand the Pagerank

The pagerank is a metric introduced by Google to measure the popularity of a page. It’s a number between 0 and 10. The more inbound links you have, the higher this number is. If pages with a high pagerank link to your page, you get more “pagerank juice” and your pagerank is even higher.
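The idea can be sketched with the classic published PageRank formula, computed by power iteration: each page splits its score evenly among the pages it links to, and a damping factor (0.85 here, the commonly quoted value) models a visitor who occasionally jumps to a random page. This is only the textbook algorithm on a hypothetical mini-web; Google's real ranking uses many more signals, and the public 0 to 10 number is only a coarse display of it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to. Each round computes
    PR(p) = (1 - d) / N + d * sum(PR(q) / outlinks(q) for pages q linking to p).
    Scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)  # juice split over outlinks
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages link to "popular", which links only to "you".
web = {"fan1": ["popular"], "fan2": ["popular"], "you": ["popular"], "popular": ["you"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

"you" has a single inbound link, but because it comes from a high-scoring page it ends up far above the fan pages, which have none.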

Additionally, Google prefers inbound links from pages that are in the same category as yours. For example, a page about computer programming will benefit from links from other computer programming related pages, but not so much from pages about breeding dogs.

That means that you should develop your network – get in contact with people who have similar pages, make sure they know yours and that they link to you when it makes sense for their users.

The pagerank system also works within your own website. A given page that gets popular may get a good pagerank, and your whole site must be designed to redistribute that pagerank to other pages. It also means not having more pages than necessary: if you have too many pages (for example content broken into 5 pages when it would make sense as one, or dynamic pages that don’t add much information), your pagerank juice will be divided among those pages instead of channeled to the one page. So think carefully about what should be on its own page and what should be integrated into another one.

Rule #5: Optimize your Content

This section could go on and on, so I’ll just limit myself to the basics.

  • Design good titles. Not only are they what your users see on the results page, but the keywords in them weigh a lot for search engines. Also put the relevant information first: in this blog, for example, I put the article title before the blog title.
  • Avoid duplicate content. That usually happens when you have several URLs that show the same content. For example, if you have http://www.mysite.com and http://mysite.com, those are two different pages for a search engine. Having both URLs working is a good thing, but make sure one redirects to the other with a permanent (301) redirect.
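In practice the redirect is usually configured in the web server, but the logic fits in a few lines. Here is a minimal sketch as Python WSGI middleware, assuming you picked `www.mysite.com` as the canonical host (an example value, as are the function names): any request arriving on another host name gets a 301 redirect to the same path on the canonical host.

```python
CANONICAL_HOST = "www.mysite.com"  # example value: the host you chose as canonical


def canonical_host(app):
    """WSGI middleware: 301-redirect requests for any other host to CANONICAL_HOST."""

    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0]  # drop any port number
        if host and host != CANONICAL_HOST:
            scheme = environ.get("wsgi.url_scheme", "http")
            # Assumes the app is mounted at the site root; the path is not re-escaped.
            location = f"{scheme}://{CANONICAL_HOST}{environ.get('PATH_INFO') or '/'}"
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            # 301 (moved permanently) tells search engines to keep only the target URL.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return wrapper


def site(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from the canonical host"]


application = canonical_host(site)
```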

Going Further

This is clearly just the basics, but if you manage to get enough inbound links you should be indexed decently. The next thing to do is to find out which keywords you want to optimize for, and monitor your ranking for them on your favorite search engine. There are also a lot of books about SEO that can help you go deeper into search engine optimization.

Categories: tech

Simple Tips to Build Scalable Websites

July 1st, 2009 3 comments

A few days ago I was invited to a launch party for a web product in Paris. While the product was nice and polished, the developers didn’t seem to understand anything about scalability. They didn’t even understand my question when I asked whether the product could scale.

It’s probably not a big deal for them: they were presenting a CMS, so most of the time it will be installed for a limited user base. I guess most people will be happy to run it on a single server, so it’s probably OK for it not to scale. However, I noticed that while scalability is now a fairly solved problem, there are not that many articles explaining how to prepare for it on the web. So here I go. I won’t try to replace a good book, just give the very basics.

What is scalability?

It’s important to get that out of the way. Scalability is not performance: it’s not about making good use of CPU and bandwidth, and it’s not about having the page load quickly in the user’s browser. It’s about being able to balance the load between several servers, so that when the load increases (more users creating accounts, more visitors, more page views) you can add servers to absorb it. You can’t just throw in a server: you need to design your software to work on a cluster of servers.

Another point is that you will rarely create a cluster of machines from scratch: when you launch a new website you will have few users and therefore few machines (one or two), and as your load increases you will add servers. You will have to scale the different parts of your system one after the other.

#1: the web front-end

Most of the time you start with a front-end (PHP, Python, Ruby, Java…) and a data layer (MySQL, PostgreSQL, CouchDB…). As your load increases, the front-end will be the first to break. Server-side caching will help, of course, but at some point you will need several front-end servers.

The key is to make sure you don’t store any data on the front-end. The problem sometimes arises with sessions: a lot of PHP libraries store session information locally on the server, which prevents load balancing. Within a single session, a user may hit one server for a given page, then another one for the next page. If the session is only accessible to the first server, you’re screwed. You want it to be somewhere else: in the data layer or in a dedicated sessions server. If you write a Facebook app you don’t need to care, because Facebook takes care of the session.
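The principle can be shown with a toy model: two stateless front-ends that keep nothing locally and load the session from a shared store on every request. The in-memory `SharedSessionStore` is a stand-in (hypothetical, like the other names here) for the data layer or sessions server; in a real deployment it has to live outside the front-end processes, otherwise you are back to the original problem.

```python
import uuid


class SharedSessionStore:
    """Stand-in for a session store every front-end can reach.

    In production this would be the data layer or a dedicated sessions server;
    a plain dict is used here only so the example runs on its own.
    """

    def __init__(self):
        self._sessions = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex  # normally sent to the browser in a cookie
        self._sessions[session_id] = {}
        return session_id

    def load(self, session_id) -> dict:
        return dict(self._sessions.get(session_id, {}))

    def save(self, session_id, session: dict) -> None:
        self._sessions[session_id] = dict(session)


class FrontEnd:
    """A stateless front-end: nothing about the user is kept between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, action, value=None) -> str:
        session = self.store.load(session_id)  # fetched from the shared store each time
        if action == "login":
            session["user"] = value
            self.store.save(session_id, session)
        return f"{self.name}: user={session.get('user')}"


store = SharedSessionStore()
web1, web2 = FrontEnd("web1", store), FrontEnd("web2", store)
sid = store.create()
web1.handle(sid, "login", "alice")  # first request lands on web1
print(web2.handle(sid, "view"))     # next request lands on web2: "web2: user=alice"
```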

Now we can have as many front-ends as we want, but we still have a single database server.

#2: the read operations on the database

Most applications have many more reads than writes. For example, in blogging software each visitor triggers a read on the database (OK, not each visitor if there is a good cache), but writes only occur when the author publishes a new post or someone leaves a comment.

That’s good, because it’s much easier to scale reads than writes. Just make sure your code has separate settings for reads and writes. They can point to the same database at launch time, but when the time comes you can separate them: writes go to your “main” database, and reads go to a copy. There are other approaches, but MySQL, for example, offers replication features. Once set up, the slaves stay in sync with the master, and you can have as many slaves as you need.
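Keeping separate settings for reads and writes can be as simple as a small router that your data-access code asks for a connection string. In this sketch the class and the `mysql://...` connection strings are hypothetical; at launch both settings point to the master, and later reads are spread round-robin over replicas.

```python
import itertools


class DatabaseRouter:
    """Hand out connection strings: writes go to the master, reads rotate over replicas."""

    def __init__(self, write_dsn, read_dsns=()):
        self.write_dsn = write_dsn
        # At launch there may be no replica yet, so reads fall back to the master.
        self._read_cycle = itertools.cycle(list(read_dsns) or [write_dsn])

    def for_write(self) -> str:
        return self.write_dsn

    def for_read(self) -> str:
        return next(self._read_cycle)  # simple round-robin over the read copies


# Day one: a single server handles everything (hypothetical connection strings).
launch = DatabaseRouter("mysql://db-master/blog")

# Later: same code, new settings, reads spread over two replication slaves.
scaled = DatabaseRouter(
    "mysql://db-master/blog",
    ["mysql://db-replica1/blog", "mysql://db-replica2/blog"],
)
```

One design caveat: classic MySQL replication is asynchronous, so a slave can lag slightly behind the master. Reads that must see a write the user just made (say, their own new comment) are safer sent to the master.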

OK: several front-ends, several read-only databases, but still one master database for writes. If your application has few writes it may be fine with a beefy database server (and some major websites have just one master database), but if you have a lot of writes (highly social applications like Facebook or Twitter) you may want to continue the scaling process.

#3: the write database

Now we want several databases we can write to. Obviously, we have to be careful not to introduce inconsistencies in the process. Having an old version of a blog post on one server and the new version on another is not great: what if some users see the old version of your post and others see the most recent one?

There are various strategies to divide data in a safe and consistent way, including:

  • Depending on the user id (or blog id, or whatever makes sense in your application), put the data on one server or another. For example, all users with an even id go to server1 and all users with an odd id go to server2. Hint: make sure your algorithm lets you add more servers later, which is not the case with my example, where you will be stuck at 2 servers :)
  • Put some tables on one server and others on another. It doesn’t help when a single table grows too big, but it can be combined with the previous point.
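One common way to follow that hint is consistent hashing: each server is placed at many pseudo-random points on a ring, and each user goes to the first server point after the hash of their id. Adding a server then only moves the users that land on its new points, whereas the even/odd (modulo) scheme moves most users when you go from 2 to 3 servers. This is a sketch with hypothetical server names, not the only option (a lookup table mapping users to servers is another), and the data of users who move still has to be migrated.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # md5 is used only to spread keys evenly over the ring, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Map user ids to servers so that adding a server moves as few users as possible."""

    def __init__(self, servers, points_per_server=100):
        self.points_per_server = points_per_server  # more points = more even spread
        self._ring = []  # sorted list of (hash, server) tuples
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, user_id) -> str:
        h = _hash(str(user_id))
        # First point clockwise from the user's hash; wrap to the start of the ring.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]


def modulo_server(user_id, n_servers) -> str:
    # The even/odd scheme from the text, generalized: server1..serverN.
    return f"server{user_id % n_servers + 1}"


users = range(10_000)
ring = HashRing(["server1", "server2"])
before = {u: ring.server_for(u) for u in users}
ring.add("server3")
moved_ring = sum(before[u] != ring.server_for(u) for u in users)
# With modulo, going from 2 to 3 servers moves about two thirds of the users.
moved_modulo = sum(modulo_server(u, 2) != modulo_server(u, 3) for u in users)
print(f"consistent hashing moved {moved_ring} users, modulo moved {moved_modulo}")
```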

Conclusion

There you go: the basics for building a scalable website. That’s not all you have to do; if your website keeps growing you will face more problems, such as having to scale your network. I’m not talking about outgoing bandwidth but about communication between your servers (front-end and data layers). But if your code is efficient, these simple recommendations will get you to a setup that can handle a fairly big load. I really recommend Building Scalable Web Sites, from O’Reilly, if you want to know more.

FAQ

Q: Language X doesn’t scale, but language Y does!

A: Bullshit. It’s not the language that scales, it’s your code. Some languages may not perform as well as others, so you will have to add boxes more often, but the way you scale is still the same.

Q: What about cloud computing? Virtualization? All these fancy buzzwords?

A: Virtualization means you run on virtual machines rather than physical ones. The benefit is that you can easily add or remove machines. For example, with Amazon EC2 you can add as many machines as you want in a few minutes, and remove them just as quickly. With a classic hosting company, you need to make a phone call and ask for the machines, and you get them in maybe a week. They’ll charge you for the setup too, and if you no longer want them you still have to pay for a full term. So cloud computing offers are generally more flexible.

Q: Does Google App Engine make it easier to scale?

A: In short, yes. By not letting you access the machines, Google App Engine constrains you to write scalable code. You also don’t have to request new machines when you need them or release them when you no longer do; you just pay for what you use, depending on the load of your application.

I am a big fan of Google App Engine, but be careful: since you have to program for it in a particular way, it’s not easy to move your project off it. You may feel locked in once your project has started.

Facebook’s Bugzilla and Open Source Code

June 23rd, 2009 No comments

It’s always a pleasure when a company offers a public bug tracker for users and third-party developers. So it’s great to see that Facebook has a public Bugzilla.

The problem with having a public bug tracker is that you’re supposed to be responsive and fix bugs; nothing is worse than real bugs rotting in the tracker because the team behind the product doesn’t really care about those specific bugs.

So far I’ve filed a few bugs for the Facebook API:

  • A security bug, letting anyone deactivate feed templates for any app (fixed)
  • 5159: Incorrect error code returned for Stream.get – I can’t believe they won’t acknowledge it’s a real bug!
  • 5624: Stream.get is not really using the updated time. In short, we miss data because the query doesn’t do what the doc says it does. They do recognize it should be fixed, but didn’t give it a very high priority.

I have also filed a few feature requests, but that doesn’t really matter: they obviously have no obligation to implement them. On the other hand, it would be nice if they fixed bugs even when they’re not security issues.

One interesting thing I’ve noticed is that not only do they have a public Bugzilla, but the source code for their API is Open Source. Let’s see: “Facebook Open Platform is a snapshot of the infrastructure that runs Facebook Platform. It includes the API infrastructure, the FBML parser, the FQL parser, and FBJS, as well as implementations of many common methods and tags.”

Cool. That means that if I’m motivated enough, I could fix the bugs and submit a patch! Let’s look at the buggy Stream.get method… Wait a minute – that source code is just a tiny subset of the Facebook API! Oh crap. Looks like I can’t fix it.

Categories: tech

Thoughts on Google App Engine

May 26th, 2009 No comments

I’ve been playing with Google App Engine recently. It’s actually pretty cool, to the point that I’m almost ashamed to have ignored it when it was released. I kind of felt like it would be too restrictive with just Python, just their own database and so on.

But so far, I like it:

  • It’s only Python (or Java), but you can do pretty much anything you would do in a non-App Engine Python project. You can load pure-Python third-party libraries just by including them in your project.
  • The free quotas are really big. It’s enough for a hobby project, and if it becomes successful enough to hit the ceiling you should be able to figure out a way to monetize it to pay your Google bill.
  • You can use your own domain name even with a free account
  • There is no SQL, but Google’s BigTable seems to be good enough. Heck, that’s what they use for most of their products!

And you get all the App Engine-specific goodness: easy authentication with Google Accounts, free hosting with huge quotas, and most importantly easy scalability on Google’s infrastructure… Having to call your hosting company to add new servers is a pain in the ass (and in the wallet), and creating and deleting instances on Amazon EC2 is much better, but not having to think about it at all is just pure joy.

Categories: hacking, tech

Chromium extensions on Linux?

May 14th, 2009 6 comments

I just compiled a recent build of Chromium (unbranded Google Chrome) from here. I would love to start playing with extensions, so I tried stuff from there.

Alas, it doesn’t work. Since a bunch of stuff is still missing from the Linux builds, I guess extensions don’t work either.

Did anyone manage to get extensions working on Chromium Linux? Is there a flag or something to add at compilation time?