Jeremy Wright: “Listen up. If your company relies on the web to stay alive, you’d damn well better be using at least some of the following “ladder to high availability”:
Backups, Redundant, Failover, Cluster, Distributed, Grid and finally Mesh.”
I actually tend to agree with most of the comments on Jeremy’s post: it’s not that important for Web 2.0 startups to scale well right from the beginning.
However, I got bitten today by a particularly nasty failure on the part of Trenitalia, the Italian national railway company (sorry, no link here; I don’t want their scalability problems to become even worse). I have been trying all afternoon to reserve a seat on a train to Rome, but their online reservation system is completely down. I phoned their call center and they said their own terminals are stuck too. They even told me it’s been like this since yesterday.
I tried going to the station, but the kiosks there display a large red “Out of order” message. There were a lot of people queuing at the manned counters. Apparently the terminals there are still functioning, or maybe they are handing out hand-written tickets, like in the days of yore.
I was planning to go to Rome next Monday, but there’s a strike on that day. The train I was planning to take leaves one hour before the strike begins, and the call center operator told me that he would have been able to reserve me a seat, strike notwithstanding, if only his terminal had worked.
Given that most trains will be cancelled on Monday, the planes are of course all sold out. In the end, I thought it safer to leave on Tuesday, so I cancelled one night at the hotel, but I still haven’t been able to reserve a train seat for Tuesday.
Now, I don’t know whether this is a scalability problem or some kind of catastrophic failure, but given how much we’re starting to rely on being able to conduct most of our business online, this is scary.