UPDATE #2 (10-Oct-2010): Recently there’s been a lot of talk about session hijacking, thanks to Firesheep and GitHub. Dang. I liked the term fb-yelp-gibbed. The considerations below still apply.

UPDATE: After conversations with a friend, I made a few changes. Specifically, the fbuid is usable on your site; just don’t use it together with the JS library, and don’t trust the browser.

User privacy is non-negotiable and developers should be as responsible as Facebook.

How to secure your FB Connect Implementation (so your users don’t get fb-yelp-gibbed):

OLD REST API

  1. DON’T use the JS library (violating this amplifies your users’ exposure; see EXCEPTION below)
  2. Push all FB connect requests through your backend
  3. DON’T STORE a userid or fbid in a cookie (only use fbuid client-side for externals; server should never trust browser-supplied fbuid)
  4. DON’T STORE your app’s FB API “secret” client-side (in javascript, in device app, etc.; NO EXCEPTIONS)
  5. DO store your user’s fbid and/or userid, only, on your server (see the sketch after this list)
  6. Never give client-side (JS, scripts, etc.) access to userid or fbid
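
To make points 3, 5, and 6 concrete, here’s a minimal sketch (Rails-flavored; the fbid column and helper names are mine, not from any library): the fbid lives in your database, keyed off your own server-side session, and is never read from or written to the browser.

# Sketch only: the fbid is stored and looked up server-side, never trusted from
# a cookie or from JS. Assumes a users.fbid column and restful-auth's current_user.
def link_facebook_account(fbid)
  current_user.update_attribute(:fbid, fbid)   # stored server-side only
end

def current_facebook_id
  current_user && current_user.fbid            # looked up from your DB, never from the client
end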

When appropriate, verify the FB user is who they say they are by using the auth.* methods, linked below; if you’re not sure what these do or what they’re for, give yourself 2-4 weeks to understand the ins and outs. Or see the OAuth comments below (and transition to OAuth).

http://developers.facebook.com/docs/reference/rest/auth.getSession

For iPhone/Android, learn how to proxy FB connect requests so you NEVER store your API “secret” on the phone.

The only communication between your user’s browser or device and your fb-app should be whether or not the user has been authenticated. Even then, you should also use the rest/auth.* (server-side) methods to confirm the user actually authenticated.
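
To make that concrete, here’s a rough sketch (not a drop-in) of the kind of server-side check I mean for the old REST API: verify Facebook’s signature yourself, using the app secret that lives only on your server. The fb_sig_* names follow the documented signing scheme for canvas/Connect parameters; if your proxy receives the API-key-prefixed cookies instead, strip that prefix the same way.

require 'digest/md5'

# Rough sketch: verify Facebook's signature server-side so the browser is never
# trusted. The app secret stays on the server (here, in an environment variable).
FB_APP_SECRET = ENV['FB_APP_SECRET']

def fb_signature_valid?(params)
  # Gather the signed parameters, drop the fb_sig_ prefix, sort, and join.
  signed  = params.select { |k, _| k.to_s =~ /^fb_sig_/ }
  payload = signed.map { |k, v| "#{k.to_s.sub(/^fb_sig_/, '')}=#{v}" }.sort.join
  # The signature is the MD5 of the sorted payload concatenated with your secret.
  Digest::MD5.hexdigest(payload + FB_APP_SECRET) == params['fb_sig']
end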

NEW OAUTH API
Same as above. NEVER send API calls from JS in the browser! Read the authentication guide and understand every concept:

http://developers.facebook.com/docs/authentication/
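
As a hedged sketch of what “server-side only” looks like with the new OAuth flow: once Facebook redirects back with a code parameter, your backend (never your JS) trades it for an access token. The endpoint and parameter names below come from that guide; CLIENT_ID, CLIENT_SECRET, and REDIRECT_URI are placeholders for your app’s values.

require 'net/https'
require 'cgi'

# Sketch of the server-side code-for-token exchange. The client_secret never
# leaves the server; the browser only learns whether the user is signed in.
def exchange_code_for_token(code)
  uri = URI.parse('https://graph.facebook.com/oauth/access_token')
  params = {
    'client_id'     => CLIENT_ID,      # placeholder: your app id
    'redirect_uri'  => REDIRECT_URI,   # must match the redirect_uri the user was sent through
    'client_secret' => CLIENT_SECRET,  # server-side only, never in JS or on a device
    'code'          => code
  }
  uri.query = params.map { |k, v| "#{k}=#{CGI.escape(v.to_s)}" }.join('&')

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  response = http.get(uri.request_uri)
  # The response body is form-encoded: access_token=...&expires=...
  CGI.parse(response.body)['access_token'].first
end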

EXCEPTION
The only exception here is if there’s ZERO user-generated content, ZERO 3rd-party HTML, ZERO 3rd-party JavaScript on a page, and the page and all of its assets are served via SSL. Even then, you’re at the mercy of the user’s desktop; don’t store userid, fbuid, or the API secret anywhere on the client (in code, cookies, etc.)

The other exception here is if you really know what you’re doing and you’ve been dealing with XSS and browser authentication for a decade. In that case, I’m sure all of your application’s assets are served statically (or over SSL), you’ve gone over your JS with a fine-tooth comb, you don’t let any advertisers or user content sneak in HTML or JS, and you don’t store your FB API secret on the client.

WHY?
This is serious business. Privacy is priceless. Facebook Connect, despite how folks feel, is more secure than many banks. However, their reliance on letting developers do everything with JavaScript, and browsers’ limited support for security (injecting JS is like godmode in Doom), have put Facebook at the forefront of all of our security misgivings.

BUT WHAT ABOUT PRIVACY / PII
A site with a significant user base and an improper FB Connect implementation will, by proxy, hand an attacker delegated access to all of the private data that site has access to. Digg being hacked = Digg FB users exploited, Yelp exploited = Yelp FB users screwed — you get the idea.

Please, don’t be that site. It’s easy to blame Facebook, but all they’ve done is make public data public.

For no good reason I decided to get up to the Apple store at 2am this morning. Ok, so getting an iPad was a good idea, but 2am? I’m a night owl anyways.

In short, the iPad was a good idea. However, if you already have an iPhone and take your laptop to sync up, consider these time-saving tips:

  1. The “Apple” iPad protector gets dirty very easily. Consider an alternative.
  2. Update iTunes by running Software Update; it’s about 100MB (hint: Apple Stores have free Wi-Fi)
  3. When prompted, it’s somewhat* safe to click “Setup as new iPad” — you don’t have to restore your iPhone’s backup
  4. If you have more than 200 photos and want to start using your iPad “quickly”, don’t sync photos yet. They take a long time to be “optimized”. My sync (2500 photos) took more than an hour.
  5. You’ll be prompted to sync your current apps — a no-brainer if you want your iPhone apps on the device

*I may end up performing a restore from my iPhone backup. Setting up the iPad as “new” resulted in none of my application settings and preferences being sync’d to the iPad. More on this after I test a few reloads.

Back to my little iPad debut story…

At 2:30am there were 4 cars waiting; at 3am one person walked up to the store, and after that everyone else stopped hiding and started camping at the Apple Store on Knox St. in Dallas. It seemed I was the 2nd arrival, but the other party didn’t have a reservation, so I ended up 1st in the “reservations” line. The downside was an applicable “dork” label; the upside was being interviewed by several local TV channels and newspapers. Ironically, I don’t watch local TV or read local newspapers.

On to the iPad… and how it rocks…

The first thing I noticed was orientation doesn’t matter — this is awesome because I can just pick it up and use it. The screen swiftly changes to any orientation and there’s a switch to lock it to a specific orientation.

I immediately installed iBooks, Pandora, and NPR — all free apps which looked great. Every app was fast and installed in seconds, almost before I had the chance to tap the icon to launch it. Everything pops into place quickly. After syncing the iPad I went through my routine apps; they were blazing fast and I had no issues.

The glossy screen, which suffers from glare, is still surprisingly legible in broad daylight. It seems as though the glass sits far enough above the LCD that glare is easy to look past. Several people filming over my shoulder were able to clearly capture the screen as I installed various apps and navigated the iPad settings.

Safari is instant and quick; I tested out the various sites Apple convinced to convert to HTML5 video, which Apple lists as “iPad-ready websites”. Videos were smooth, with no clipping and no interface issues. Did I say it was fast?

The keyboard was very usable, although it reminded me of my days as a Blackberry user. For some reason it was fairly natural to use my thumbs to type with the keyboard while holding the device in my hand. No problems typing so far.

Now I’m off to the App Store. I’m going to try not to spend more on apps than I paid for the device.

The big problem? No multitasking. If Apple gives the iPad multi-tasking, my “nesspad” is going to be quite the monster.

Minor, notable “little problems”:

  • Videos in iTunes don’t have an icon indicating they’re videos, and they open in a separate Videos application which, unlike the iPod app, won’t play in the background.
  • As others have noted, screen smudges are ugly… but they’re not really noticeable when using the device
  • iPad apps are fairly “expensive”. I think Apple set the bar with their iWork apps and everyone is following suit. Time will tell whether the apps are worth their price.

It’s often easy to get your head caught up in the clouds when starting a company, building a new product, or ogling that new cloud computing service. What do you consider when making your final decision?

  • aesthetics
  • cost
  • performance
  • connectivity

Did you consider analyzing every data point, of every option, for all of these considerations?

The possibilities are exponential and represent how many possible places you could find revenue you didn’t know existed. If you’re a successful freemium startup, the answer to this question is yes. At least it has been for Dropbox, Evernote, Pandora, and WordPress. It’s great how Evernote has its per-user costs down to the penny.

Is every line, pixel, and connection within your software tracked for profitability?

So you come across a startup that’s pitching a “Real-Time” service. What do you do? Punch them in the face – now that’s real-time! Well, maybe that’s a bad idea, but you should absolutely tell them they’re not getting anywhere just by calling their service “Real-Time”. Here are some examples of concepts which can’t ever be real-time:

1. The News – Maybe if people as a whole were more intelligent, but the closest you’re going to get is Digg or Reddit, and those require thousands of data points (over time) before an article bubbles to the top of the relevancy list. The exception, of course, is “bad news”, which could easily be done in real-time.

2. Product Pricing – Retailers have a hard enough time with loss prevention and maintaining profits to care whether their published prices and inventory are accurate. Sure, they have real-time inventory internally, but that’s a large enough dataset that it’ll never be replicated to a service provider; the short story is that you’ll never be able to get both instant prices and instant inventory data at the same time.

Maybe if you were Google.

3. Search (sites, news, or otherwise) – Indexing is hard and there’s only one cat in the game with the facilities to do it in real-time. The only problem is the rest of the world doesn’t have a supercomputer running their systems, and there will always be a delay before Google gets the memo. The exceptions here are sites Google cares about, but chances are you’re not going to be big enough for that; otherwise, you probably wouldn’t be a startup.

4. Communications – There’s already an “app for that”. It’s called the phone and your voice. Pick up phone, call friend, profit. Anything else might provide real-time delivery on one end or the other, but chances are one person in the party is playing a video game, watching YouTube, or chatting on Facebook, in which case their response will be in Internet time.

Let me just steal the definition of Real-Time from Wikipedia:

In computer science, real-time computing (RTC), or “reactive computing”, is the study of hardware and software systems that are subject to a “real-time constraint”—i.e., operational deadlines from event to system response. By contrast, a non-real-time system is one for which there is no deadline, even if fast response or high performance is desired or preferred. The needs of real-time software are often addressed in the context of real-time operating systems, and synchronous programming languages, which provide frameworks on which to build real-time application software.

If that’s not your product then please, stop calling yourself real-time and get an old comp sci book, then figure out what real-time really is.

UPDATE: I’m very happy to update this post with a link to WordPress and their announcement of DNS editing. Kudos to WordPress for promptly providing this feature. Who knows, maybe they’ve been working on it awhile, but from my support conversation I was fairly convinced they weren’t working on the feature. Either way, I’m happy with WordPress and the thought of never having to host my own WP blog again.

This is primarily with regard to WordPress.com, but it’s an important and delicate concern for any service. Currently, if you want yourdomain.com to point to your WP blog, you have to give WP control over your domain name. WordPress has the challenging problem of providing scalable content, which is very hard to do while also providing domain name support; this is why WP requires control of your domain’s DNS. WordPress has a highly redundant and robust network, and part of that is having control over their users’ domains. This is limiting in that a user can’t point a subdomain like securepayments.yourdomain.com to another service.

While DNS natively supports this functionality, it’s not something the average blogger, hosting provider, or even application developer understands (it’s called slaving). If/when WordPress supports this functionality, they’ll need to do so carefully: limit exposure of their systems to outside risks, make it easy to use, and support the most complex users who need to run their own DNS.

If you provide a web service, such as a blogging platform, don’t build “domain name support” by forcing your users to give you full control over their domain name. In short, I don’t blame WP for their current implementation (it’s probably a lower priority for them), but here are a few tips on what you should do if you want to make YOUR app/service support user-supplied domain names:

  1. Don’t give the feature away for free. Your support costs will be higher for this feature than most.
  2. Don’t give the feature away for free. DNS plays a big role in spam and you don’t want to associate your systems with a spammer’s domain name.
  3. Think very hard about whether you want to point a user’s app at the root of their domain. Doing so opens you up to the concerns above.
  4. If your service involves high-traffic content such that load balancers are involved, consult someone who’s figured this problem out before. (my advice is typically free)
  5. Whenever you’re building and scaling your services, keep all the above in mind. As the Internet becomes connected at the application layer, via things like federations and digital certificates, domain names will become more and more important as a component of security and authority.
  6. Let users host their own DNS and point it at you instead of the other way around (see the sketch after this list). Administering DNS servers isn’t very cost effective and you’ll save money leaving it up to the users. The downside is you’ll need to be very good at providing your users with instructions, and you’ll need to notify them when changes are necessary; again, don’t give the feature away for free.
  7. Let users host their own DNS. You don’t want to host DNS for your users — doing so is a long-term cost commitment and something you can’t “undo”.
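
For point 6, here’s the sort of check I mean — a hedged sketch, not production code: before you start answering for a user-supplied domain, resolve it and confirm it actually points at you. The target hostname below is a placeholder for whatever you tell users to CNAME to.

require 'resolv'

# Placeholder: the hostname your users are told to point their CNAME at.
SERVICE_TARGET = 'app.example-service.com'

# Returns true only if the custom domain CNAMEs to our service.
def domain_points_at_us?(custom_domain)
  Resolv::DNS.open do |dns|
    cname = dns.getresources(custom_domain, Resolv::DNS::Resource::IN::CNAME).first
    !cname.nil? && cname.name.to_s == SERVICE_TARGET
  end
end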

Amazon RDS: Poison or Pill

October 29, 2009

As soon as I read the AWS newsletter about Amazon RDS, I started looking for a megaphone to start shouting at folks: keep away! Amazon RDS, or Relational Database Service, places Amazon into the mire of shared hosting and AWS users into a position of false confidence. Harsh words considering, overall, I feel Amazon’s service offerings are best-in-class. AWS offerings have historically pushed the envelope with regard to practical usage-based computing, something which ancient providers such as Sun and IBM have attempted to accomplish for decades; in this case I define practical as both usable and cost effective for small and large tasks. Up until now such systems weren’t trivialized to x86 hardware and required special programming considerations, access to academic institutions, and/or a large budget. By combining SLA-supported x86 virtualization with application services such as S3, SQS, and SimpleDB, AWS has provided a usage-based, on-demand computing solution which is simpler than task-based computing and as secure and reliable as virtualized or shared hosting. With its on-demand nature, AWS is cost effective for everything from small tasks to those requiring a datacenter of processors.

So why is Amazon RDS so bad, so much that you shouldn’t use it?

Well, there isn’t an easy answer; the better question is to ask yourself why you think RDS will be better than your own MySQL deployment. There is no right answer, because almost any answer will probably, one day, bite you in the ass. Hard. I mean data loss, and it won’t be Amazon’s fault.

RDBMS systems and the applications which depend on them are built from the ground up to rely on persistence, integrity, and static data models (schemas). In contrast, AWS has been built for distribution, decentralization, and the “cloud”. For Amazon, this service is somewhat of a U-turn from their original direction and has also placed a stamp on their forehead which says “That MySQL Guy”, which is not good. I have nothing against MySQL; however, as the de facto entry-level (free, open source) database, it has accrued a strong following of immature software. Such software has nothing to do with the basic purposes of AWS or MySQL, but it has everything to do with how Amazon’s support and engineering staff will be spending their time: supporting users and software which aren’t built for the cloud.

I hope that RDS won’t be a situation of butterflies & hurricanes, but here’s a quick list of why the relative cost of RDS is high, both for Amazon (the company) and for all of its AWS users:

  • Cost for Amazon (operations, engineers, and products)
    • MySQL, like most open source systems, has historically been buggy software with a trailing release+testing+production schedule, which requires continuous testing between production releases for large deployments (such as RDS).
    • MySQL has a large set of features which vary across releases and which share equal presence in production; in other words, Amazon will need to provide production support for multiple versions, not just the latest stable one.
    • Amazon has no control over the features and capabilities of MySQL and is thus limited to what MySQL provides; while MySQL provides many “good things”, Amazon will still be obligated to maintain through the bad. AWS MapReduce (via Hadoop) shares this disadvantage, but it’s mostly mitigated there because MapReduce is such a low-level distributed system.
    • MySQL is very flexible and scales very well, but it doesn’t do so by itself; it requires significant effort to be properly configured for the data being managed. All the folks who don’t know this will default into thinking Amazon will do it for them and will be disappointed when it doesn’t “just work”. Whether they ditch RDS or bug Amazon’s support, either way, it’s not a positive situation.
  • Cost for AWS (primarily EC2) users
    • Potential degradation of service and support for EC2 instances
      • With RDS available Amazon can defer issues with regard to running MySQL on EC2 instances to a recommendation for RDS — this will be a terrible waste of time for both parties.
      • MySQL is a very centralized system and by transitioning the decision of where MySQL resides in the AWS cloud from the user to Amazon, Amazon will be further centralizing the impact of MySQL on the cloud. Whereas users will randomly have MySQL deployed across any EC2 instance, Amazon will be appointing MySQL to specific hardware; this is based on the assumption that Amazon is clustering RDS deployments onto local hardware and not randomly deploying instances in the cloud. This is somewhat of a compromise for security and adds significant SLA risks (read: cost) to Amazon. In short, when a MySQL cluster dies – a LOT of folks are going to be VERY unhappy – their support tickets will be a burden to staff and their requests for credits will be a financial cost. Moreover, support staff will be yielding priority to these customers over other services because of the implicit severity.
    • Increased cost
      • RDS instances cost >10% more than regular instances and only come with the added benefit of backups — something which every system should already have in place. If you do choose to delegate the task of backups to RDS, you’re paying extra for a task you’ve already thought about doing yourself.
      • The cost of keeping your database, its backups, and its history all within AWS is multiplicative, and if you grow to the point where you’re ready to move off you’ll be charged to transfer all the data to an external system. While this is a subjective cost it’s still worth pointing out; if folks aren’t already doing backups right, they likely won’t know that cost effective database backups make use of binary logging facilities, not filesystem snapshots, and use significantly less disk space (and thus I/O).
    • False confidence
      • As I’ve mentioned before, letting other folks control your backups for you is a mistake. Failure is a matter of when, not if, and you’ll be in better control of responding if you understand what you’re dealing with. Just because RDS is doing your backups doesn’t mean you’re safe.
      • RDS users will expect MySQL to scale on demand, since everything else in AWS works that way, and it’s just not that simple. Scaling a database requires analysis and a balanced combination of server settings, data normalization, and indexes; all of these things will still be the user’s responsibility, and Amazon’s solution of “throw hardware at it” is a haunted path to send its users down.

Overall, I feel that Amazon could quickly cannibalize the value and quality of AWS if they (continue to) introduce trivial services. Supporting open source software they have no control over is a significant increase in relative support and operations cost. Amazon seems to be approaching this by pricing RDS instances above EC2, which is a mistake, because the real cost is the lost opportunity of engineers spending their time on systems which are more efficient for cloud computing. Amazon could charge 3 times the price of an EC2 instance and their engineers would still be better off building technologies for cloud-based systems rather than centralized, RDBMS-dependent web applications.

Where I feel Amazon has fallen short the most is that RDS only provides single-instance MySQL support and nothing more. No load balancing, replication, Hadoop integration, or any other form of data abstraction which could make it functional in a cloud computing context. Not implementing these features is a very clear indicator that AWS is focused more on short-term revenue-generating features than on cost effective cloud computing systems or improving the shortfalls of legacy centralized systems.

With all this said, I have to consider the possibility of this being a good move for Amazon. I present the potential issues with RDS simply to warn folks against relying on it as a crutch, and to point out that the new direction AWS has veered in leads into choppy waters. There are several aspects of RDS which will give Amazon insight into correlations among the varying systems of data storage and processing; comparing SimpleDB, MapReduce, MySQL, and general resource consumption could shed light on how their cloud is being used at a higher level than processors and bandwidth. Last, Amazon might be aware that MySQL is a crutch and is putting the service out there as a way to wean folks off of centralized systems.

I’ve perused several posts about handling cookies when multiple subdomains are involved; however, the solutions were either for older versions of Rails or didn’t resolve my situation: we wanted a cookie which could be used across all subdomains. This might also give you some insight as to why restful-authentication doesn’t have a feature to do all this for you — it keeps changing, and by-hand is best for now. If you’re employing this, do be diligent with security; sharing credentials across domains can be risky business if your security varies across those domains.

To do this, first edit config/initializers/session_store.rb where you’ll want to add the key:

:domain => '.example.com'
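
For context, here’s a sketch of what the whole initializer might look like on Rails 2.3 (the key and secret values are placeholders; newer versions configure this through config.session_store instead):

# config/initializers/session_store.rb (Rails 2.3-style; values are placeholders)
ActionController::Base.session = {
  :key    => '_myapp_session',
  :secret => 'a-long-random-secret-of-at-least-thirty-characters',
  :domain => '.example.com'   # leading dot so every subdomain shares the session
}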

The format here is important: if you don’t prefix the domain with a period, the cookie (and session) will not apply to requests on subdomains. This covers the Rails session; however, we also need to cover the cookie set by restful-authentication, which you’ll find in lib/authenticated_system.rb. In the kill_remember_cookie! and send_remember_cookie! methods, insert the same key as above or a reference to the session_options key. It’ll look like this:

# Clear the remember-me cookie on the shared domain (same :domain as the session).
def kill_remember_cookie!
  cookies.delete :auth_token, :domain => ActionController::Base.session_options[:domain]
end

# Set the remember-me cookie with the shared domain so every subdomain sees it.
def send_remember_cookie!
  cookies[:auth_token] = {
    :value   => @current_user.remember_token,
    :expires => @current_user.remember_token_expires_at,
    :domain  => ActionController::Base.session_options[:domain] }
end

During development you should be aware this might not work with 'localhost', depending on your OS. The best thing to do is edit your hosts file to have "example.local" point to your machine and use that domain (and its subdomains) for testing instead.

If you’re doing anything more complicated, you’ve got your work cut out for you, as you may need to write custom Rack middleware (see: Google) and/or use a Proc. In the latest Rails, cookies are handled by Rack (instead of CGI); in any version, setting cookies via cookies[:key]= is performed independently of the session options, which is why you must specify the domain separately. Some folks describe monkey-patching Rails to set the domain automatically, but this is unreliable, as I believe it’s changed every release. If you don’t want to have to change it, just create a wrapper method for setting your cookies, or set the domain wherever you set or delete a cookie. We only set one cookie via restful-authentication, so two lines is a fairly simple fix.
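
For what it’s worth, here’s the kind of wrapper I mean (the method names are mine, not part of restful-authentication), so the domain lives in exactly one place:

# Hypothetical helpers for ApplicationController (or lib/authenticated_system.rb):
# every cookie set or deleted through them picks up the shared session domain.
def set_shared_cookie(name, options)
  cookies[name] = { :domain => ActionController::Base.session_options[:domain] }.merge(options)
end

def delete_shared_cookie(name)
  cookies.delete name, :domain => ActionController::Base.session_options[:domain]
end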
