Monday, January 23, 2012

Trying out GeoDNS

In the 3 weeks since we started working on Mag, we have been talking to a few prospective clients and other knowledgeable folks out there. We want to understand how individuals, SMBs and larger organizations use Social Media, and the Internet in general. Of course, all of these initial meetings are local to Bangalore, with a few in other cities in India (over the phone).

We have been getting a very positive response, and although we are building a truly global product, we want to first cater to people nearby, learn, and expand gradually. It thus becomes important that we serve the Indian Internet audience well. Servers hosted in the US have high latency from India (~300 ms). We host, and will continue to host, core business data on Amazon Web Services (we have selected DynamoDB), but a serving facility for India needs to exist (I will talk about serving details later). One of the first things that popped into my mind is that this should not be India-specific architecture: no in.mag.io domain and such. We should have a simple strategy that can scale to any region. Enter GeoDNS.

GeoDNS, or Geographical DNS, basically means answering user DNS queries with different responses depending on where the request originates. So if I have a server per continent, say, then users on each continent should be served by the server on their continent. The concept is simple to understand, although I am sure it's not an easy engineering task. This is where Zerigo helps out. We have used their DNS plans earlier (only the Free plan, though), and GeoDNS is available only on the DNS Pro plan, so I quickly upgraded and got a taste of the administration with a demo setup. The setup is easy: you mark a particular domain (or subdomain) as being under GeoDNS, then assign actual server IP addresses per region via special subdomains starting with an underscore (_).

Example:
_asi.test.mag.io <some.ip.address>
_def.test.mag.io <some.ip.address>

The _def record is the default, and _asi is for Asia. If you query from Asia, you should get the _asi IP address; everyone else gets the default. The test worked perfectly fine. You can use a country code too (the ISO two-letter code), and the USA is further divided into 4 regions. Very nice for balancing traffic by region. If you have similar needs, go ahead and try this :)
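
A quick way to verify the region-based answers is to query the record from machines in different regions and compare. Here is a minimal sketch in Python using the third-party dnspython library; the nameserver IP below is a placeholder for whatever authoritative nameserver Zerigo assigns your zone:

    import dns.resolver  # third-party: pip install dnspython

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["203.0.113.53"]  # placeholder: an authoritative NS for the zone

    # Run from hosts in different regions: Asia should print the _asi
    # record's address, everywhere else the _def (default) address.
    for rr in resolver.query("test.mag.io", "A"):  # dnspython 2.x renames query() to resolve()
        print(rr.address)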

Friday, January 20, 2012

Schema Example on Amazon DynamoDB

First, an apology to anyone who read my previous blog post: I had used the wrong rates for throughput capacity. I noticed this because my calculations seemed a little costly :)

Throughput Capacity prices from the AWS website, as of today:

* Write Throughput: $0.01 per hour for every 10 units of Write Capacity
* Read Throughput: $0.01 per hour for every 50 units of Read Capacity

Thus, as per my example of 7 tables with the minimum 5 Read and 5 Write capacity units per table, the hourly pricing is:
35 * $0.001 = $0.035 (Write)
35 * $0.0002 = $0.007 (Read)
Total hourly cost = $0.042, monthly cost = $30.24
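
If you want to play with the numbers, here is the same calculation as a tiny Python snippet (the rates are the corrected ones quoted above):

    # Sanity check of the corrected throughput pricing.
    WRITE_RATE = 0.01 / 10  # $ per Write Capacity unit-hour
    READ_RATE = 0.01 / 50   # $ per Read Capacity unit-hour

    tables, min_units = 7, 5
    units = tables * min_units  # 35 units of each kind
    hourly = units * WRITE_RATE + units * READ_RATE
    print("hourly: $%.3f, monthly: $%.2f" % (hourly, hourly * 720))
    # hourly: $0.042, monthly: $30.24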

My current database design uses 3 tables. One is for authentication: we will allow users to sign up/log in using either email+password (MD5) or FB Connect or similar. In both cases we will store these authentication string(s) in an "Auth" table. The Primary Key will be the auth strings themselves, either a concatenation of email+password or OAuth tokens. The items will have another attribute, `UserName`. I am yet to read the FB Connect documentation, so this may need slight re-configuration.
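
As a rough sketch of how that lookup would work (shown here with boto's DynamoDB support in Python; the values are made up):

    import hashlib

    import boto  # boto 2.x ships DynamoDB support

    conn = boto.connect_dynamodb()  # credentials come from the environment/boto config
    auth_table = conn.get_table("Auth")

    # Signup: the auth string itself is the hash key of the item.
    auth_string = hashlib.md5(b"user@example.com" + b"secret").hexdigest()
    auth_table.new_item(hash_key=auth_string, attrs={"UserName": "jdoe"}).put()

    # Login: a single GetItem by primary key resolves the user.
    print(auth_table.get_item(hash_key=auth_string)["UserName"])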

The `UserName` is the PK of the second table, "User". Its items contain common user attributes like `FirstName`, `LastName`, `Email` and other such needed data. The last table is the "Campaign" table, where we store campaign configuration information for Mag. This has a Hash and Range type Primary Key: the Hash part is the `UserName`, and the Range is the `CampaignId` (a unique string generated from the title of the Campaign). Each user can run multiple Campaigns. What I am not sure of is the performance hit if most users have only 1 campaign. In that case I think it would be better to store Campaigns as a Set in the "User" table, and the "Campaign" table could then have just a simple Hash PK named `CampaignId`.
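
Here is how that Hash and Range schema looks as a sketch, again with boto in Python (names illustrative):

    import boto

    conn = boto.connect_dynamodb()

    # Hash-and-range table: UserName partitions the data, CampaignId
    # identifies each campaign within a user's partition.
    schema = conn.create_schema(
        hash_key_name="UserName", hash_key_proto_value=str,
        range_key_name="CampaignId", range_key_proto_value=str,
    )
    campaigns = conn.create_table(
        name="Campaign", schema=schema, read_units=5, write_units=5
    )
    campaigns.refresh(wait_for_active=True)  # table creation is asynchronous

    # Fetching all of one user's campaigns is a single Query on the hash key.
    for item in campaigns.query(hash_key="jdoe"):
        print(item["CampaignId"])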

My doubt as of now is what values to set for throughput. I neither want to overcharge myself, nor do I want AWS to throttle the connection. I guess a real example will shed more light. Back to coding now!

Thursday, January 19, 2012

Day One of Amazon DynamoDB

It's been just about a day that I have been going through the documentation of the newly released Amazon DynamoDB. Getting a basic Table up using the PHP SDK was simple. Right now I am planning the data organization. DynamoDB is a schema-less data store; there is a primary key per table, which is your main query column, so to speak. I am trying to take existing MongoDB-based data mappers and modify them for DynamoDB.
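
To make the mapper idea concrete, here is roughly the shape I have in mind, sketched in Python on top of boto (class and method names are hypothetical, not a released library):

    import boto

    class Mapper(object):
        """Map plain dicts to DynamoDB items keyed on one attribute."""

        def __init__(self, table_name, key_attr):
            self.table = boto.connect_dynamodb().get_table(table_name)
            self.key_attr = key_attr

        def save(self, doc):
            attrs = dict((k, v) for k, v in doc.items() if k != self.key_attr)
            self.table.new_item(hash_key=doc[self.key_attr], attrs=attrs).put()

        def load(self, key):
            return dict(self.table.get_item(hash_key=key))

    users = Mapper("User", "UserName")
    users.save({"UserName": "jdoe", "FirstName": "John", "Email": "j@example.com"})
    print(users.load("jdoe"))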

I am planning to use the Fat-Free Framework for the back-end. This is basically the management panel's API; all user interface will be JavaScript only, within the browser. Templates will probably be Mustache. Anyway, back to DynamoDB: once I get a data mapper done, I will release it on GitHub. I am also looking for similar stuff in Python and Node. I am sure in a couple of weeks we will see them pouring in all over the Internet.

Pricing
I have been doing some pricing calculations. This is important for us since we have a tiny budget to start with. The price of Amazon DynamoDB depends on two main things: the size of the data store and the throughput needed. The size is charged per GB-month, so a decent 5 GB data store will cost $5/month. But remember that AWS stores indexing metadata too, and you will be charged for that. I will gradually get a better overall idea of the numbers involved, but for starters here is the note from the AWS site:

"Amazon DynamoDB is an indexed datastore, and the amount of disk space your data consumes will exceed the raw size of the data you have uploaded. Amazon DynamoDB measures the size of your billable data by adding up the raw byte size of the data you upload, plus a per-item storage overhead of 100 bytes to account for indexing."

The other cost involved is throughput. This is the total reads/writes per second you will need, across all your tables. The minimum throughput allowed per table is 5 units each way (5 read and 5 write). That means if you initially have, say, 7 tables, then your minimum throughput is (7 * 5) + (7 * 5) = 70 units.
Price for read throughput is $0.01/hour for 10 units.
And price for write throughput is $0.01/hour for 5 units.

Thus your total hourly price comes to $0.07 (write) + $0.035 (read) = $0.105/hour.
Assuming 720 hours in a month, your monthly cost for throughput is $75.60.

This pricing is a little steep, which is itching me to go through the schema design and see what our starting costs are. Until yesterday I had not taken the minimum throughput per table into account. Anyway, I will need a complete table design to judge anything. But one thing is clear: table count is costly if your tables are basic and do not need even 5 reads/writes per second.

Edit: I had calculated the sum to be $0.15/hour. I am still not out of bed, I guess :)

Mag will use Amazon DynamoDB

Amazon's latest announcement (and offering) could not have come at a better time for Mag. I am talking about Amazon DynamoDB here, and I feel it is a good fit for Mag.

For the last few weeks I have been looking at the many available options for data storage for Mag. Notice I mentioned data store and not RDBMS. The reason is that Mag's data is really a large collection of configurations for the campaigns that our clients run. Yes, it may grow in size as we add customers, but the schema is quite simple, at least for now. And I would prefer a hosted, cost-effective service that is fast. I want to concentrate on the application, because that is the real power of Mag.

My current choices included NoSQL systems like MongoDB, Couch, Redis and similar, but honestly I would not want to manage the storage myself. And the replication, and other such stuff. I had made plans to do all of that, but now I will be free of them. Plus, the move to SSDs seems very practical and needed. I am not saying SSDs will help Mag right now, but from an industry standpoint this is a good start.

But what has really made my decision easy is that it is no hassle, no management, and it scales when needed. AWS also claims really low latency and very high throughput, although our reads/writes will be tiny to start with. I will see the latency for myself once I start, but I believe their claims for now. From a startup's point of view the pricing is good too: it is free to start with (although that is a tiny 100 MB database), which is good for experimenting. I used basic assumptions to check prices and it seems good to me.

I will see what existing libraries I can re-use to make the transition smooth. Our applications are JavaScript-heavy; the back-end (a PHP/Python/Node mix) is just an API as such. All rendering is done right in the browser, with just JSON passing back and forth. That's it then, let's code!

Monday, January 16, 2012

Price comparison of dedicated or VPS hosts

For the impatient: link to spreadsheet.

For the last few weeks I have been looking around a lot on the web, checking details and prices of VPS and dedicated servers. I am looking for places to host Mag when it starts growing. Not that I am leaving AWS in any way, but for Mag we will need servers with a lower bandwidth cost than on AWS.

I am looking for self-managed (or unmanaged) dedicated/VPS solutions, preferably dedicated. I am looking at decent dual-core commercial processors (an Intel Core2Duo or AMD Athlon will do), but higher-grade Xeon or Opteron types are also on my mind.

So I am maintaining a spreadsheet with selected servers from some hosting companies, which I keep adding to every week. You can access the spreadsheet here; it is publicly viewable. If you want me to add a company you know of, please let me know. If you want edit permissions, I could give you that too. Servers could be located in any major region around the world, like North America, South America, the UK, Germany, Russia, Japan, Australia, the Middle East, India, or anywhere there is a substantial user volume.

Friday, January 13, 2012

Height of Facebook custom Page Tabs

This is a quick tip. I am currently working on a Facebook app and have a demo for a custom Page Tab, but somehow the height was not being set properly. The canvas height in the Developer settings was set to Fluid, and I have no clue where the corresponding setting for Page Tabs is. Anyway, after a little searching I found that the following code works:


<div id="fb-root"></div>
<script type="text/javascript" src="https://connect.facebook.net/en_US/all.js"></script>
<script type="text/javascript" charset="utf-8">
// Resize the Page Tab iframe to a fixed height; adjust 1500 to fit your content.
// Note: some setups require FB.init({appId: '...'}) before calling Canvas methods.
FB.Canvas.setSize({height: 1500});
</script>

I placed it at the bottom of my application's custom Page Tab, just before the closing body tag. Set the height value according to your needs. I am yet to look at how this would work with a flexible height (determined after the iframe renders). Will update that later; for today this will do.

Serve static files from nginx for POST request

This is a quick tip; I haven't had the time to dig deep into this. I am working on a Facebook app and was doing a demo with static HTML. Everything was set up, including SSL (I will write a quick tip on that too). I could browse to the page directly, but it failed from within Facebook, giving a 405 error.

From previous experience I remembered that Facebook loads Page Tabs with POST requests, but I had forgotten the fix for static files. By default, nginx does not allow serving static files in response to an HTTP POST. Anyway, the fix was:

    error_page 405 = $uri;
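
For context, that directive goes inside the block that serves the static files, something like this (a sketch; the path is made up):

    location / {
        root /var/www/demo;
        # nginx answers a POST for a static file with a 405; redirect that
        # error back to the same URI so the file gets served anyway.
        error_page 405 = $uri;
    }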


Also here is another suggestion: Serving Static Content Via POST From Nginx

Hope this helps you.

Thursday, January 05, 2012

Oh! So we have started up

Today is Friday, January 6, 2012 (Indian time). Last week at this time I was at MobStac; that was my last day there. 2012 started with fun, food, a short drive to a nearby spot, drinks and everything in between. But the fun is gradually getting over and the feeling has started to change. I have done this a couple of times earlier. I have heard the rules of the game repeated so many times by others. I have seen successful entrepreneurs, and seen teams grow. I am not new to this world. But that guarantees nothing.

My wife Debleena and I have been planning to do this for the last 4-5 months. We have saved up enough to last for about 10 months, plus we have some small projects to help us earn some extra cash. This is bare survival, and that is great. We have a couple of ideas that we want to experiment with. We did not get through YCombinator, and The Morpheus was not a suitable match, given that we (including our part-time members) had a lot of experience in the stuff we would be doing. But that does not mean we are at a loss.

We have knowledge, experience and a shitload of spirit. We have Amit and Ipsita to help us part-time. And we have great mentors like Pallav, Sharad, Shubhodeep, Ravi and Sharat. And not to forget our dear friends Ruhita, Arijit, Paro, Nitesh, Mohan and so many others I cannot name here right now. Heck! What do we need to worry about?

I have seen this world of startups for the last 6+ years, and worked to run meetups in Kolkata for a couple of years. But overall I have never been successful by industry standards. And that is great: I have scope to better my last plan, then. It's an open ground, and we are all players. So yeah, we are going to rock, no doubt. Good luck to ourselves!

Edit: I missed Paro. I am sure I missed a few other names.