Tuesday, February 28, 2012

PHP libxml issue while compiling

Today I had some trouble getting libxml to work in PHP. I was compiling PHP 5.3.10 for a client's Joomla site, which needs libxml. This seems to be a common problem, and I came across a number of forum threads on the topic.

Most suggestions revolve around installing the libxml2 development package. On Debian or Ubuntu this can be done with:
apt-get install libxml2 libxml2-dev

On other Linux distributions you may need libxml2-devel instead.

But I had already done that, and libxml was still not showing up in PHP. I had also set --with-libxml-dir in ./configure, with no luck.

Then I came across the --enable-libxml option. Although it is not mentioned directly in `./configure --help`, it should have come to mind: this is how you enable (or, with --disable, disable) any module in the ext/ directory. Anyway, I found it mentioned here, and that worked! The reason is that I had the --disable-all option set, so all default modules were off and I had to enable libxml explicitly with its enable option.
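To make the --disable-all pitfall concrete, here is a small Python sketch of a hypothetical helper (not part of PHP's build system) that scans a list of ./configure arguments and reports which required extensions are not switched back on. It assumes, as described above, that with --disable-all each extension must be enabled explicitly, and it treats path options like --with-libxml-dir as not enabling anything. It is a rough heuristic, not a parser for PHP's real configure logic.

```python
"""Hypothetical helper: find extensions PHP's ./configure will leave off.

Assumption (from the post): with --disable-all, every extension must be
switched back on explicitly with --enable-<ext> or --with-<ext>. Path options
such as --with-libxml-dir only say where a library lives; they do not enable
anything. This is a rough heuristic, not PHP's real configure logic.
"""


def missing_extensions(configure_args, required):
    """Return required extensions that are not explicitly enabled."""
    if "--disable-all" not in configure_args:
        # This sketch assumes default extensions (libxml included) stay on.
        return []
    enabled = set()
    for arg in configure_args:
        name = arg.split("=", 1)[0]  # drop any "=value" part
        for prefix in ("--enable-", "--with-"):
            if name.startswith(prefix):
                ext = name[len(prefix):]
                if not ext.endswith("-dir"):  # skip path-only options
                    enabled.add(ext)
    return [ext for ext in required if ext not in enabled]


args = ["--disable-all", "--with-libxml-dir=/usr"]
print(missing_extensions(args, ["libxml"]))                        # ['libxml']
print(missing_extensions(args + ["--enable-libxml"], ["libxml"]))  # []
```

With only --with-libxml-dir the helper flags libxml as missing; adding --enable-libxml clears it, which mirrors what happened here.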

Hope this helps someone else...

Friday, February 17, 2012

Building a Yii app: The Data Model

A friend of mine needs a web application revamped. I had originally created it about 2 years ago using a custom PHP mini-framework that I had built for many projects at that time. Development stopped for various reasons, and parts of his web application were left incomplete. For example, images were not being uploaded to Amazon S3, which was the original plan. Model edits were not working in many parts, and there were some data validation issues. He has been doing the groundwork for his business and has now decided to finally complete the application.

The application is related to medical needs: information about patients, doctors, hospitals, etc. I have decided to build the new version using Yii, since another developer will take over if the project is successful, and Yii (like other popular frameworks) is very well documented for anyone to pick up. Choosing Yii over other PHP frameworks was rather an impulsive decision; I have read comparisons of the good PHP frameworks, and Yii is among the top few. Anyway, moving on...

While building this application, which is moderately feature-rich, I intend to write about my experiences. I hope this helps anyone looking for a quick introduction to Yii with an example; I will try my best to make this a tutorial. The project needs data for multiple types:
  • General User Profile (could be a patient)
  • Doctor
  • Hospital
  • Nursing Home
  • Other Medical companies like: Medical Shop, Diagnostic Center, Fitness Center, Ambulance Provider, Nurse, etc.

Other information includes:
  • Specialization: this is related to Doctors, explained below
  • Address: City, State, Country, etc. Any entity can have multiple addresses
  • Phone: can be either a fixed or a mobile (cellular) phone. Any entity can have multiple phones, and addresses can also have phones associated with them
  • Department: Hospitals or Nursing Homes can specify many departments
  • Branch: Hospitals, Nursing Homes or Other companies can have multiple locations/branches.
  • Image: multiple images for any entity.
  • User: this is used for authentication, simple email/password for now.
To manage the mappings of many entity types to Address, Phone, Branch, Image, etc., I have used a central Entity table. Every type of physical entity has an entry in the Entity table (including each branch). Address, Phone, Branch, Image and Department are then mapped to the Entity table.

In Yii terms the relations look like this:
  • Doctors can have one or more Specializations (HAS_MANY in Yii Model)
  • Doctor, Hospital, Nursing Home, Other, Profile have a one-to-one mapping to Entity (BELONGS_TO in Yii)
  • Entity can have multiple Addresses, Phones, Images (HAS_MANY in Yii)
  • Entity can have many Departments (HAS_MANY). This is limited to only Hospital, Nursing Home or Other types though at the application level.
  • Entity can have other Entities as Branches (HAS_MANY). This is also limited to Hospital, Nursing Home or Other types at the application level.
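The relations above can be sketched outside Yii as plain Python dataclasses, mainly to show where the application-level rule lives: only Hospital, Nursing Home and Other entities may have Departments and Branches. This is a hypothetical model for illustration only (the class names, the `type` strings and the list-of-strings fields are assumptions), not Yii code and not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Application-level rule: only these entity types may have
# Departments and Branches. The type strings are hypothetical.
ORGANISATION_TYPES = {"hospital", "nursing_home", "other"}


@dataclass
class Entity:
    """Central record that every physical entity (and each branch) maps to."""
    id: int
    type: str                                                # e.g. "doctor", "hospital"
    addresses: List[str] = field(default_factory=list)       # HAS_MANY Address
    phones: List[str] = field(default_factory=list)          # HAS_MANY Phone
    images: List[str] = field(default_factory=list)          # HAS_MANY Image
    departments: List[str] = field(default_factory=list)     # HAS_MANY Department
    branches: List["Entity"] = field(default_factory=list)   # HAS_MANY Entity

    def add_department(self, name: str) -> None:
        if self.type not in ORGANISATION_TYPES:
            raise ValueError(f"{self.type} entities cannot have departments")
        self.departments.append(name)

    def add_branch(self, branch: "Entity") -> None:
        if self.type not in ORGANISATION_TYPES:
            raise ValueError(f"{self.type} entities cannot have branches")
        self.branches.append(branch)


@dataclass
class Doctor:
    entity: Entity                                             # BELONGS_TO Entity
    name: str
    specializations: List[str] = field(default_factory=list)  # HAS_MANY


hospital = Entity(1, "hospital")
hospital.add_department("Cardiology")
hospital.add_branch(Entity(2, "hospital"))  # each branch gets its own Entity row
doctor = Doctor(Entity(3, "doctor"), "Dr. Example", ["Cardiology"])
```

Calling `doctor.entity.add_department(...)` raises `ValueError`, which is the kind of check the Yii models would need to enforce, since the Entity table itself does not restrict it.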
I will write about setting up Yii Models in the next blog entry...

Saturday, February 04, 2012

Trying out non AWS options

I have been using Amazon Web Services for all of my own (and the companies I work with) compute and storage needs for the last 4 years. That includes AWS S3, EC2, SimpleDB and even RDS (at MobStac). For the last few weeks I have been planning the platform choices for Mag. They do include AWS DynamoDB, S3 and EC2, but the picture is a bit different.

Amazon DynamoDB promises high performance, and I personally do not want to take on database headaches. DynamoDB's pricing model is great to start with too: it is not cheap, but it does not bite either, and you pay more as you grow. The data model is similar to other NoSQL services, so you can move out later if you want.

AWS S3 is a really reliable, cheap storage solution, and there is no doubt we will use it wherever we need it. EC2 is great for its scalability and its powerful approach to failure handling. Bringing up a pre-configured EC2 instance (in many data centers around the world) is fast and easy, and the cost per hour is low. Extra processing as and when needed is also a perfect fit for EC2 Spot Instances.

While AWS EC2 provides all these great services, hosting regular websites and serving traffic on it is still more expensive. Bandwidth comes at a premium inside AWS. Yes, they are very well connected, but you can get that from many top-quality hosting providers/data centers. Also, the machines themselves are not as fast as a good VPS can be. I have been searching for VPS providers with SSD storage and newer processors and RAM. Recently I tested a tiny VPS (384MB RAM) with such a configuration, and Apache Benchmark showed better performance than on an AWS EC2 instance (Ubuntu, nginx, PHP, APC, apc.stat=0). PHP on the VPS was compiled, so the two setups were not identical and I will re-run the tests, but the difference was very visible.

Of course I do not intend to co-locate or manage any hardware, but I certainly want to look at good VPS or unmanaged dedicated offerings. Configuring a VPS/server is not easy either, from the firewall to the internal network if you run a cluster, but if the price difference is worth it, it makes business sense. And EC2 is never out of the picture: it will be used in parallel, but for the things I mentioned above and not for regular traffic.

Monday, January 23, 2012

Trying out GeoDNS

In the 3 weeks since we started working on Mag, we have been talking to a few prospective clients and other knowledgeable folks out there. We want to understand how individuals, SMBs and larger organizations use social media, or the Internet in general. Of course, all of these initial meetings have been local to Bangalore, plus a few with people in other cities in India (over the phone).

We have been getting a very positive response, and although we are building a truly global product, we want to first cater to people nearby, learn and expand gradually. So it becomes important that we serve the Indian Internet audience. Servers hosted in the US have high latency from India (~300 ms). Of course we host, and will continue to host, core business data on Amazon Web Services (we have selected DynamoDB), but an Indian serving facility needs to exist (I will talk about serving details later). One of the first things that came to mind is that this should not be an India-specific architecture: no in.mag.io domain and such. We should have a simple strategy that can scale to any region. Enter GeoDNS.

GeoDNS, or Geographical DNS, basically means answering DNS queries with different responses depending on where the request originates. So if I have, say, one server per continent, then each user should be served from the server on their own continent. The concept is simple to understand, although I am sure it is not an easy engineering task. This is where Zerigo helps out. We have used their DNS service before (on the Free plan), but GeoDNS is available only on the DNS Pro plan. So I quickly upgraded and got a taste of the administration with a demo setup. The setup is easy: you mark a particular domain (or sub-domain) as being under GeoDNS. Then you assign actual server IP addresses per region, using special sub-domains that start with an underscore (_).

_asi.test.mag.io <some.ip.address>
_def.test.mag.io <some.ip.address>

The _def record is the default, and _asi is for Asia. If you query from Asia, you should get the _asi IP address; otherwise, the default. The test worked perfectly. You can also use a country code (ISO two-letter code), and the USA is further divided into 4 regions. Very nice for balancing traffic by region. If you have similar needs, go ahead and try this :)
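The selection logic described above (regional record if one matches, otherwise the default) can be sketched in a few lines of Python. This is a conceptual model of the idea, not Zerigo's implementation; the region codes passed in ("in", "asi", "us") and the IP addresses (taken from documentation-only ranges) are illustrative assumptions.

```python
# Conceptual GeoDNS lookup using the "_<region>.<host>" record names above.
# Not Zerigo's implementation; the region codes passed in are illustrative.

def geo_resolve(records, host, client_regions):
    """Return the IP for `host`, trying the client's regions in order
    (most specific first, e.g. country code then continent) and falling
    back to the _def record when no regional record exists."""
    for region in client_regions:
        name = f"_{region}.{host}"
        if name in records:
            return records[name]
    return records[f"_def.{host}"]


# Documentation-range IPs standing in for the real server addresses.
records = {
    "_asi.test.mag.io": "192.0.2.10",
    "_def.test.mag.io": "198.51.100.20",
}

print(geo_resolve(records, "test.mag.io", ["in", "asi"]))  # 192.0.2.10
print(geo_resolve(records, "test.mag.io", ["us"]))         # 198.51.100.20
```

Trying the country code before the continent lets a later country-specific record (say, one for India) take precedence over _asi without changing anything else.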

Friday, January 20, 2012

Schema Example on Amazon DynamoDB

First, an apology to anyone who read my previous blog post: I had used the wrong rates for throughput capacity. I noticed because my calculations seemed a little too costly :)

Throughput capacity prices on the AWS website, as of today:

* Write Throughput: $0.01 per hour for every 10 units of Write Capacity
* Read Throughput: $0.01 per hour for every 50 units of Read Capacity

So for my example of 7 tables, each with the minimum of 5 read and 5 write units of throughput capacity, the hourly price is:
35 units * $0.001/unit = $0.035 (Write)
35 units * $0.0002/unit = $0.007 (Read)
Total hourly cost = $0.042; monthly cost (720 hours) = $30.24
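The arithmetic above can be checked with a short Python sketch. It uses `Decimal` to avoid floating-point rounding and assumes, as in the earlier post, a 720-hour month and identical provisioning for every table. The function name and defaults are illustrative, not part of any AWS SDK.

```python
from decimal import Decimal

# Rates quoted above: write $0.01/hour per 10 units, read $0.01/hour per 50 units.
WRITE_RATE = Decimal("0.01") / 10   # $ per write unit per hour
READ_RATE = Decimal("0.01") / 50    # $ per read unit per hour
HOURS_PER_MONTH = 720               # same 720-hour month as the earlier post


def throughput_cost(tables, read_units=5, write_units=5):
    """Return (hourly, monthly) cost for `tables` tables provisioned alike."""
    hourly = tables * write_units * WRITE_RATE + tables * read_units * READ_RATE
    return hourly, hourly * HOURS_PER_MONTH


hourly, monthly = throughput_cost(7)
print(f"${hourly:.3f}/hour, ${monthly:.2f}/month")  # $0.042/hour, $30.24/month
```

Changing `read_units`/`write_units` gives a quick way to see how provisioning above the minimum moves the monthly bill.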

My current database design uses 3 tables. The first is for authentication. We will allow users to sign up/log in using either email+password (MD5) or FB Connect or similar. In both cases we will store the authentication string(s) in an "Auth" table. The primary key will be the auth strings themselves: either the concatenated email+password or OAuth tokens. Each item will have one more attribute, `UserName`. I have yet to read the FB Connect documentation, so this may need slight re-configuration.

`UserName` is the primary key of the second table, "User". Its items contain common user attributes like `FirstName`, `LastName`, `Email` and other such data. The last table is the "Campaign" table, where we store campaign configuration information for Mag. This has a Hash and Range type primary key. The Hash part is the `UserName`, and the Range is the `CampaignId` (a unique string generated from the title of the Campaign). Each user can run multiple Campaigns. What I am not sure of is the performance hit if most users have only 1 campaign. In that case I think it will be better to store Campaigns as a Set in the "User" table, and the "Campaign" table could have just a simple Hash PK named `CampaignId`.
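To illustrate the Hash and Range key on the "Campaign" table, here is a small in-memory Python stand-in (not the DynamoDB API or the PHP SDK). Items are addressed by (`UserName`, `CampaignId`), and a query on one `UserName` returns all of that user's campaigns ordered by `CampaignId`, similar to how a DynamoDB Query returns items sorted by range key by default. The `campaign_id()` slug rule and the "user1" name are hypothetical; a real version would also need to handle two campaigns with the same title.

```python
import re
from collections import defaultdict


def campaign_id(title):
    """Hypothetical: derive a CampaignId slug from the campaign title."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class HashRangeTable:
    """In-memory stand-in for a table with a Hash + Range primary key."""

    def __init__(self):
        self._items = defaultdict(dict)  # hash key -> {range key: item}

    def put(self, hash_key, range_key, item):
        self._items[hash_key][range_key] = item

    def get(self, hash_key, range_key):
        return self._items.get(hash_key, {}).get(range_key)

    def query(self, hash_key):
        # All items for one hash key, ordered by range key.
        rows = self._items.get(hash_key, {})
        return [rows[k] for k in sorted(rows)]


campaigns = HashRangeTable()
for title in ["Summer Sale", "Diwali Offers!"]:
    campaigns.put("user1", campaign_id(title), {"Title": title})

print([c["Title"] for c in campaigns.query("user1")])
# ['Diwali Offers!', 'Summer Sale']
```

If most users end up with a single campaign, every query still works; the open question in the post is only whether a separate table is worth its cost for that access pattern.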

My open question for now is what values to set for throughput. I neither want to overcharge myself, nor do I want AWS to throttle my requests. I guess a real example will shed more light. Back to coding now!

Thursday, January 19, 2012

Day One of Amazon DynamoDB

It's been just about a day since I started going through the documentation of the newly released Amazon DynamoDB. Using the PHP SDK and getting a basic table up was simple. Right now I am planning the data organization. DynamoDB is a schema-less data store. Each table has a primary key, which is, so to speak, your main query column. I am trying to take existing MongoDB-based data mappers and modify them for DynamoDB.

I am planning to use the Fat-Free Framework for the back-end, which is basically the management panel's API. The entire user interface will be JavaScript only, running in the browser. Templates will probably be Mustache. Anyway, back to DynamoDB: once I get a data mapper done, I will release it on GitHub. I am also looking for similar libraries in Python and Node. I am sure that in a couple of weeks we will see them popping up all over the Internet.

I have been doing some pricing calculations. This is important for us since we have a tiny budget to start with. The price of AWS DynamoDB depends on two main things: the size of the data store and the throughput needed. Size is charged per GB per month, so a decent 5 GB data store will cost $5/month. But remember that AWS also stores metadata, and you will be charged for that. I will gradually get a better overall idea of the numbers involved, but for starters here is the note from the AWS site:

"Amazon DynamoDB is an indexed datastore, and the amount of disk space your data consumes will exceed the raw size of the data you have uploaded. Amazon DynamoDB measures the size of your billable data by adding up the raw byte size of the data you upload, plus a per-item storage overhead of 100 bytes to account for indexing."
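The quoted billing rule is easy to turn into a rough estimate. The Python sketch below adds the 100-byte per-item overhead to each item's raw size. The $1/GB-month price is inferred from the "5 GB for $5/month" figure in this post, and treating a GB as 2**30 bytes is my assumption; the item sizes are hypothetical.

```python
# Billable storage following the AWS note above: raw bytes of each item
# plus 100 bytes of per-item indexing overhead. The $1/GB-month price is
# inferred from "5 GB for $5/month"; a GB as 2**30 bytes is an assumption.

PER_ITEM_OVERHEAD = 100     # bytes per item, from the AWS note
PRICE_PER_GB_MONTH = 1.00   # dollars, inferred from the post
BYTES_PER_GB = 2 ** 30      # assumption


def billable_bytes(item_sizes):
    """Sum of raw item sizes plus the fixed per-item overhead."""
    return sum(size + PER_ITEM_OVERHEAD for size in item_sizes)


def monthly_storage_cost(item_sizes):
    return billable_bytes(item_sizes) / BYTES_PER_GB * PRICE_PER_GB_MONTH


items = [400] * 1_000_000   # one million hypothetical 400-byte items
print(billable_bytes(items))                  # 500000000
print(round(monthly_storage_cost(items), 2))  # 0.47
```

Here the overhead adds 25% on top of 400 MB of raw data; the smaller the items, the larger that share becomes.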

The other cost involved is throughput. This is the total number of reads/writes per second you will need, across all your tables. The minimum throughput allowed per table is 5 each (5 read and 5 write). That means that if you initially have, say, 7 tables, your minimum throughput is (7 * 5) + (7 * 5) = 70 units.
The price for read throughput is $0.01/hour per 10 units.
The price for write throughput is $0.01/hour per 5 units.

So your total hourly price comes to $0.07 (write) + $0.035 (read) = $0.105/hour.
Assuming 720 hours in a month, your monthly cost for throughput is $75.60.

This pricing is a little steep, which makes me itch to go through the schema design and see what our starting costs are. I had not initially taken the minimum throughput per table into account; that was yesterday. Anyway, I will need a complete table design to judge anything. But one thing is clear: the number of tables drives cost if your tables are basic and do not need even 5 reads/writes per second.
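The point about table count follows from the per-table minimum: a table is billed for at least 5 read and 5 write units even when its real traffic is lower. This Python sketch uses the rates this post quoted (the January 20 post above corrects them) and reproduces the $0.105/hour figure for 7 minimal tables. The per-table traffic numbers are hypothetical.

```python
from decimal import Decimal

# Rates as used in this post (the January 20 post above corrects them):
# read $0.01/hour per 10 units, write $0.01/hour per 5 units.
READ_RATE = Decimal("0.01") / 10   # $ per read unit per hour
WRITE_RATE = Decimal("0.01") / 5   # $ per write unit per hour
MIN_UNITS = 5                      # minimum read and write units per table


def hourly_cost(tables):
    """tables: list of (reads_per_sec, writes_per_sec) each table really needs.
    Every table is billed for at least MIN_UNITS of each."""
    total = Decimal(0)
    for reads, writes in tables:
        total += max(reads, MIN_UNITS) * READ_RATE
        total += max(writes, MIN_UNITS) * WRITE_RATE
    return total


# Seven tiny tables needing 1 read and 1 write per second each
print(hourly_cost([(1, 1)] * 7))  # 0.105
# The same total traffic (7 reads, 7 writes) on a single table
print(hourly_cost([(7, 7)]))      # 0.021
```

Under these rates, seven tables that each need only 1 read and 1 write per second are billed at the floor, while the same total traffic on one table costs a fifth as much. Whether tables can actually be merged depends on the access patterns; this only shows where the cost comes from.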

Edit: I had calculated the sum to be $0.15 / hour. I am still not out of bed I guess :)

Mag will use Amazon DynamoDB

Amazon's latest announcement (and offering) could not have come at a better time for Mag. I am talking about Amazon DynamoDB here and I feel it is a good fit for Mag.

For the last few weeks I have been looking at many available options for Mag's data storage. Notice I said data store and not RDBMS. The reason is that Mag's data is really a large collection of configurations for the campaigns our clients run. Yes, it may grow in size as we add customers, but the schema is quite simple, at least for now. And I would prefer a hosted, cost-effective service that is fast. I want to concentrate on the application, because that is the real power of Mag.

The options I was considering included NoSQL systems like MongoDB, CouchDB, Redis or similar, but honestly I did not want to manage the storage myself, plus replication and other stuff. I had made plans to do that anyway, but now I will be free of it. Plus, the move to SSD storage seems very practical and needed. I am not saying SSDs will help Mag right now, but from an industry standpoint this is a good start.

But what really made my decision easy is that there is no hassle, no management, and it scales when needed. AWS also claims very low latency and very high throughput, although our reads/writes will be tiny to start with. I will check the latency for myself once I start, but I believe their claims for now. From a startup point of view the pricing is good too. It is free to start with; the free database is a tiny 100MB, but that is good for experimenting. I checked prices with some basic assumptions and they look good to me.

I will see which existing libraries I can re-use to make the transition smooth. Our applications are JavaScript-heavy; the back-end (a PHP/Python/Node mix) is essentially just an API. All rendering is done right in the browser, with just JSON passing back and forth. That's it then, let's code!

Monday, January 16, 2012

Price comparison of dedicated or VPS hosts

For the impatient: link to spreadsheet.

For the last few weeks I have been looking around a lot on the web, checking details and prices of VPS or dedicated servers. I am looking for places to host Mag when it starts growing. Not that I am leaving AWS in any way, but for Mag we will need servers with a lower bandwidth cost than on AWS.

I am looking for self-managed (or unmanaged) dedicated/VPS solutions, and I prefer dedicated servers. I am looking for a decent dual-core desktop processor (an Intel Core 2 Duo or AMD Athlon will do), but higher-grade Xeon or Opteron types are also on my mind.

So I am maintaining a spreadsheet of selected servers from some hosting companies, which I keep adding to every week. You can access the spreadsheet here; it is publicly viewable. If you want me to add a company you know of, please let me know. If you want edit permissions, I could give you those too. Servers could be located in any major region around the world: North America, South America, UK, Germany, Russia, Japan, Australia, Middle East, India, or anywhere there is substantial user volume.

Friday, January 13, 2012

Height of Facebook custom Page Tabs

This is a quick tip: I am currently working on a Facebook app and have a demo for a custom Page Tab, but somehow the height was not being set properly. The Canvas Height option in the Developer settings was set to Fluid, but I could not find where the equivalent setting for the Page Tab was. Anyway, after a little searching I found that the following code works:

<div id="fb-root"></div>
<script type="text/javascript" src="https://connect.facebook.net/en_US/all.js"></script>
<script type="text/javascript" charset="utf-8">
FB.Canvas.setSize({height: 1500});
</script>

I placed it at the bottom of my application's custom Page Tab, just before the closing body tag. Set the height value according to your needs. I have yet to look at how this would work with a flexible height (determined after the iframe renders). I will update that later; for today this will do.

Serve static files from nginx for POST request

This is a quick tip; I haven't had the time to dig deeper into it. I am working on a Facebook app and was doing a demo with static HTML. Everything was set up, including SSL (I will write a quick tip on that too). I could browse to the page directly, but it failed from within Facebook with a 405 error.

From previous experience I remembered that Facebook makes POST requests to the app's pages, but I had forgotten the fix for static files. For some reason nginx does not allow serving static files in response to HTTP POST requests. Anyway, the fix was:

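    # In the server/location block serving the static files: turn nginx's
    # 405 for POST into an internal redirect to the same URI, which nginx
    # then processes as a GET
    error_page 405 = $uri;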

Also here is another suggestion: Serving Static Content Via POST From Nginx

Hope this helps you.

Thursday, January 05, 2012

Oh! So we have started up

Today is Friday, January 6, 2012 (Indian time). Last week at this time I was at MobStac; that was my last day there. 2012 started with fun, food, a short drive to a nearby spot, drinks and everything in between. But the fun is gradually wearing off and the feeling has started to change. I have done this a couple of times before. I have heard the rules of the game repeated so many times by others. I have seen successful entrepreneurs, and seen teams grow. I am not new to this world. But that guarantees nothing.

My wife Debleena and I have been planning to do this for the last 4-5 months. We have saved enough to last about 10 months. Plus we have some small projects to help us earn some extra cash. This is bare survival, and that is great. We have a couple of ideas that we want to experiment with. We did not get into YCombinator, and The Morpheus was not a suitable match, given that we (including our part-time members) had a lot of experience in what we would be doing. But that does not mean we are at a loss.

We have knowledge, experience and a shitload of spirit. We have Amit and Ipsita to help us part-time. We have great mentors like Pallav, Sharad, Shubhodeep, Ravi and Sharat. And not to forget our dear friends Ruhita, Arijit, Paro, Nitesh, Mohan and so many others I cannot name here right now. Heck, what do we need to worry about?

I have been around the startup world for the last 6+ years, and helped run meetups in Kolkata for a couple of years. But overall I have never been successful by industry standards. And that is great: it gives me scope to improve on my last plan. It's an open ground, and we are all players. So yeah, we are going to rock, no doubt. Good luck to ourselves!

Edit: I missed Paro. I am sure I missed a few other names.