I’ve just spent the week at the Google office in London attending the CP300 course. I thought a good way to prepare for the certification exams would be to write up my notes on my blog…
Thanks to Ignacio who was leading the training.
The first two days were dedicated to Google App Engine.
App Engine is all about building scalable, reliable and cost-effective web applications the Google way. It:
- Leverages the Google CDN to serve static resources
- Uses stateless application servers with automatic horizontal scaling
- Uses a NoSQL datastore (you can also connect to a relational database, Cloud SQL, if needed)
You can configure the way it scales by tweaking pending latency and the number of idle instances. This will impact the performance and the cost of your application.
Instances on App Engine can stop and start frequently, which means you should avoid frameworks with long start-up times such as Spring or JPA. For dependency injection prefer Guice or Dagger (with Dagger, injection is resolved at compile time).
The App Engine console lets you monitor quota usage. This is very important: most errors on App Engine happen because of quota limits. Of course you can pay to raise those limits. You set a maximum daily budget to make sure you won’t suffer a denial-of-service attack on your credit card!
You can deploy and run multiple versions of the same app in parallel (blue/green deployment out of the box).
The Appstats tool lets you analyze performance.
Authentication & Authorization
GAE provides a service to handle authentication and authorization for you. It uses Google Accounts or an OpenID provider. You can also integrate GAE with an enterprise SSO solution, but that requires a Google Apps for Business account.
Authorization to access other Google APIs (Calendar, Storage, Compute, …) is done with OAuth 2.0.
You can try service calls and OAuth 2.0 in the playground.
Datastore
The datastore is the heart of App Engine; you had better understand it if you want your application to run well on App Engine.
The GAE datastore is built on Google Bigtable. It provides strong consistency for a single row, but only eventual consistency across multiple rows.
Every row contains an entity of a certain kind. An entity has a key and properties; properties can be multi-valued.
An entity can have a parent to form an entity group (a single entity without a parent counts as an entity group). Entity groups are useful to force strong consistency when writing data.
Data in Bigtable is distributed by key. If you specify the key yourself, make sure it is random enough to get a good distribution of content across the underlying hardware, and therefore better performance.
The datastore is optimized for read queries. It always uses an index to read data. All indexes are sorted and distributed across multiple machines.
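To make the idea of reading through a sorted index concrete, here is a minimal in-memory sketch in Python. It is not the datastore API: the `index_by_age` list and the `range_scan` helper are hypothetical names invented for illustration. It only shows why a lookup on a sorted index costs a binary search plus one step per result.

```python
import bisect

# Hypothetical stand-in for one datastore index: (property_value, entity_key)
# pairs kept sorted by property value.
index_by_age = sorted([
    (25, "user:ann"),
    (31, "user:bob"),
    (31, "user:cid"),
    (42, "user:dan"),
    (58, "user:eve"),
])


def range_scan(index, low, high):
    """Return entity keys whose indexed value v satisfies low <= v <= high."""
    # Binary search locates the first row with value >= low in O(log n).
    i = bisect.bisect_left(index, (low,))
    keys = []
    # Then read consecutive rows until the value leaves the range, so the
    # remaining work is proportional to the number of results returned.
    while i < len(index) and index[i][0] <= high:
        keys.append(index[i][1])
        i += 1
    return keys
```

Calling `range_scan(index_by_age, 31, 42)` walks only the three matching rows, however large the rest of the index is.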
Queries on the datastore are executed as index scans on Bigtable, so they are very fast (query performance scales with the size of the result, not the size of the dataset), but this comes with a few limits:
– You can’t query without an index (indexes can be created automatically; beware of their size)
– Queries on multi-valued properties can lead to a combinatorial explosion and big indexes
– A missing property is not the same as Null/None
– Inequality filters (<, >, !=) are limited to one property per query (!= is implemented as x < v OR x > v so that it can use one sorted index)
– No JOIN (use denormalization)
– No aggregation queries (GROUP BY, SUM, HAVING, AVG, MAX, MIN, …); instead use special entities that maintain counts (see the sharded counter pattern)
– Creating a new index on a large dataset can take a long time
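The sharded counter pattern mentioned in the list can be sketched in a few lines of Python. This is an in-memory illustration only, assuming each shard would be stored as its own entity; the `ShardedCounter` class and `NUM_SHARDS` value are hypothetical and not part of any App Engine library.

```python
import random

NUM_SHARDS = 20  # hypothetical value; more shards spread writes further


class ShardedCounter:
    """In-memory stand-in: each list slot plays the role of one shard entity."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a random shard so concurrent writers rarely touch the same one.
        shard = random.randrange(len(self.shards))
        self.shards[shard] += amount

    def total(self):
        # A read fetches every shard and sums them to get the counter value.
        return sum(self.shards)
```

The trade-off is that writes get cheaper to scale while reads get slightly more expensive, which is why the total is often cached.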
Indexes are not immediately updated on write, but ancestor queries force the index update to complete, which gives strong consistency.
For transactions the datastore uses snapshot isolation and optimistic concurrency.
A transaction can’t affect more than 5 entity groups.
You can’t make more than 5 updates per second to an entity group.
A transaction can’t take more than 60 seconds.
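Optimistic concurrency means no lock is taken up front: a transaction reads, computes, and then commits only if nobody else changed the data in the meantime, otherwise it retries. The Python sketch below illustrates the idea with a version number per key. `VersionedStore`, `ConflictError` and `run_in_transaction` are hypothetical names, and the real datastore's mechanism is more involved than this.

```python
class ConflictError(Exception):
    """Raised when another writer committed first."""


class VersionedStore:
    """Hypothetical in-memory store mapping key -> (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Unknown keys start at version 0 with no value.
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(key)
        self._data[key] = (current_version + 1, new_value)


def run_in_transaction(store, key, update, retries=3):
    """Apply update(old_value) -> new_value, retrying on conflicts."""
    for _ in range(retries):
        version, value = store.read(key)
        try:
            store.commit(key, version, update(value))
            return True
        except ConflictError:
            continue  # another writer won the race: re-read and try again
    return False
```

Because a conflicting transaction is simply re-run, the update function should be safe to execute more than once.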
Memcache, TaskQueue, Cron
GAE provides Memcache as a service to improve performance and reduce application cost. A Memcache query can be ten times faster than a datastore query. Memcache can be used as a read/write cache in front of the datastore.
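As a rough illustration of the read/write cache idea, here is a Python sketch that uses plain dictionaries in place of Memcache and the datastore. The `CachedStore` class and its `datastore_reads` counter are hypothetical and exist only to show the pattern: check the cache first, fall back to the datastore on a miss, and update both on write.

```python
class CachedStore:
    def __init__(self):
        self.cache = {}           # stand-in for Memcache
        self.datastore = {}       # stand-in for the datastore
        self.datastore_reads = 0  # counts the slow reads, for illustration

    def get(self, key):
        # Fast path: serve the value from the cache when present.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read the datastore and remember the result.
        self.datastore_reads += 1
        value = self.datastore.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the datastore first, then refresh the cached copy.
        self.datastore[key] = value
        self.cache[key] = value
```

A cache entry can disappear at any time, so the datastore stays the source of truth and the code must always cope with a miss.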
GAE provides a Task Queue service to execute asynchronous work:
- push queues are managed by App Engine
- pull queues are managed manually
Tasks can be enqueued in a transaction but will execute outside the transaction.
A task is a GET or POST request.
GAE executes as many tasks as possible, following the token bucket algorithm, which is configured with:
– a bucket size (the maximum number of tasks that can be launched at once)
– a token refresh rate (how fast the bucket replenishes)
– a maximum number of concurrent requests
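The three settings above can be tied together in a small Python simulation. This is a sketch of the general token bucket idea, not App Engine's actual scheduler; the `TokenBucket` class and its method names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    def __init__(self, bucket_size, rate, max_concurrent):
        self.bucket_size = bucket_size        # maximum burst of tasks
        self.rate = rate                      # tokens added per second
        self.max_concurrent = max_concurrent  # cap on tasks running at once
        self.tokens = float(bucket_size)      # the bucket starts full
        self.running = 0

    def refill(self, elapsed_seconds):
        # Tokens accumulate over time but never exceed the bucket size.
        self.tokens = min(self.bucket_size,
                          self.tokens + self.rate * elapsed_seconds)

    def try_start_task(self):
        # A task starts only if a token is available and the concurrency
        # cap has not been reached; starting a task consumes one token.
        if self.tokens >= 1 and self.running < self.max_concurrent:
            self.tokens -= 1
            self.running += 1
            return True
        return False

    def finish_task(self):
        self.running -= 1
```

With `TokenBucket(bucket_size=5, rate=1, max_concurrent=3)`, a burst of ten tasks starts only three immediately: the concurrency cap kicks in before the bucket is empty.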
If a task fails it will be retried according to the retry policy.
There is a 10-minute execution limit for tasks on front-end instances (instead of 1 minute for synchronous requests).
GAE also provides a cron service that you can configure in an XML or YAML file.