Featured post

Another Git branching model

We switched to Git at work a few months ago. It was not an easy task, but the rewards are worth the trouble. Our branching model was based on Git Flow because it’s well documented and gives you a structure to start with a DVCS. After a few iterations, though, it wasn’t working as expected in our context, so we had to come up with our own workflow.

I guess Git Flow works well on a clean code base with good test coverage. But on legacy code, where one feature means two regressions, a release branch is like the Vietnam War: you never know when you will get out of it. That was one of our main problems with Subversion: we were creating a release branch to go to production, and it would take forever to actually ship the code. Meanwhile, all other development efforts remained stuck.

I thought that cheap branching and merging in Git would solve our issue. But cheap merging is not enough; you also need to be able to easily pick what to merge. And with Git Flow it’s not easy to remove a feature from a release branch once it’s there. Because a feature branch is started from develop, it is bound by its parent commits to other features not yet in production. As a result, if you merge a feature without rebasing, you always get more commits than wanted.

So here is the workflow we use to solve those issues:

The main branches

We have three branches with an infinite lifetime based on the classical trio (dev/test/prod):

  • master
  • staging
  • develop

Master is the same as in Git Flow:

We consider origin/master to be the main branch where the source code of HEAD always reflects a production-ready state.

Staging is a bit like develop in Git Flow:

We consider origin/develop to be the main branch where the source code of HEAD always reflects a state with the latest delivered development changes for the next release. Some would call this the “integration branch”.

Develop is there for continuous integration: this is where we constantly merge all the changes to detect bugs and conflicts as soon as possible. The source code in the develop branch never reaches a stable point where it is ready to be released. Instead, only some feature branches reach a stable point. Those stable feature branches are merged into the staging branch. Since feature branches were created from master and not from develop, we can pick individually which ones will be merged into staging. In fact, this is the main point of this workflow: we can easily choose which features will go into production next.

To release the code to production we just merge staging into master.
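
The effect of the branch point can be illustrated with a toy model of commit ancestry. This is only a sketch: the commit names and the two helper functions are made up for illustration, and a real repository would of course be inspected with Git itself. It shows that merging a branch brings along every commit reachable from its tip that the target does not already have.

```python
# Toy model of a commit graph: each commit maps to the list of its parents.
# All commit names are hypothetical.
parents = {
    "m1": [],              # production state on master
    "a1": ["m1"],          # feature A, branched from master
    "b1": ["m1"],          # feature B, branched from master
    "d1": ["m1", "a1"],    # develop: merge of feature A
    "c1": ["d1"],          # feature C, branched from develop (Git Flow style)
}


def reachable(commit):
    """Return the set of commits reachable from `commit`, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def brought_by_merge(feature_tip, target_tip):
    """Commits that merging `feature_tip` into `target_tip` would add."""
    return reachable(feature_tip) - reachable(target_tip)


# Feature B started from master: merging it adds only its own commit.
from_master = brought_by_merge("b1", "m1")
# Feature C started from develop: merging it drags feature A along.
from_develop = brought_by_merge("c1", "m1")

print(sorted(from_master))
print(sorted(from_develop))
```

In this model, the feature started from master can be merged into staging on its own, while the one started from develop cannot be separated from the unreleased work it was built on.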

Feature Branches

All work is done in feature branches, which can be merged into:

  • master for a quick fix in production
  • staging for bug fixes
  • develop, constantly, for continuous integration

Since we use GitHub, we usually open a pull request to merge feature branches. We don’t always follow the rules, and commits made directly on master and staging do happen; they are then merged back into staging and develop. The only place where we never commit directly is develop 😉 (merge commits only).

Summary

Git Flow was not working for us, but by creating feature branches from master instead of develop we gained the ability to easily choose which features we release next. This gave us much more flexibility and got us out of the “Vietnam release branch”.

Now I should tell you about all the best practices that make this workflow really work, but I’m lucky: someone already wrote them down.

And you, what is your branching model?

The architect

Head in the clouds
Hands in the grease
Feet on production

Head in the clouds

Not because he is interested in the cloud, but because he has to step back and take a high-level view in order to analyze, synthesize, and present views suited to each stakeholder in a project. He is the guarantor of a shared and coherent vision, from the strategic business goals down to the deployment of logical components on physical or virtual servers.

Hands in the grease

An architect who no longer codes (or worse, has never coded) quickly drifts off into the clouds and loses touch with reality. From the aerial view of a project he must be able to zoom in on a logical component, its design, and the line of code. Only an astro-architect who never leaves the stratosphere of his architecture and methods team could select technologies like EJB1/2 or JSF, which are unusable in practice. A technology or an architecture can have all the qualities you like; if developers do not understand and adopt it, it will not go far.

Feet on production

Until it is in the hands of its users, a software project produces no value. And the only way to reach users is through production. You follow the recommendations of the 12factors site to the letter? Great, your application is ready to be deployed in the cloud. Too bad: what your production team expects is an EAR to deploy on webFear! For an application’s success to be complete, working closely with the production people (DevOps) is as important as working closely with the business (agile methods).

And where you work, what does the architect (the software one, of course) do?

The problem matters more than the solution

You have probably heard the quote “Fall in love with the problem, not the solution”.

Today, with the advent of Large Language Models (LLMs) like ChatGPT, it is not only true, we can also prove it.

LLMs are called large because they are trained on trillions of words (or tokens).

Still, an LLM knowing everything does not mean it understands anything. To put it another way, Large Language Models are bullshit artists.

Therefore, if you use ChatGPT to help you with anything, you had better focus on your problem and not on the solution you might have in mind. It will confirm any bias there is in your solution. If instead you present the problem clearly, guess what? It already knows the most statistically significant solution.

Let’s take an example.
The photo collection on my computer is a mess, with lots of duplicates and even some renamed files. So I asked ChatGPT to help with the solution I had in mind:

“Write me a bash script to find all the files with the same size but different names in a directory”

Ten seconds later I had my script; it worked and did what I had asked for.

But when I looked more closely at the result, I found some errors. The script listed as duplicates some files that have the same size and different names but are not the same picture! Stupid me ;(

Let’s go back to ChatGPT, but this time I am going to tell it about my problem, not my solution:

“I want to find all the identical files in my computer”

“To find all identical files in your computer, you can use a duplicate file finder tool, or you can write a script to do it yourself. Here’s a Python script that uses the hashlib library to find identical files based on their content.”

First it reminds me that specialized tools exist to do that, and then it proposes a Python script. And guess what, the script uses an MD5 hash to find duplicates: a much better solution than what I first asked it to do.
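
To make the difference concrete, here is a minimal sketch of the content-hash approach. It is not the script ChatGPT produced, just my own illustration: the function names, the chunk size and the sample file names are all assumptions.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group the files under `root` by content hash; keep groups of 2 or more."""
    by_hash = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_hash[file_md5(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


# Small demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "holiday.jpg": b"same bytes",
        "holiday_copy.jpg": b"same bytes",   # identical content, other name
        "other.jpg": b"diff bytes",          # same size, different content
    }
    for name, content in samples.items():
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(content)
    groups = find_duplicates(root)
    duplicate_names = [[os.path.basename(p) for p in group] for group in groups]

print(duplicate_names)
```

All three sample files have the same size, so my first idea would have flagged them all; comparing content hashes only reports the two that really are identical.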

Google Cloud Platform technical qualification training: Cloud SQL

Not much to say about Cloud SQL: it is simply a fully managed MySQL database (versions 5.5 and 5.6).

Fast connection to App Engine and Compute Engine, but accessible from anywhere (you have to enable the access).

Google can manage replication for you. MySQL instances are brought up on demand and can go away after 4 hours of inactivity.

Up to 500 GB per database


Google Cloud Platform technical qualification training: App Engine

I’ve just spent the week at the Google office in London attending the CP300 course. I thought a good way to prepare for the certification exams would be to put my notes on my blog…

Thanks to Ignacio who was leading the training.

The first two days were dedicated to Google App Engine.
App Engine is all about building scalable, reliable and cost-effective web applications the Google way. It:

  • Leverages the Google CDN to serve static resources
  • Uses stateless application servers with automatic horizontal scaling
  • Uses a NoSQL datastore (you can also connect it to a relational database, Cloud SQL, if needed)

You can configure the way it scales by tweaking pending latency and the number of idle instances. This will impact the performance and the cost of your application.

Instances on App Engine can stop and start frequently, which means you should avoid frameworks with a long start-up time such as Spring or JPA. For dependency injection, prefer Guice or Dagger (with Dagger, injection is resolved at compile time).

There is a status console to check whether all Google services are running normally. You can also receive notifications about downtime by subscribing to this group.

The App Engine console lets you monitor quota usage. This is very important: most errors on App Engine happen because of quota limitations. Of course you can pay to remove those limits. You set a maximum daily budget to make sure you won’t suffer from a denial-of-service attack on your credit card!

You can deploy and run multiple versions of the same app in parallel (blue/green deployment out of the box).

The Appstats tool lets you analyze performance.

Authentication & Authorization

GAE provides a service to handle authentication and authorization for you. It uses a Google account or an OpenID provider. You can also integrate GAE with an enterprise SSO solution, but this requires a Google Apps for Business account.

Authorization to access other Google APIs (Calendar, Storage, Compute, …) is done with OAuth 2.0.
You can try service calls and OAuth 2.0 in the playground.

The Datastore

This is the heart of App Engine; you had better understand it if you want your application to run well on App Engine.

The GAE datastore is based on Google Bigtable. It provides strong consistency for single-row operations but only eventual consistency across multiple rows.
Every row contains an entity of a certain kind. An entity has a key and properties; properties can be multi-valued.

An entity can have a parent to form an entity group (a single entity without a parent counts as an entity group). Entity groups are useful to force strong consistency when writing data.

Data in Bigtable is distributed by key. If you specify the key yourself, make sure it is random enough to get a good distribution of content on the underlying hardware and better performance.

The datastore is optimized for read queries. It always uses an index to read data. All indexes are sorted and distributed across multiple machines.
Queries on the datastore are executed as index scans on Bigtable, so they are very fast (query performance scales with the size of the result, not the size of the dataset), but this comes with a few limits:
– You can’t query without an index (indexes can be created automatically; beware of their size)
– Queries on multi-valued properties can lead to combinatorial explosion and big indexes
– A missing property is not equal to Null/None
– Inequality filters are limited to one property per query (a != filter is implemented as two queries, x < value and x > value, each using one sorted index)
– No JOIN (use denormalization)
– No aggregation queries (GROUP BY, SUM, HAVING, AVG, MAX, MIN, …); instead use special entities that maintain counts (see the sharded counter pattern)
– Creating a new index on a large dataset can take a long time
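
The counter pattern mentioned in the list above can be sketched in a few lines. This is only an in-memory illustration, not the datastore API: the `ShardedCounter` class, the shard naming and the shard count are assumptions, and a real implementation would store each shard as its own entity and update it in a transaction.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across several shard entities."""

    def __init__(self, name, num_shards=20):
        self.name = name
        self.num_shards = num_shards
        # Each key plays the role of one small datastore entity.
        self.shards = {f"{name}-{i}": 0 for i in range(num_shards)}

    def increment(self, amount=1):
        # Pick a random shard so concurrent writers rarely hit the same entity.
        index = random.randrange(self.num_shards)
        self.shards[f"{self.name}-{index}"] += amount

    def total(self):
        # Reading the counter means summing all the shards.
        return sum(self.shards.values())


counter = ShardedCounter("page-views", num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.total())
```

Spreading writes over several entities trades a slightly more expensive read (one per shard) for a higher sustainable write rate on the counter as a whole.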

Indexes are not immediately updated when writing, but ancestor queries force the index update to complete in order to get strong consistency.

For transactions, the datastore uses snapshot isolation and optimistic concurrency:
– A transaction can’t affect more than 5 entity groups
– You can’t make more than 5 updates per second to an entity group
– A transaction can’t take more than 60 seconds

Memcache, TaskQueue, Cron

GAE provides Memcache as a service to improve performance and reduce application cost. A Memcache query can be ten times faster than a datastore query. Memcache can be used as a read/write cache in front of the datastore.
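
Using a cache in front of the datastore can be sketched as follows. Plain dictionaries stand in for both services here, so nothing below is the App Engine API; the `get` and `put` helpers, the key format and the read counter are assumptions made for the illustration.

```python
datastore = {"user:1": "Alice"}     # stand-in for the slower, billed store
cache = {}                          # stand-in for Memcache
stats = {"datastore_reads": 0}


def get(key):
    """Read-through: serve from the cache, fall back to the datastore."""
    if key in cache:
        return cache[key]
    stats["datastore_reads"] += 1
    value = datastore.get(key)
    if value is not None:
        cache[key] = value          # populate the cache for the next reader
    return value


def put(key, value):
    """Write-through: update the datastore, then refresh the cache."""
    datastore[key] = value
    cache[key] = value


first = get("user:1")    # cache miss, reads the datastore
second = get("user:1")   # cache hit
put("user:1", "Bob")
third = get("user:1")    # cache hit with the fresh value
print(first, second, third, stats["datastore_reads"])
```

Keep in mind that a real cache can evict entries at any time, so the fallback to the datastore must always be there.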

GAE provides a Task Queue service to execute asynchronous work:

  • push queues are managed by App Engine
  • pull queues are manually managed

Tasks can be enqueued in a transaction but will execute outside the transaction.
A task is a GET or POST request.
GAE executes as many tasks as possible following the token bucket algorithm, which has:
– a bucket size (the maximum number of tasks that can be launched at once)
– a token refresh rate (how fast the bucket replenishes)
– a maximum number of concurrent requests
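
The first two parameters can be illustrated with a minimal token bucket. This is a sketch of the general algorithm, not of App Engine’s scheduler: the `TokenBucket` class and its method are hypothetical, time is passed in explicitly to keep the example deterministic, and the limit on concurrent requests is not modeled.

```python
class TokenBucket:
    """Minimal token bucket; time is passed in explicitly to keep it testable."""

    def __init__(self, bucket_size, rate):
        self.bucket_size = bucket_size    # maximum burst of tasks
        self.rate = rate                  # tokens added per second
        self.tokens = float(bucket_size)  # start with a full bucket
        self.last = 0.0

    def try_take(self, now):
        """Refill according to elapsed time, then take one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(bucket_size=3, rate=1.0)
# Five tasks arrive at t=0: only the first three find a token.
burst = [bucket.try_take(0.0) for _ in range(5)]
# Two seconds later the bucket holds two new tokens.
later = [bucket.try_take(2.0) for _ in range(3)]
print(burst, later)
```

The bucket size bounds how many tasks can start in a burst, while the refresh rate bounds the sustained throughput.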

If a task fails, it will be retried according to the retry policy.
There is a 10-minute execution limit on front-end instances (instead of 1 minute for synchronous requests).

GAE also provides a cron service that you can configure in an XML or YAML file.

My schedule for Google I/O 2014

This year Google I/O will be about DDD. Have you read the blue book? Oh wait, sorry, it’s not about Domain-Driven Design but Design, Develop, Distribute. It is interesting to see that Google chose to replace the more common “Run” theme with a “Distribute” one. It feels like they are saying: don’t worry anymore about how you will run your application, just use our cloud. Instead, think about how you will distribute your mobile application… on the Google store, of course.

We can expect a lot of announcements and sessions around Android. And a lot more, as you can see in the list of sessions I’m planning to attend. Can’t wait to learn more about Docker, Polymer, DevOps and the Google Cloud Platform!

Day 1, June 25

Day 2, June 26

And in the meantime…

I’m taking a walk
[Photos: San Francisco House, San Francisco Parking]

10 years

Ten years ago, on April 3, 2004, I started this blog. Google was still just a good search engine… Amazon just a bookseller, Face… Face-what?? Phones were not smart and Nokia dominated the market.

Ten years ago Google launched Gmail, on April 1st, with 1 GB of free storage and a web application as responsive as a desktop client: good joke… Yes, Ajax was still only a detergent brand. Subversion was an acceptable tool for managing source code, Flex was a solution with a future, Struts was at version 1 just like Spring, and JavaScript was barely good enough for opening pop-ups. Chrome did not exist, and IE6 had prevailed over Netscape.

Ten years ago France passed the Loi pour la confiance dans l’économie numérique (the law on confidence in the digital economy), and today a report still has to be written to promote developers, even though the rest of the world understood long ago that software is eating the world.

Over these 10 years there has been no shortage of topics. I am going to take advantage of this anniversary to revive this blog; see you in 10 years.