Saturday, October 9, 2010

Dedicated hosting and SLA

When you are on a project with a stipulated SLA (service level agreement), such as 1000 concurrent requests per minute, it is always good to be selfish.

That is, to hog all the system resources to yourself and avoid shared hosting.

Recently, a client asked us to develop an application and host it for them, with a certain SLA on response time. However, the client also wanted the same server to host another project.

We did not think much of it. However, at regular intervals, our application would slow down, with long response times. We wondered if we had to tune our application, web server, or databases.

It turned out, however, that the other project hosted by the client would, at regular intervals, eat up almost 100% of the CPU for a short while!

That would surely hurt our stress test.

So we informed the client and shut down the other project before doing our stress test.

Lessons learnt? Always be selfish and go for dedicated hosting. You might not meet a reasonable client every time.

Friday, September 24, 2010

Beware of firewalls

I'm always wary whenever I see firewalls in a deployment architecture. They almost always spell problems, problems that one will never notice during user acceptance tests, simply because a user acceptance test environment never mirrors the production environment. Hardware firewalls are not cheap, so test environments seldom have one.

Luckily, most firewall problems I have seen boil down to the same thing.

Closing of idle connections.

Most applications use connection pools to reuse existing open connections, for better performance. These can be database connections, LDAP connections, or any other kind.

However, a problem occurs if a connection is allowed to lie idle over a period of time.

Especially if the pool idle connection timeout is longer than the firewall idle connection timeout.

What usually happens is this:

1. Firewall idle connection timeout is set to 30 mins.
2. Database pool idle connection timeout is set to 60 mins.
3. Database connection timeout is set to 10 mins.
4. Connection is opened from client to server.
5. Connection is left idle for 40 mins, and the firewall closes the idle connection.
6. Client application retrieves the closed idle connection from pool, and sends a database query.
7. Client application blocks for 10 mins before timing out, because the connection has already been closed by the firewall.
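The timeline above can be modelled in a few lines to show exactly where the danger lies. This is a minimal sketch using the timeout values from the list (30, 60 and 10 minutes); `connection_state` is a hypothetical helper, not part of any pool library.

```python
# Timeout values from the scenario above, in minutes.
FIREWALL_IDLE_TIMEOUT = 30
POOL_IDLE_TIMEOUT = 60
QUERY_TIMEOUT = 10


def connection_state(idle_minutes: int) -> str:
    """Describe what the client gets when it takes a connection that sat idle."""
    if idle_minutes >= POOL_IDLE_TIMEOUT:
        # The pool evicts it first, so the client gets a fresh connection.
        return "evicted by pool"
    if idle_minutes >= FIREWALL_IDLE_TIMEOUT:
        # The pool still thinks it is healthy, but the firewall dropped it,
        # so the next query hangs until the query timeout expires.
        return f"silently dropped, query blocks {QUERY_TIMEOUT} mins"
    return "alive"


for idle in (10, 40, 70):
    print(idle, connection_state(idle))
```

Any idle time between the two timeouts (30 to 60 minutes here) falls into the dangerous window; each of the fixes listed next aims to make that window empty.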

There are three ways to resolve this problem.

1. Have the connection 'ping' the server from time to time while idle. This could be a SELECT 1 query.
2. Set the database pool idle timeout to a value lower than the firewall idle connection timeout (if you are allowed to).
3. The more common approach: set the firewall idle connection timeout to a value higher than the database pool idle timeout. This achieves the same thing as the 2nd way, but people are usually more inclined to modify firewall settings than application settings.
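On the application side, the first two fixes can be sketched as a pool that refuses to hand out connections idle for too long and pings ones idle for a while. This is a minimal sketch under stated assumptions: `IdleAwarePool`, its parameters and the `ping` callback are hypothetical names; many real pools offer comparable idle-eviction and validation-query settings under their own names.

```python
import time
from typing import Callable, List, Tuple


class IdleAwarePool:
    """Hypothetical pool illustrating fixes 1 and 2.

    max_idle_seconds should be below the firewall idle timeout (fix 2).
    ping is called on connections idle longer than ping_after_seconds (fix 1).
    """

    def __init__(self, connect: Callable[[], object],
                 ping: Callable[[object], bool],
                 max_idle_seconds: float, ping_after_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._connect = connect
        self._ping = ping
        self._max_idle = max_idle_seconds
        self._ping_after = ping_after_seconds
        self._clock = clock
        # (connection, time it was returned to the pool)
        self._idle: List[Tuple[object, float]] = []

    def release(self, conn: object) -> None:
        self._idle.append((conn, self._clock()))

    def acquire(self) -> object:
        now = self._clock()
        while self._idle:
            conn, since = self._idle.pop()
            idle_for = now - since
            if idle_for >= self._max_idle:
                continue  # too old: the firewall may have dropped it (a real pool would close it)
            if idle_for >= self._ping_after and not self._ping(conn):
                continue  # e.g. "SELECT 1" failed, so discard it
            return conn
        return self._connect()  # nothing usable left: open a new connection
```

The injectable `clock` is only there so the behaviour can be tested without waiting; in production the default monotonic clock would be used.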

Sunday, September 12, 2010

Creating social network accounts for corporations

It's getting very common nowadays for a corporation to have an account on social networks. It could be a need for a Facebook or Twitter presence, or a Facebook application for your corporation.

But regardless of the nature of the presence and account, it is undeniable that you really want the account to be owned by the corporation, not an individual employee.

Imagine that today you ask an employee to create a Facebook application or fan page. He creates it with his personal Facebook account, so his personal account becomes the owner of the application or fan page. You could have multiple admins or developers, but there can be only one owner, and as far as I know, ownership is non-transferable. Some years later he leaves, and there goes the owner of your Facebook application or fan page. I do not think it is possible to revoke his access, because he is the owner.

That sure sounds like bad news to me.

The correct approach here is to create a new social network account for the corporation. Have all operations and management done via that account. Make it the owner of all media and content on the social network. In fact, create a corporate email address for this too, like social@mycorporation.com.

Then have the corporation own the account. Control access to it, and if the password is shared, change it whenever an employee who knows it leaves. But really avoid having too many people know the password. Instead, assign additional admins or developers on social media projects with delegated capabilities.

As social media becomes more and more prevalent, it is critical that companies develop a social media policy, and stay aware and protective of their social media presence.

Friday, August 13, 2010

Do not assume what a product officially supports without first checking

Just a simple tip here. Never ever assume that, just because a product has a new release, it will have the same list of supported platforms and 3rd party products.

It is common for new releases to drop support for older or less popular platforms.

Even if you are asked whether a popular platform is supported, it is best to still look up the list. Check whether the system or products need to be patched first. Find out all possible incompatibilities and known issues.

Always look up what is officially supported by the product company. After all, if a product is officially supported you know who to turn to if there is a problem. Otherwise, you are on your own resolving the issue. And you will really wish you had quoted a premium price for helping to support an unofficially supported product.

Friday, June 18, 2010

Meeting: pre and post activities

Ever gone to a meeting with clients/vendors with no idea what to ask, discuss, or report?

Ever left a meeting without being sure of what was achieved?

Or worse, not been invited to the meeting at all, and not informed before or after it about what was discussed?

These meetings are usually missing two important activities: the pre-meeting activity and the post-meeting activity.

Before meeting the clients/vendors, or any other external parties, it is always good, and crucial, to meet internally first. Establish a shared understanding of where everyone's progress/standing is within the team, so that a unified front can be shown to the external parties.

In addition, the project manager (from our side of the team) could collect beforehand all the questions that need to be asked of the external party. The list of issues to bring up, and any progress since the last meeting, could be consolidated as well. Having this list prepared beforehand helps. Constantly asking 'does anyone have anything to say?' in the meeting, or going round the table, does not show that the project manager is in control. It would probably reduce the external parties' confidence in the team too.

Do some 'paper-play' before the meeting as well, especially on the issues to bring up. Think about how to respond to enquiries, and offer solutions even before they are asked for. This adds to the image of being professional, and will help justify the money spent on the team and project.

After the meeting, hold another internal meeting. Go through what was discussed once again. Broadcast the decisions and discussions to any team members who were not in the meeting with the external parties; their work requires them to be aware of what is currently happening.

Also, always come away from the meeting with an action list. It is nearly impossible to have no actions after a meeting. The team should own at least some actions, while the external parties would own others. Come back and distribute the actions. Revisit the list from time to time, and report the status of these actions at the next meeting with the external parties.

Any meeting with the external parties without doing the pre and post activities would, in my opinion, not be effective, and one would end up with a confused, demoralized, and disgruntled project team.

Friday, June 4, 2010

Workflow solutions are never easy to do well

As I do more and more workflow-related projects, a similar problem keeps surfacing again and again.

Basically, the idea is this. A workflow, for example an approval for new hire changes, could go on for days. The request could be routed from department to department. It could sit in a manager's inbox for days before he acts on it. Now what if, while it is waiting to be processed, the hiring process changes?

In a paper world, that could be handled relatively easily. If more signatures are needed, simply find the relevant parties and ask them to sign the same paper request.

However, most workflow systems cannot handle this well. In fact, most of them require the same request to be resubmitted! That is because these systems emphasize the workflow rather than the data flowing through it. A rigid workflow system is significantly harder to adapt to changes 'in-place', that is, applying the changes to requests that are already running.

Another problem I encountered was more interesting. Most approval workflows treat the request as pure data. There have been rare cases where items requested at submission were no longer valid by the time the workflow actually had to act on the data (e.g. creating a login account on an in-house system). This is especially likely for scheduled tasks, where you plan for something to happen in the future, either activating new accounts or deactivating existing ones.

Another interesting, but non-technical, issue has to do with the presentation of information at submission time. Let's say that your company is moving to a role-based access policy. A manager role would have access to system A and system B. When a user requests a manager role, he is told that he will be given access to system A and system B. Now, while the approval workflow is going on, the policy changes: a manager role will instead only have access to system A. The right business implementation would do just that, and give access to system A. But the user might not be informed of the policy change. All he sees is that he is missing access to system B. Nothing is wrong technically, but it causes confusion. The workflow had never planned to inform the requester of the changes.
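One way to close that gap is to keep what the requester was shown at submission and compare it with what the current policy actually grants at fulfilment. A minimal sketch, assuming the policy is a simple role-to-systems mapping; `POLICY_AT_FULFILMENT` and `fulfil_role_request` are hypothetical names for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical policy in force when the request is finally fulfilled.
POLICY_AT_FULFILMENT: Dict[str, Set[str]] = {"manager": {"A"}}


def fulfil_role_request(role: str, promised: Set[str],
                        current_policy: Dict[str, Set[str]]
                        ) -> Tuple[Set[str], Optional[str]]:
    """Grant what the current policy says, plus a notice if it differs
    from what the requester was shown at submission time."""
    granted = set(current_policy[role])
    removed = promised - granted
    added = granted - promised
    if not removed and not added:
        return granted, None  # nothing changed, no extra notice needed
    parts = []
    if removed:
        parts.append("no longer included: " + ", ".join(sorted(removed)))
    if added:
        parts.append("newly included: " + ", ".join(sorted(added)))
    notice = (f"The '{role}' policy changed while your request was pending "
              f"({'; '.join(parts)}).")
    return granted, notice


# The requester was shown {"A", "B"} when he submitted.
granted, notice = fulfil_role_request("manager", {"A", "B"}, POLICY_AT_FULFILMENT)
print(granted, notice)
```

The grant still follows the current policy, as the business wants; the notice only explains the difference to the requester.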

And another thing I keep seeing is the need to view the history of a request: what a user requested, when it was approved, when it was granted, etc. These are very request-specific, so each action must be audited manually. Luckily, the implementations I have been involved in had mostly foreseen this and planned for it early. But there were cases where we had to modify the workflows to capture even more data beyond the requirements, as the client thought up more and more of what they needed.

Of course, most of these are beyond the responsibilities of the workflow engines we use. The solution provider should plan to resolve them as part of the implementation, and plan early. They are not easy to solve, but solving them will surely make for a much better user experience.

Monday, March 22, 2010

Application Internationalization and Localization

First off, a definition of what these are.

An application usually can, and will, be used by people of diverse cultures and languages. So there will be demand for the application to speak to them in their own culture (date and money formats) and language.

The act of creating culture- and language-specific versions is known as localization. But before that, you have to perform internationalization, extracting everything that is culture- or language-specific out of the application.

I have done two larger projects that involved internationalization of applications, along with various smaller ones, but only one that actually involved localization.

And they sure are a pain to do, even though the concepts are very simple. Simply extract all strings from the existing application and put them in a property file. Then define a format string for each money and date format used, and place these in the property file as well.
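To make the extraction concrete, here is a minimal sketch of what per-locale tables might look like once strings and date patterns are pulled out of the code. Python dicts stand in for the property files; the locale codes, message ids and the helper functions are hypothetical.

```python
from datetime import date

# Hypothetical message tables, one per locale, as they might look after
# extraction from property files. The date pattern is an entry too.
MESSAGES = {
    "en_US": {
        "submit.approver.label": "Approver name",
        "status.approver.label": "Approver name",
        "format.date": "%m/%d/%Y",
    },
    "en_GB": {
        "submit.approver.label": "Approver name",
        "status.approver.label": "Approver name",
        "format.date": "%d/%m/%Y",
    },
}


def message(locale: str, key: str) -> str:
    """Look up an extracted string for the given locale."""
    return MESSAGES[locale][key]


def format_date(locale: str, d: date) -> str:
    # The pattern lives in the table, so each locale can override it
    # without touching the code.
    return d.strftime(message(locale, "format.date"))


print(format_date("en_US", date(2010, 3, 22)))  # 03/22/2010
print(format_date("en_GB", date(2010, 3, 22)))  # 22/03/2010
```

Money formats would follow the same idea: store the pattern per locale, keep the formatting code generic.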

But if you treat each string as a unique entry, you tend to end up with a lot of duplicated strings. A message like "Approver name" could appear in both a form submission page and a form status page.

It is extremely tempting at this point, to simply allow the two placeholders to use the same string entry. And in fact, that's a perfectly reasonable and valid approach.

Until the day comes when you have to change the text itself. Imagine that someone decides to use the message "Choose an approver" on the form submission page. Before changing the message, he first has to evaluate the impact of the change: how many other parts of the UI use the same string entry?

Only when no other part of the UI uses the same string can the developer make the change safely. But if it is actually used in many other areas, he has to duplicate the existing entry, give it a new message id, and update only that one.

So be careful of merging and reusing messages. Do so only if both are the same message, within the same context. A message string used during the selection of an approver and one used to display the approver might have the same text, but they are in different contexts.

Another point: really hire someone who is proficient in both the language being translated from and the one being translated to. The last thing you want to worry about is grammar and spelling mistakes.

And lastly, avoid concatenation of messages. You might have a message that goes like this: "Your account expired on XXX. Would you like to renew?"

It is tempting to have two strings here, "Your account expired on ", and "Would you like to renew?". Do that, and you might wish you had never been involved in the project when the message changes. One day the message might be changed to "XXX: Account expired. Renew now?"

It is better and cleaner, instead, to use a single string with placeholders like this: "Your account expired on {0}. Would you like to renew?". That way, you get the absolute flexibility of having the formatted value anywhere in the message, and you get to keep the message as a whole. During localization to other languages, the context of the message is clearer to the person working on it as well.
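A minimal sketch of the placeholder approach. Python's `str.format` happens to use the same `{0}` style of positional placeholder (as does Java's `MessageFormat`); the `MESSAGES` table and `render` helper are hypothetical.

```python
# Hypothetical message table: one whole-sentence template per message.
MESSAGES = {
    "account.expired": "Your account expired on {0}. Would you like to renew?",
}


def render(key: str, *args: str) -> str:
    # str.format fills {0}, {1}, ... wherever they appear in the template,
    # so a translator (or a later rewording) can move the value freely.
    return MESSAGES[key].format(*args)


print(render("account.expired", "22 March 2010"))

# Rewording later only touches the table, not the calling code.
MESSAGES["account.expired"] = "{0}: Account expired. Renew now?"
print(render("account.expired", "22 March 2010"))
```

With two concatenated fragments, the second wording would have needed a code change; here the caller stays the same.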

Granted, what I have described are simple principles and techniques, but doing the actual work is never easy, especially when you are doing 'extract message id' as you develop in a team environment. "I need a new message, does it exist? I have no idea. I'll just add it and hope someone will use it later." And then down the road, the same message, in the same context, might have five entries, all used in different parts of the UI, just because they were added by five different people.
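A small report can at least surface such duplicates for someone to review. This is a sketch under an assumption: messages are available as a flat id-to-text mapping (as a property file would be). It only lists candidates; whether two entries really share a context still needs a human decision, as discussed above. `find_duplicate_texts` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_texts(messages: Dict[str, str]) -> Dict[str, List[str]]:
    """Group message ids that carry exactly the same text."""
    by_text: Dict[str, List[str]] = defaultdict(list)
    for key, text in messages.items():
        by_text[text.strip()].append(key)
    # Keep only texts that appear under more than one id.
    return {text: sorted(keys) for text, keys in by_text.items() if len(keys) > 1}


messages = {
    "submit.approver.label": "Approver name",
    "status.approver.label": "Approver name",
    "form.approverName": "Approver name",
    "account.expired": "Your account expired on {0}. Would you like to renew?",
}
for text, keys in find_duplicate_texts(messages).items():
    print(f"{text!r}: {keys}")
```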

So finally, I propose adding an internationalization owner, who owns, manages, and gives out message ids. Tell that person the message you want and its context, and he decides whether it should use an existing message id or a new one.

I have not seen it in action, but it sounds feasible. Any comments?

Monday, March 15, 2010

Have an effective bug tracking system

Seriously. A bug tracking system is essential. I don't mean a software application. Just a process will do.

When a bug is reported, detailed steps of how it occurred should be provided. Screenshots would be helpful. The actors involved should be named as well. For example, for a request workflow, state the requestor, assignee, approver, and so on.

And of course, if this is a testing phase, try to have multiple test accounts. When a bug occurs, stop using the test account involved. This allows the developer to look into the state of the data when the bug occurred.

Though of course, in most cases that cannot be done.

Back to the system. A bug goes through various phases. For a basic status list, it will be open, fix in progress, testing, and closed. Note that a bug should always be closed by the group who reported it, not by the developers themselves. This avoids future conflicts between the client and the vendor.
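Even if the "system" is only a process, writing the allowed moves down removes arguments later. A minimal sketch of the basic status list as a state machine, with the rule that only the reporting group may close a bug; the `Bug` class, `move` function and the exact transition table are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    OPEN = "open"
    FIX_IN_PROGRESS = "fix in progress"
    TESTING = "testing"
    CLOSED = "closed"


# Assumed transitions: a failed test goes back to fixing; an open bug may be
# closed directly, e.g. when the reporter accepts a workaround.
TRANSITIONS: Dict[Status, Set[Status]] = {
    Status.OPEN: {Status.FIX_IN_PROGRESS, Status.CLOSED},
    Status.FIX_IN_PROGRESS: {Status.TESTING},
    Status.TESTING: {Status.FIX_IN_PROGRESS, Status.CLOSED},
    Status.CLOSED: set(),
}


@dataclass
class Bug:
    title: str
    reported_by: str  # the group that reported it, e.g. the client's testers
    status: Status = Status.OPEN


def move(bug: Bug, new_status: Status, actor_group: str) -> None:
    """Change a bug's status, enforcing the transitions and the closing rule."""
    if new_status not in TRANSITIONS[bug.status]:
        raise ValueError(f"cannot go from {bug.status.value} to {new_status.value}")
    if new_status is Status.CLOSED and actor_group != bug.reported_by:
        raise PermissionError("only the reporting group may close a bug")
    bug.status = new_status
```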

So, we have a 'new' bug. The user submits it with a description of what he was trying to do, what happened, and what SHOULD happen. I feel it is essential that all three are submitted. Many times I get vague steps of what a user did and what happened, but the part on what should happen is left out, so I have no idea which part of what happened was the bug.

A priority level is tagged to each bug as well, ranging from fatal (cannot proceed; a show stopper) to minor (works, but is an annoyance). Most people use a number system from 1 to 4, and depending on preference, 1 could be the fatal bugs.
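A bug report form can simply refuse submissions that leave out any of the three parts. A minimal sketch, assuming the 1 to 4 scale with 1 as fatal and 4 as minor; the labels "critical" and "major" for 2 and 3, and the `BugReport` class itself, are assumptions for illustration.

```python
from dataclasses import dataclass

# 1 = fatal and 4 = minor follow the post; labels for 2 and 3 are assumed.
PRIORITIES = {1: "fatal", 2: "critical", 3: "major", 4: "minor"}


@dataclass
class BugReport:
    steps: str      # what the user was trying to do
    actual: str     # what happened
    expected: str   # what SHOULD have happened
    priority: int   # 1 (fatal) to 4 (minor)

    def __post_init__(self) -> None:
        # Reject reports that leave out any of the three parts.
        missing = [name for name in ("steps", "actual", "expected")
                   if not getattr(self, name).strip()]
        if missing:
            raise ValueError("bug report is missing: " + ", ".join(missing))
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")

    @property
    def priority_label(self) -> str:
        return PRIORITIES[self.priority]
```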

Expect that almost every bug will be reported as critical or fatal. Users usually want everything changed until it suits their taste, and if no one manages the defect list, be prepared to be sucked into a never-ending whirlpool of changes. Negotiate and discuss the priority of issues with the user. Give and take on some of them. Most users are reasonable, but there are always a few unreasonable ones... especially when politics come into play.

At times, rather than doing a time- and effort-consuming fix, a workaround can be suggested. Update the bug status with the workaround suggestion, and throw it back to the user for review.

Now for the case where a fix is needed. The bug should belong to a single developer at any time. And this is actually what I wanted to complain about at the start of this post.

The bug was originally assigned to someone else. But as that developer had too many fixes assigned, others started to help out. Without consulting or discussing with the original assignee, bug fixes got duplicated. Time and effort were wasted. Confusion arose.

Reassign defects only with the original owner involved. Please.

To carry on with the discussion of the system: after a bug is fixed, it should go through a round of internal testing, and the bug status should be updated accordingly. Once it is verified to be fixed, it should be thrown back to the user for testing, and then readied for production.

This is another point in the bug fix cycle to be careful about. Sometimes the fix exposes another, unrelated bug. The user would then report that as part of the current bug, and keep the bug open.

Avoid that! As far as the facts are concerned, the reported bug was fixed! Any further bugs or changes should be reported as a new bug or change request. This is especially important if the team is contractually committed to fix all bugs reported before a deadline, which counts as a milestone completion. Some clients might actually wish the milestone not to be reached, to avoid payment.

In conclusion, do take care in setting up a proper bug/defect tracking system. Without one, a profitable project could turn into a resource hogger and waster (if these are real words).

Monday, March 8, 2010

Avoid doing updates as delete/insert

Recently, for a project, we had interactions with a database schema in the following form.

User has multiple Positions, each of which has multiple Roles.

Here, we took the quickest way out. Whenever a single Role entry was changed or added, we simply did the following:

  1. Delete all Positions of User
  2. Delete all Roles of User (Role had a column entry of User ID as well)
  3. Reinsert all Positions of User
  4. Reinsert all Roles of User.

Why did we do that? Quite simply, because we could not tell from each object entry whether it was an update or an insert, and we had no way to detect deletions, since the consumer of the object indicates a removal by simply removing the role from the position array.

An alternative that sprang to mind was to do a select before the updates, and compare the deltas. That seemed like an awful lot of work.

So this delete/insert approach works... but it proved to be the wrong approach.

It creates excessive and unnecessary strain on the database in terms of redo logs. Depending on how you configure the database, the redo logs might be considerable, especially if you get hit with a 'power user' of, say, 300 positions with 200 roles each. Sure, that is an unreasonable example, but if some logic went wrong somewhere...

Also, such deletes and re-inserts make our generated primary key values climb very fast. I have no idea what happens when they reach the maximum limit, if there is one.

And finally, because the primary key id changes so very often, another way has to be introduced to uniquely identify each entry. It could be another running sequence number, or a composite key of a few values. You cannot rely on the primary key id, which becomes somewhat redundant.

A compromise could have been reached.

It might have been better to mark each entry with 'tags'. For example, a role without an id indicates an insert. A role with an id and 'isUpdate' set to true is an update, while a role with an id and 'isDelete' set to true indicates a deletion. Rather than working with it transparently by removing objects from an array, use flags to indicate the operation to perform on each entry. This is more work for the consumer of the object, but better than the alternatives.
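A minimal sketch of the flag-based approach, using an in-memory SQLite table as a stand-in for the real schema. The `role` table, its columns and the `RoleEntry` class are hypothetical; `is_update`/`is_delete` mirror the 'isUpdate'/'isDelete' flags described above.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoleEntry:
    name: str
    id: Optional[int] = None   # no id: a new role, to be inserted
    is_update: bool = False    # id + is_update: update this row
    is_delete: bool = False    # id + is_delete: delete this row


def apply_changes(conn: sqlite3.Connection, position_id: int,
                  roles: List[RoleEntry]) -> None:
    """Touch only the rows the consumer flagged; other rows keep their ids."""
    with conn:  # commit the whole batch together (rolled back on error)
        for r in roles:
            if r.id is None:
                conn.execute("INSERT INTO role (position_id, name) VALUES (?, ?)",
                             (position_id, r.name))
            elif r.is_delete:
                conn.execute("DELETE FROM role WHERE id = ?", (r.id,))
            elif r.is_update:
                conn.execute("UPDATE role SET name = ? WHERE id = ?", (r.name, r.id))
            # id set and no flags: unchanged, so no statement is issued


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE role (id INTEGER PRIMARY KEY, position_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO role (position_id, name) VALUES (?, ?)",
                 [(1, "viewer"), (1, "editor"), (1, "auditor")])
apply_changes(conn, 1, [
    RoleEntry("viewer", id=1),                          # unchanged
    RoleEntry("senior editor", id=2, is_update=True),   # renamed
    RoleEntry("approver"),                              # new
    RoleEntry("auditor", id=3, is_delete=True),         # removed
])
rows = conn.execute("SELECT id, name FROM role ORDER BY id").fetchall()
print(rows)
```

Only three statements run for this batch, and the unchanged and updated roles keep ids 1 and 2, instead of every row being deleted and re-inserted with new ids.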

Monday, March 1, 2010

Allow for business policy changes

I was reading over Robert Martin's PDF on design problems and solutions for accounting software. I have yet to finish it, but the problem described was very familiar to me in some ways.

It has to do with policy/rule changes, along with scheduled tasks.

Case in point. Two companies have a contract between them for the sale of a particular material at, say, $x per piece. Payment is usually after delivery, and there is a delivery scheduled two months later.

One month later, the companies sign a new contract at $y per piece. So, when the materials are delivered, which rate should be used?

It is entirely likely that the answer is: it depends on what the business wants for that particular contract. For some companies it might be $x, for others $y.

So when we develop such a system, we must first take care not to bind the delivery to a fixed rate of $x.

On the other hand, we should not bind it to the current value from the contract either. These are major business decisions that the system should avoid making on its own.

In such a case, it would help to escalate the issue to a business user, informing them of the situation, and the choices.

Such a system is of course much more complicated, but it is likely what businesses want. They want automation, and the ability to intervene.

But remember, every intervention and decision should be recorded and audited as part of the request history, for future reports and audit purposes.

There are, of course, many other similar cases where the state of a system when a task is scheduled is very different from when it is executed: email sending, account creation and deletion, role assignment, etc. (mostly in the context of identity management, since that is what I'm concentrating on).

But the solutions are similar. Avoid binding values at schedule time. Verify that everything is the same at execution time. If anything differs, leave it to the business users to decide. They may decide to sell at a loss now to earn more later; the system can never advise on or decide that.
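A minimal sketch of "verify at execution time, escalate on difference, audit every decision" for the delivery example. `ScheduledDelivery`, `rate_for_execution` and `record_decision` are hypothetical names; `Decimal` is used for money, and the rate at scheduling time is kept only for comparison, not bound to the delivery.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class ScheduledDelivery:
    contract_id: str
    rate_when_scheduled: Decimal  # kept to detect changes, not to bind the price
    history: List[str] = field(default_factory=list)  # audit trail


def rate_for_execution(d: ScheduledDelivery, current_rate: Decimal) -> Optional[Decimal]:
    """Return the rate to bill, or None if a business user must decide."""
    if current_rate == d.rate_when_scheduled:
        d.history.append(f"executed at unchanged rate {current_rate}")
        return current_rate
    d.history.append(
        f"escalated: rate was {d.rate_when_scheduled} when scheduled, now {current_rate}")
    return None  # the system does not pick $x or $y on its own


def record_decision(d: ScheduledDelivery, chosen_rate: Decimal, decided_by: str) -> Decimal:
    # Every intervention is recorded for later reports and audits.
    d.history.append(f"{decided_by} chose rate {chosen_rate}")
    return chosen_rate


d = ScheduledDelivery("C-001", rate_when_scheduled=Decimal("10.00"))
if rate_for_execution(d, current_rate=Decimal("12.00")) is None:
    record_decision(d, Decimal("10.00"), "sales manager")
print(d.history)
```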

Monday, February 8, 2010

Beware of the datatype char

I encountered the most unexpected and weirdest bug recently, and it is probably the strongest argument for having consistent testing (and sometimes even development) environments!

In our development environment, we were using a database schema with two fields, status and reason, defined as varchar(2). This means that both fields would store up to 2 characters.

On the test, staging, and production environments, however, the fields were defined as char(2). This means that both fields would always use 2 characters of storage, even for a single-character value.

They are different database products, and not being a database expert, I declined to comment on the difference. Perhaps one of them did not support varchar well. Or maybe the other was more efficient with char. In any case, it sounded like it would still work. After all, the difference ought to be just in storage.

And I was so wrong. We had weird cases where some comparisons of the returned values failed.

On closer inspection, a value of 'A' from the database was in fact, 'A '! It was padded with whitespaces behind!
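The effect can be reproduced without a database. This sketch only simulates the two column types (`store_char` and `store_varchar` are hypothetical helpers); many databases blank-pad char values to the declared length, but whether the padding reaches the application depends on the product and driver.

```python
def store_char(value: str, length: int) -> str:
    """Simulate a fixed-length char(length) column: values come back space-padded."""
    if len(value) > length:
        raise ValueError("value too long for column")
    return value.ljust(length)


def store_varchar(value: str, length: int) -> str:
    """Simulate a varchar(length) column: values come back as stored."""
    if len(value) > length:
        raise ValueError("value too long for column")
    return value


status_dev = store_varchar("A", 2)   # development database
status_prod = store_char("A", 2)     # test/staging/production database

print(status_dev == "A")                       # True
print(repr(status_prod), status_prod == "A")   # 'A ' False
# One defensive fix: strip trailing padding before comparing.
print(status_prod.rstrip() == "A")             # True
```

Trimming on read is only one option; making the column types consistent across environments removes the surprise altogether.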

That took a while to find out, though fixing was easy. I'm gonna be careful about char from now on.

Friday, February 5, 2010

Meeting: Focus on problems the right way

When working on a particular problem, there are many times when we will be slapped with a particular restriction that prevents us from progressing forward in an ideal way.

When we look at this from the outside, we are able to rationally decide that the logical solution is to work around the restriction, so that we can still progress, even if more slowly. Or sometimes, to suggest getting rid of the restriction.

Yet, when we are part of the working team, we stumble, running around in circles and going nowhere.

Imagine this scenario. You are in a meeting. You encounter a restriction and report it to the others in the team. All you get is constant grilling on why it happened, then demands for a solution, then various brainstormed solutions, many of which make no sense or are unachievable. The meeting ends without a resolution. Then comes the next meeting, the restriction is reported again, and the same round repeats, with some finger pointing and blame games.

This is highly destructive to the team dynamics, as restrictions or problems might go unmentioned simply because... mentioning them does not help at all. Backstabbing begins as people try to shift the focus to other people's problems rather than their own. The grilling lowers a person's confidence. And, most importantly, nothing gets done. The meeting is essentially useless.

One of the critical uses of a meeting is to bring everyone up to date with what's happening in the team, and to harness the collective intelligence and authority of the team to resolve the project's stumbling blocks. The above scenario does neither.

A strong project manager is essential, to groom the right culture for meetings.

When faced with a restriction, all the team needs to know is how the restriction affects solving the problem. A restriction is not the problem.

Remember, the member never intended for the restriction to be there. It was not his fault. It is a problem, but not HIS problem.

And lastly, but most importantly, a mindset shift. Stop focusing on what cannot be done (the restriction). Focus on what could be done. Review your available options. And how those options could help to work around the restrictions or solve the problem.

Look into ways to remove the restriction, or, if that is impossible, work around it. If need be, review the original solution to the problem, and change it if all else fails.

This applies not only to work, but to life as well. Sometimes we are so stuck on what we cannot do that we stop progressing, and forget about what we could do instead.

Be positive when faced with problems or restrictions. Focus on what we can do, not what we cannot.

Friday, January 22, 2010

Code comments

I used to have a rather strong stance on comments in code, mostly in the form of avoiding them. I believed that the code IS the comment. If you have written clear code, you have no need to write comments; they would just be clutter in most cases.

I still believe in what I just said, but recently, I have begun writing more and more comments in my code.

That's because, despite clear code, it is sometimes useful to explain in the code why some things are done a certain way.

To the coder (the person who wrote the code), everything is clear. He might write two loops over the same list of objects, modifying them in different ways. He might have done that to keep the objects in a consistent state, finishing one set of operations before starting another.

Now, some time later (maybe just days), someone came along and noticed this code. Why did the coder not merge both operations into a single loop? That would help performance. He did not ask the coder, as the coder was unavailable. So he made the change, ran some basic tests, and checked the code in.

And then a whole set of bugs appeared in the next few days.

This could have been avoided if the coder had explained in a comment why he separated the operations into two loops.

It is the same with documentation. Rather than documenting just how things work, documentation should explain why things work this way, why an approach was chosen, and why others were discarded.

Anyone can figure out how things work, or how the code flows. But it is the why that is often missing. And we definitely need to document it, both in code and in documentation.
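Here is a hypothetical example of the kind of comment that would have prevented that merge. Each cell must be updated from its neighbour's original value, so one loop computes everything before a second loop commits it; the `Cell` class and `add_left_neighbour` function are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    value: int
    pending: int = 0


def add_left_neighbour(cells: List[Cell]) -> None:
    """Add each cell's left neighbour's ORIGINAL value to it, in place."""
    # Why two loops: every new value must be based on the old values of its
    # neighbours. Loop 1 computes all results while nothing has changed yet;
    # loop 2 commits them. Merging the loops would make cell i read the
    # already-updated value of cell i-1 and silently give a different answer.
    for i, cell in enumerate(cells):
        left = cells[i - 1].value if i > 0 else 0
        cell.pending = cell.value + left
    for cell in cells:
        cell.value = cell.pending


cells = [Cell(1), Cell(2), Cell(3)]
add_left_neighbour(cells)
print([c.value for c in cells])  # [1, 3, 5]; a merged single loop would give [1, 3, 6]
```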