Saturday, October 11, 2014

What is an Appropriate Software Testing Standard?

Recently, I wrote a blog post at the Software Testing Club on why I didn't sign the petition opposing the ISO 29119 standard. In the post, I laid out my opposition to the manner in which the ISO organization developed and published the standard. The process should have been much more visible, with free drafts of the standard available for public review. As a result, I took the stance that I would simply not review the standard - either by paying for draft copies of the standard or by reviewing second-hand descriptions. That put me in the difficult position of not being willing (or ethically in a position) to oppose it.

The ensuing discussions in the comments were lively and interesting. I invited - and actually promoted - criticism of my decision in order to gain some insights into the discussion. Many of the answers focused on the mechanics of getting me (and thus others) to sign the petition, mainly based on the limited information freely available. I agreed with many of the arguments and actually said many times that indications are that I would oppose using the standard when it actually was published. All of the comments were valuable and I thanked everyone for contributing.

One extremely interesting topic that I think hasn't gotten a lot of press is whether testing can ever be standardized (or regulated) in any way. In fact, a couple of times I asked that question explicitly. In general, no one was willing to tackle this question, with a common answer being that testing is a purely intellectual process. That answer is helpful in separating the content of software testing from its goals or intent. One person actually said that they felt software testing could never be standardized. I laid some bait by stating the opposite stance: I had no problem with industries such as automobile manufacturers, militaries, and hospitals standardizing their software testing. (Note the lack of any qualifiers on my part!) Regrettably, nobody took the bait. :( That could have led to an interesting discussion that shed some light on the boundaries of the topic.

I think standardization is an important question for two reasons. First, many heavily regulated industries already exist, and governments and the industries themselves feel compelled to regulate them. Second, the specific topics of automation (such as drones and self-driving cars) and machine intelligence will inevitably lead to the regulation of software content. Will that regulation spill over into mandating testing for machine intelligence or control? How does that differ from mandating testing practices? Where do they overlap?

One point I would like to make here is that I don't differentiate between industry standards, government regulations, or internal corporate guidelines. Each addresses a given context (an important point that I will expand on later), and all mandate the behavior of teams (developers and testers) in some way. The only difference between them is the authority behind the mandates. As a result, for the purpose of this blog, I will refer to them all as "standards" in a more generic sense.

There are many blogs opposing ISO 29119 that give some interesting insights into the boundaries of standards. In general, the authors don't oppose standards as such. One interesting point was that many appropriate standards deal with interfaces and not content - very similar to the definition of software testing as an intellectual process mentioned earlier. That is a very important point: almost every standard I could think of dealt only with the interface and did not mandate content or behavior. A good example of this is the US government Section 508 standards for accessibility. Although they contain a shopping list of development coding practices (such as using the alt-text fields in web development), these are treated as development suggestions, and the overall implementation concentrates on functional testing of the products with actual users.

Based on what I have determined so far, here is a list that (to me) measures whether a proposed software testing standard would be appropriate (any criticisms or suggestions for expanding the list would be appreciated):

  • Measurable - The standard should clearly outline the problem or issue it is attempting to resolve. It should be stated in a way that allows the creation of one or more metrics that can be used to determine its effectiveness.
  • Context - To be effective, a standard has to have a clear context that shows the limits of the intended domain of the problem described above. If the standard is then extrapolated to other contexts, a statement of the new context and problem description should be created. An example would be the US military adopting a standard to address weapon safety that was originally written to address patient safety in a hospital outpatient setting. This implies that a context of "universal" is inappropriate for a standard.
  • Interface vs Content - The standard should address target goals and end effects and not mandate specific processes or methods. An example would be mandating sufficient testing to ensure that personally identifiable information (PII) is protected in a banking environment. As with the Section 508 standards, it may suggest some methods or practices to help achieve the goals, but implementation of those practices should not be used to measure compliance with the standard.
  • Temporal vs Indefinite - Another indication of an acceptable standard is a mandate to re-evaluate the standard at predefined intervals (e.g., every four years or so) to assess its effectiveness and potential changes. That gives the targeted industry time to prepare potential changes and collect metrics to support them.

Based on what I know of the proposed ISO 29119 standard, it violates every one of these criteria I have outlined. Is that enough to finally sign the petition opposing the standard? I'll have to think about that.

Saturday, February 22, 2014

Top Ten Rules for Software Testing

This was the topic of a serious post at the Software Testing Club. Rather than pile on to the discussion, I decided to vent here:
  1. There are only 10 types of testers, those that know binary and …
  2. Never marry a tester for their money.
  3. Never work for a company that hires you for your test case writing skill (akin to “Never get involved in a land war in Asia.”).
  4. All developers know where the lines are.
  5. Always test outside the lines.
  6. There are no lines.
  7. All of the real rules are written in pencil in a plain black notebook in a corner shelf in a back room of your company. On the door is a sign that says “Beware of the Leopard”.
  8. Never trust a tester who follows those rules.
  9. Water pistols are not just useful for training cats, but are also effective on people who utter terms such as defect-free, always, never, complete, 100%, and “testing end date”.

    And of course …

  10. Never ask a tester what they would do in any general case, because it always depends on the context and they end up not taking it seriously.

Saturday, January 11, 2014

Regression: A Test By Any Other Name ...

I recently posted a blog entry on how context influenced my analysis of a book I had just read: . In the post, I referred to a discussion on context I had read a long time ago in a book on web services. (I think it was titled "Web Services", but weren't they all?) At that time, the state of web services was immature, to say the least: many implementations were essentially the same old stove-piping repackaged in a new wrapper. The author referred to these as "the Legacy" and went on to identify other contexts that he would discuss in the book:

  • The Legacy refers to the way things were done in the past. ("We used to ...")
  • The Now refers to the way things are currently done. ("We are ...")
  • The Future refers to the way things may be done in the future. ("We (eventually) plan to ...")
  • The Ideal refers to the ultimate concept of how it should be (ideally) done. ("We should ...")
The author made the point that mixing these contexts in the same discussion can lead to misunderstanding and confusion. I have found that to be a good way to analyze arguments both online and offline. When someone starts switching contexts in the middle of an argument, it is time to call them out on it before proceeding.

In that post, I associated software regression testing with the Legacy and test planning with the Future. After thinking about it, I decided that test planning actually belongs to the Now, reasoning that anything that can be conceived of Now belongs to the Now. I added a comment to the post that the Future belongs in that area of mind maps and test plans that should be labeled "Beyond Here Lies Dragons", to borrow a 16th-century map legend.

Then I looked at regression testing. Does regression testing belong in the Now? It is certainly part of test planning, but what exactly differentiates it from "normal" testing? Why even use the term if it is simply "testing"?

There are probably twice as many definitions of regression testing as there are testers in the world. As a result, I will be talking about what I have referred to as regression testing on previous software efforts. Specifically, I define regression testing as having the following characteristics:

  1. It consists of test procedures that have previously been identified as useful for automated checks. These were once tests that were converted to checks for specific procedural paths, what I like to call "trip wires".
  2. These automated checks have been incorporated into one or more suites of automated checks that are run on a regular basis during feature development for the application under test.
  3. The automated checks are run "as is" with no metrics associated with them. In other words, a successful run of the automated checks is not used to determine the relative maturity of the current feature development. (Thus, the association as "trip wires".)
  4. An unsuccessful run of the automated checks is investigated to determine the reason for the failure. Once the cause is determined, one of the following actions is performed:
    • The check is re-run to determine if the failure is intermittent.
    • The check is modified to make it more robust or to change the procedure to reflect recent (approved) changes in the software.
    • The check is retired as no longer useful in the context of the current software business rules. This is done when the changes would essentially create a new check. Instead, the new check is created and the old check is dumped.
  5. The population of automated checks is controlled to retain only the most useful ones and is culled sufficiently to support maintainability and run time requirements (in the case of GUI automation checks).
Note that even though each individual script or automated procedure is technically a "check", the interactive investigation of check failures in characteristic #4 makes the overall process a "test". As a result, I refer to them as "checks" or as "tests" depending on the context of the discussion I am having, sometimes using both terms in the same discussion.
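The trip-wire behavior in characteristics #3 and #4 can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not a description of any particular framework: a check is any function that returns True when the trip wire is not tripped, the suite reports only which checks tripped (no pass-rate metric), and the hypothetical `triage` helper automates only the re-run step. Modifying or retiring a check stays a human decision.

```python
from typing import Callable, Dict, List

# Assumption: a check is a no-argument callable that returns True when
# the trip wire is NOT tripped (the legacy behavior still holds).
Check = Callable[[], bool]

def tripped_checks(suite: Dict[str, Check]) -> List[str]:
    """Run every check once and report only which trip wires fired.
    No pass percentage is computed, per characteristic #3."""
    return [name for name, check in suite.items() if not check()]

def triage(check: Check, reruns: int = 3) -> str:
    """Hypothetical helper for the first triage action: re-run a tripped
    check to see whether the failure is intermittent."""
    if any(check() for _ in range(reruns)):
        return "intermittent"
    # Consistent failure: a person decides whether to modify or retire the check.
    return "investigate"
```

For example, `triage(lambda: False)` returns "investigate", while a check that passes on any re-run comes back as "intermittent".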

Finally, the nature of the automated checks as a retention of the legacy procedures in the current environment firmly places them in the context of the Legacy. Essentially, they answer the question "Does the current application build act the same as the legacy version in the context of the legacy test procedures?" Their use in the test planning of the Now makes it necessary to separate them from other tests based on this Legacy context.

Wednesday, September 11, 2013

Down the Rabbit Hole of Clustered Defects

This article was spawned by my response to another blog entry I read recently concerning the Pesticide Paradox: .

The Pesticide Paradox simply states that static test cases become stale and unproductive over time. The article above goes into tactical test methods of changing up the test cases to account for that effect and continue to find new defects. My contention in the comment was that the Pesticide Paradox can have some deep and subtle implications that should be considered when creating these new test case variations. Here, I elaborate on some of those considerations.

The Relationship of the Pesticide Paradox to Training

One may start correlating defects found with test cases and find that only new test cases produce defects. One reason for that could be training. In quick, session-based exploratory test sessions, the tester comes to rely on recognizing the defect patterns they are familiar with from the test cases that are already documented. Likewise, developers begin to identify the practices that produced those defects. This can be thought of as a form of cognitive bias, where the creation of new test cases serves to train the tester in different bug patterns. This continual establishment of new and different patterns is one reason that session-based testing is so effective when combined with a risk-based test area identification framework.

Possible Relationships Between Multiple Clusters

Each defect cluster may identify a specific flaw in the code development. If the coders didn't identify the root cause of the flaws and change their practices, then these clusters may occur many times in different places. Was there a pattern to those clusters? This can be thought of as a form of model-based testing, where we are looking for the underlying cause of the clusters. Superficially zeroing in on clusters without this overall view of the cluster patterns, or without modeling their underlying cause, would have limited value in improving test quality.
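Looking for patterns across clusters can start as simply as grouping triaged defects by a root-cause label and asking which causes show up in more than one place. The sketch below assumes each defect has already been tagged with a root cause during triage; the `Defect` record, its fields, and the sample labels are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set

class Defect(NamedTuple):
    iteration: int
    component: str
    root_cause: str  # hypothetical label assigned during triage

def recurring_causes(defects: List[Defect], min_places: int = 2) -> Dict[str, Set[str]]:
    """Return root causes that appear in at least `min_places` components,
    i.e. clusters that likely share one underlying flaw."""
    places: Dict[str, Set[str]] = defaultdict(set)
    for d in defects:
        places[d.root_cause].add(d.component)
    return {cause: comps for cause, comps in places.items() if len(comps) >= min_places}

# Illustrative sample data, not real defect reports.
defects = [
    Defect(1, "gui", "unchecked string length"),
    Defect(2, "middle tier", "unchecked string length"),
    Defect(3, "database", "unchecked string length"),
    Defect(2, "reports", "date format"),
]
print(recurring_causes(defects))
```

Here only "unchecked string length" is returned, because it spans three components, which is the kind of cross-cluster pattern that points back to a single underlying cause.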

Layering of Defect Clusters

The clustering may be layered in complexity. New test cases may extend the previous tests down one layer without addressing the root cause of the defects. Example: In one iteration, a tester submits a set of defects where the GUI doesn't report string overflows to the user. That is fixed with GUI field checks in all of the identified places. In the next iteration, a middle-tier defect is found when a long string is entered in a field that was not addressed. Then, in the following iteration, database errors are found in the log that are due to other fields that were not addressed. Instead, the system architecture as a whole should have been evaluated initially to identify the extent of the problem.

These are just a few of the potential complexities that defect clustering can create. Actively questioning the underlying reasons for the clusters is always necessary to ensure that the cause of the defects is addressed.

Sunday, August 25, 2013

On the Trail of the White-Tailed Defect

Recently, I read an article about evolutionary biologist Dirk Semmann of the University of Göttingen in Germany. His research suggests that the white tails on rabbits are a defense mechanism. Simply put, a predator will focus on the bright tail instead of the animal. As a result, when the rabbit turns sharply, the predator momentarily loses sight of it, and the rabbit is better able to get away.

Now apply that concept to defects in complex systems. You (the predator) are on the hunt for defects (the rabbits) in the system to prevent release bugs (☼). Some of the defects are slow, some are fast, some are camouflaged, and some are very, very sneaky. This article deals with one of the most elusive defects in the wild - the White-Tailed Defect. This defect gets you to focus on an obvious trait that always appears to identify it. It is such an obvious target that you decide it is a perfect candidate for automation. You can then spend your limited time finding those pesky camouflaged defects you know are still hiding in the system.

Everything runs smoothly until release day, when the developers decide to make a small improvement in the application that should have no real impact. Then the White-Tail Defect strikes and deftly slips by your well-designed defenses.

Here is one example of a White-Tail Defect. You are testing for field overruns on the string fields and find that the developers have implemented a standardized GUI check that displays a warning dialog to the operator when the field is overrun. It is built into the GUI framework and reliably provides the same dialog when the check is run. You automate the check for the dialog and all is well. Then a coder who is new to the team modifies a field and forgets to add the GUI overflow check. It just so happens that your automation never checks the actual value stored in the database or the database error logs. It isn't until after release that the system error shows up in the customer database (☼).
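One way to layer the attack on this particular White-Tail Defect is to have the overflow check look at both the GUI response and the stored value, rather than trusting the dialog alone. The sketch below is hypothetical throughout: an in-memory SQLite table stands in for the customer database, `submit` stands in for the application, and `gui_check=False` models the field where the new coder forgot the overflow check and a lower tier silently truncates the value.

```python
import sqlite3
from typing import List, Optional

MAX_LEN = 10  # hypothetical column limit for the field under test

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT)")
    return conn

def submit(conn: sqlite3.Connection, value: str, gui_check: bool) -> Optional[str]:
    """Stand-in for the application. Returns the warning dialog text, or None.
    gui_check=False models the forgotten overflow check; a lower tier then
    silently truncates the value (a hypothetical failure mode)."""
    if gui_check and len(value) > MAX_LEN:
        return "Field too long"
    conn.execute("INSERT INTO customer (name) VALUES (?)", (value[:MAX_LEN],))
    return None

def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

def layered_overflow_check(conn: sqlite3.Connection, value: str, gui_check: bool) -> List[str]:
    """Check the GUI response *and* the database, not just the dialog."""
    before = row_count(conn)
    dialog = submit(conn, value, gui_check)
    problems: List[str] = []
    if dialog is not None:
        # A warning was shown, so nothing should have been stored.
        if row_count(conn) != before:
            problems.append("warning shown but value was stored anyway")
    else:
        # No warning, so the stored value must match what was entered.
        stored = conn.execute(
            "SELECT name FROM customer ORDER BY rowid DESC LIMIT 1").fetchone()[0]
        if stored != value:
            problems.append("no warning and stored value differs from input")
    return problems
```

With `gui_check=False` and an overlong value, the layered check reports that the value was stored without a warning and no longer matches the input.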

So, how do you hunt a White-Tail Defect? The key is to understand your prey and adjust your tactics to compensate. The white-tail defense is a cognitive trick that plays on the mind's need to see patterns in things. By suddenly breaking out of that pattern, the defect can escape detection. Here are two tactics for making that less likely:

  • Hunt In Packs - You can fool some of the testers some of the time, but a well-coordinated team can catch most of the White-Tail Defects out there. Communication, coordination, and open dialog on changing up your test approaches in a session-based format leave few places to hide.
  • Layer Your Attacks - The White-Tail Defect can slip by at the GUI, or at the database, or at the unit level, but that trick it pulls is not as effective when facing multiple strategies. Mix up tours, steel threads, sessions, quick tests, and automated checks to place multiple barriers in the way of the defect.
Good hunting!

Saturday, October 8, 2011

Spinning Down

Ok, so we just got our latest release out the door - it's time to relax, right? Working as the single tester on a small product test team has some advantages, but at the end of the day the buck stops here. Stress leading up to a major release is manageable, but what do you do after the release is out the door?

That adrenaline keeps pumping for a while and it takes time to spin down. Here are some things I do to help manage the days after a release:

• Don't arrange a family commitment the weekend after a scheduled release. No matter how well you prepare, the release may slip. That is one pressure you can do without.

• If your spouse asks how the release is going, the answer is "Fine" - no matter what. You aren't lying, just reassuring. Being asked about stress at work is also one pressure you can do without.

• Plan on spinning down gradually. It is out the door, but you could still have last-minute install problems crop up.

• Keep the schedule sane in the week leading up to and following a release. Not getting enough sleep means making mistakes.

• Don't make formal arrangements for a "relaxing" activity. Step back into your normal routine. Plan nothing and let it happen.

• Don't try to force yourself back into a "normal" sleep pattern. Starting a week or so before a release, my body tends to wake up at 2:00 AM no matter what. That continues for a while after a release and I just don't worry about it too much.

• Mental and physical go hand in hand during stressful periods. With me, it's stress, coffee, and spicy food - pick any two out of three. Around a week or so before and after a release, it's usually time to go cold turkey on the caffeine.

Oh, last one ... write a blog. Sharing with others online is a good way to cut the stress. Hey, it really does help!

Tuesday, August 23, 2011

Using Automated Scripts for Test Workflow Automation

Test automation covers a large area, from large, traditional test management suites to simple text editors. This discussion focuses on using scripted automation tools to support and improve the overall test workflow. This workflow support can be provided using scripted automation at various interfaces, such as the database, module interfaces in the development environment, or the GUI.

Traditionally, scripted automation has been used to run checks to verify established application functionality (e.g. for regression checks). An alternative usage for automated scripts is to assist in executing the overall test workflow. Two methods of accomplishing this are presented here:
  • Performing smoke tests
  • Creating complex configurations to support test sessions

Smoke Tests
Smoke tests are a special type of test that does not belong in the category of regression testing. Regression tests are intended to thoroughly verify functions at a broad spectrum of interface points. The smoke test is intended to perform a specific function: to provide a minimum gateway for allowing development builds into the QA environment.

The smoke test performs a quick check of the overall application "happy paths" to identify major functional failures. Here a "major" failure is defined as one that prevents the testing of a significant portion of the application. Unlike regression checks, the smoke test is not intended to thoroughly verify any particular function. In fact, if properly designed, it should not be susceptible to minor failures at all. Instead, it should interact with a minimum of interface objects to limit the likelihood of a minor failure.

In addition, the time box limitation of a smoke test puts an emphasis on getting "the biggest bang for the buck". The smoke test should be continually tweaked to include as many major functions as possible and still complete within the designed time limit (typically 30 minutes to 1 hour), especially for GUI test automation.
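A time-boxed smoke test can be sketched as an ordered list of happy-path steps run against a time budget. In this minimal Python sketch (my assumptions, not any particular tool's API), each step is a callable that raises an exception on a major failure, the first such failure rejects the build, and any step that would start after the time box has expired is logged as skipped, which is a signal that the smoke test needs trimming.

```python
import time
from typing import Callable, List, Tuple

# (name, happy-path action); the action raises on a major failure.
Step = Tuple[str, Callable[[], None]]

def run_smoke(steps: List[Step], budget_s: float) -> Tuple[bool, List[str]]:
    """Run happy-path steps in priority order within a time box.
    Returns (build_accepted, log)."""
    start = time.monotonic()
    log: List[str] = []
    for name, step in steps:
        if time.monotonic() - start > budget_s:
            log.append(f"SKIPPED {name}: time box exhausted")
            continue
        try:
            step()
        except Exception as exc:
            # A major failure blocks the build from entering QA.
            log.append(f"FAILED {name}: {exc}")
            return False, log
        log.append(f"ok {name}")
    return True, log
```

Ordering the steps by importance means that, if the time box is ever exceeded, it is the least valuable checks that get skipped.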

Creating Complex Configurations to Support Test Sessions
When considered as a workflow framework, the scripted automation takes on the role of performing a set of tasks as opposed to verifying the functionality of the application. The size and complexity of the automated scripts can be critical, because the longer a script is, the more likely it is to hit a critical stoppage: if each step has roughly the same independent chance of stopping, the chance that a script completes shrinks exponentially with the number of steps. For example, a critical stoppage early in the processing of a large script would impact the entire flow. Dividing the overall test workflow into ten separate scripts may limit the impact of a critical stoppage in one of the scripts to 10% of the overall testing.
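To put rough numbers on the effect of script size, here is a small Python calculation. It assumes (my assumption, not something measured) that every step stops independently with the same probability `p_stop`, so a script of n steps finishes with probability (1 - p_stop)^n, and that separate scripts do not block one another.

```python
def completion_probability(steps: int, p_stop: float) -> float:
    """Chance that a script of `steps` steps finishes without a critical
    stoppage, assuming each step stops independently with probability p_stop."""
    return (1 - p_stop) ** steps

def expected_steps_completed(steps: int, p_stop: float) -> float:
    """Expected number of steps that run successfully before a stoppage.
    Step k only succeeds if steps 1..k all succeed."""
    return sum((1 - p_stop) ** k for k in range(1, steps + 1))

p = 0.01  # assumed per-step stoppage probability, purely illustrative
one_script = expected_steps_completed(100, p)       # one 100-step script
ten_scripts = 10 * expected_steps_completed(10, p)  # ten independent 10-step scripts
print(f"{completion_probability(100, p):.3f}")  # 0.366
print(f"{one_script:.1f} vs {ten_scripts:.1f}")  # 62.8 vs 94.7
```

Under these assumptions, a single 100-step script finishes only about 37% of the time and completes about 63 steps on average, while ten independent 10-step scripts complete about 95 of the 100 steps on average.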

The size and placement of the scripts in the overall test process should be balanced between usability, run time, maintenance, and the frequency of script stoppages. In general, the scripted automation should be targeted for tests that have very complicated setup procedures or involve a large amount of redundant setup steps that would be a large burden on the tester if performed manually.
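Splitting the workflow only limits the damage if a stoppage in one script does not stop the others. A minimal sketch, assuming each setup script can be wrapped as an independent Python callable (the runner and the script names used with it are hypothetical): run every script, catch a stoppage, and report which configurations are ready for the test sessions.

```python
from typing import Callable, Dict, List

def run_setup_scripts(scripts: Dict[str, Callable[[], None]]) -> Dict[str, List[str]]:
    """Run each setup script independently so that a stoppage in one
    does not block the configurations built by the others."""
    report: Dict[str, List[str]] = {"ready": [], "stopped": []}
    for name, script in scripts.items():
        try:
            script()
            report["ready"].append(name)
        except Exception as exc:
            # Record the stoppage and move on to the next script.
            report["stopped"].append(f"{name}: {exc}")
    return report
```

The tester can then start sessions on the "ready" configurations while the stopped ones are investigated.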

These are just two examples of using scripted automation to support test workflow. If implemented properly, the injection of small, well-designed scripts into the test process can provide a significant improvement in the overall test quality.