Thursday, February 10, 2005

Automated Testing in Browsers

DISCLAIMER: I know next to nothing about software development! This whole post is written by a complete newbie. The primary purpose of writing this was to try to learn! Please help me out with your comments.

This article is inspired by a recent comment from IEBlog with suggested post titles. There is also a relevant previous article on IEBlog, which talks more about automated debugging than automated testing per se.

Here I will talk about existing automated testing efforts that I am aware of (for browsers) and ways I feel they can be improved.

Layout Testing

Mozilla and Opera (and presumably the IE team also) use automated testing on their nightlies to catch layout regressions (i.e., when a previously fixed layout bug has been reintroduced). They take 'known good' snapshots of the rendering of thousands of weird HTML/CSS test cases and check whether they change from day to day. There is some excellent discussion of the difficulties encountered here.

These difficulties (which have largely been overcome) include:

  • A way to detect, log and move on from crashes and hangs.
  • Ensuring all layout is done before the screenshot is taken.
  • Strange results encountered using antialiasing/transparencies etc.
  • Testing in different window sizes.

Automatically dealing with animated test cases is understandably hard. Going frame by frame would be ideal if possible, but then I guess you run into issues from artificially flushing output streams...

The W3C provide excellent test suites for HTML/XHTML, CSS, SVG and all sorts of other standards.

These could easily be incorporated into such automated layout tests (assuming, of course, that the standards are supported to begin with!). After all, the whole point of automated layout and rendering tests is to find regressions systematically instead of piecemeal.
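A minimal sketch of the snapshot-comparison idea described above. The `screenshot_bytes` stand-in is hypothetical: a real harness would drive the browser, wait for layout to finish, and grab the framebuffer, whereas here it just hashes the markup so the sketch is runnable.

```python
import hashlib

def screenshot_bytes(testcase_html: str) -> bytes:
    """Placeholder: a real harness would render `testcase_html` in the
    browser and return the raw pixels of the screenshot."""
    return testcase_html.encode("utf-8")

def run_layout_tests(testcases: dict, baselines: dict) -> list:
    """Compare a hash of each rendering against a stored 'known good'
    hash; return the names of test cases whose rendering changed."""
    regressions = []
    for name, html in testcases.items():
        digest = hashlib.sha256(screenshot_bytes(html)).hexdigest()
        if name not in baselines:
            baselines[name] = digest      # first run: record the snapshot
        elif baselines[name] != digest:
            regressions.append(name)      # rendering changed: regression
    return regressions
```

Hashing rather than storing full bitmaps keeps the baseline store small, though it throws away the ability to see *where* the rendering diverged.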

I think there is a real possibility for an exponential growth in test cases, particularly when you consider these tests are easily separable and distributable.

Automated layout testing is just one possible part of functional testing, but I thought it was cool because of the graphical element. There are other obvious functional tests for client-side scripting languages etc.

Fuzz Testing

This came to prominence when Michal Zalewski created his 'mangleme' tool, which spits out automated 'shards' of malformed HTML against which the rendering engines are tested. He describes the results in this BugTraq posting. In his original tests he discovered multiple hangs and crashes in Gecko, Presto, Lynx and others. Trident was far more resilient, as we later learned, in part because fuzz testing is part of Microsoft's unit tests. A subsequent Python port of the code did reveal an IFRAME vulnerability in Trident.

Fuzz testing should certainly be applied to all possible input points. The next obvious question is 'how?'.

Larry Osterman's approach seems to be to take lots of cases of valid input and then deform them at random - inserting invalid characters, unusually large inputs, unusually nested inputs, etc. - increasing the pathology of the input all the time. Generally, an excellent idea.
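A sketch of that deform-valid-input idea. Nothing here is from a real fuzzer: the three deformations (control characters, huge repeats, deep nesting) are just illustrative, and `level` stands in for the 'degree of pathology'.

```python
import random

def deform(valid: str, level: int, rng: random.Random) -> str:
    """Return `valid` corrupted with `level` random mutations, each
    drawn from a few classic deformation classes."""
    data = list(valid)
    for _ in range(level):
        choice = rng.randrange(3)
        pos = rng.randrange(len(data))
        if choice == 0:                         # insert an invalid character
            data.insert(pos, chr(rng.randrange(0, 32)))
        elif choice == 1:                       # unusually large input
            data[pos:pos] = [data[pos]] * 1000
        else:                                   # unusually nested input
            data[pos:pos] = list("<div>" * 50)
    return "".join(data)
```

Because the mutation count is an explicit parameter, test cases can be generated in order of increasing pathology, which is exactly what makes them easy to bucket and distribute.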

The first question I have is, why not do these things more systematically than 'randomly'? If you can spit out test cases and order them by approximate degree of pathology, then they are easily split up into separate classes, and easy candidates for testing using distributed methods. Fuzz testing as a methodology is in the open, and I can certainly imagine blackhats harnessing their zombied botnets to find new vulnerabilities. Microsoft has enough money to buy clusters of clusters, and I'm sure Mozilla and Opera could leverage (ugh!) their incredible goodwill to start huge autotesting projects. This would increase the 'depth' of any fuzz test.

But how would you increase the 'breadth' of a fuzz test? For instance, take the Python version of mangleme above. It certainly mangles the values, but not the attributes or tags themselves. Now, you may say that these are a tightly controlled, already tested, small set of cases... but the whole point of fuzz testing is to increase your ability to test all input! THIS, if anything, is where I would use 'randomness' - perhaps a genetic algorithm could mutate characters at random?
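As a sketch of what 'mangling the tags themselves' might look like - this mutation strategy is my own invention for illustration, not anything mangleme actually does:

```python
import random
import re

def mangle_tags(html: str, rng: random.Random) -> str:
    """Randomly mutate characters inside tag *names*, not just the
    attribute values, so the parser's tag-dispatch code gets fuzzed too."""
    def mutate(match):
        name = list(match.group(1))
        if name and rng.random() < 0.5:
            i = rng.randrange(len(name))
            name[i] = chr(rng.randrange(33, 127))  # random printable byte
        return "<" + "".join(name)
    return re.sub(r"<([a-zA-Z]+)", mutate, html)
```

A genetic algorithm would go one step further: keep the mutated documents that exercise new code paths or produce anomalies, and breed from those.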

Real world online error reporting is really just glorified low-level fuzz testing. On large projects this usually requires statistical analysis before it can be used to prioritize bugfixes.

I think there could be really interesting challenges when fuzz testing across applications!

Automated Pinpointing

I couldn't find a better name for this, but this is where you have a bug caused by a very complicated scenario, and you want to narrow it down to the 'minimal' test case. Here, you tell your test program what the 'symptoms' of the bug are, and it will systematically reduce input in different (possibly random) ways, stopping when it is reasonably confident it has shrunk the problem down.
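A greedy sketch of this reduction, working line by line. `triggers_bug` is a hypothetical predicate that reruns the browser on the candidate input and reports whether the symptom still appears; real tools are smarter about which subsets they try.

```python
def minimize(testcase: str, triggers_bug) -> str:
    """Repeatedly try deleting chunks of lines, keeping each deletion
    whenever the bug symptom still reproduces without those lines."""
    lines = testcase.splitlines()
    chunk = max(1, len(lines) // 2)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and triggers_bug("\n".join(candidate)):
                lines = candidate        # deletion kept the bug: shrink
            else:
                i += chunk               # deletion lost the bug: move on
        chunk //= 2                      # retry with finer granularity
    return "\n".join(lines)
```

Starting with large chunks and halving keeps the number of browser runs roughly logarithmic in the easy cases, instead of one run per deleted line.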

Automatic Debugging aka Code Checking

There is a plethora of tools which can be used on the codebase itself, actively finding problems before or during compilation rather than a binary reacting passively to inputs. One place where I found lots of research and documentation (designed for open source projects, but definitely more widely applicable) was the Berkeley Open Source Quality Group. Microsoft has released a tool called PreFast [PowerPoint file]. This goes through every possible execution path in every function to find possible errors.

Simple stuff to manage data types and check bounds 'locally' has been around for a while. The real kicker is analyzing as you move down code paths. At this point more than any other, I would like to stress that I am not a coder of any sort!

This is hard, so a really good first step is to simplify the problem as much as possible. One way is to try to separate out, as much as possible, sections of code which provably (or as close enough as is practical) don't touch each other at all, and then work with each section individually, ignoring all 'known good' code. However, in practice this means separating out only those bits which don't rely on global information. This is why PreFast uses function by function analysis, at the cost of not being able to work on certain dynamic problems.
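To illustrate the 'simple, local' end of that spectrum, here is a toy checker - written in Python using its `ast` module purely for illustration - that flags a constant index into a literal list that is provably out of bounds. Real tools like PreFast go much further by following execution paths through each function.

```python
import ast

def find_constant_oob(source: str) -> list:
    """Flag subscripts of a literal list by an integer constant that
    is provably out of bounds. Purely local: no data flow, no paths."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.List)
                and isinstance(node.slice, ast.Constant)
                and isinstance(node.slice.value, int)):
            n = len(node.value.elts)
            idx = node.slice.value
            if not (-n <= idx < n):
                warnings.append(
                    f"line {node.lineno}: index {idx} out of bounds for list of {n}")
    return warnings
```

Even this toy shows why locality is cheap: every fact it needs is visible in a single expression, with no need to know what the rest of the program did first.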

Another big issue with such autodebugs is that they can flag a lot of false positives. This is why one area being researched is refining each possible failure to a 'minimal counterexample', and then trying to show by brute force that such a minimal counterexample does not exist. I suggest Bayesian filtering ;-)


I'm getting far, far over my head with this, so I'll leave it here. Anyway, the point is obvious: automated testing is very important and becoming more so.

This is a document in progress, please leave messages with suggestions!


At 1:23 AM, Blogger Jesse Ruderman said...

Automated Pinpointing is also known as Delta Debugging.



