|
As I write this on a cold November 2nd morning in the UK, an inquiry is under way after a memory stick with user names and passwords, used in testing a key UK Government computer system, was discovered in a pub car park. The Mail on Sunday said ministers have ordered an emergency shutdown of the Gateway website, which covers anything from tax returns to parking tickets, while experts checked to ensure people's private details were not compromised.
Here in the UK we have had a string of such news stories involving the loss of data from government departments, hospitals, the military and IT contractors. Whilst it is unacceptable for this to happen at all, in some cases I can at least understand how it happened. If you are working from home on patient records, for example, I can understand that you might have live records on your laptop or memory stick. What I can't understand is why testers would use live data for testing, unless of course they are either lazy, criminal, technically challenged or ignorant.
Ok, I know half of you are now spitting mad at the last statement, so let me explain my reasoning for the accusations I have just levelled at my fellow testers.
First, I know the normal data creation process, as I have used it myself countless times (but no more). It goes something like this:
Days spent analysing the requirements.
Weeks spent writing test cases.
A couple of minutes on the phone to ask for a subset of live data for testing to be dumped into our environment.
So…
Accusation number 1; You Must Be Lazy.
Live data must be the best for testing, right? Wrong!
Live data is of course a good representation of the data used day to day, but testing is about finding bugs, right? We don’t want common data, we want uncommon data, we want data that populates the wilderness edges of the system, processes and transactions. So whilst the live database may well be the right place to start the test data creation process, a dump of live data is not a good place to end it.
If your solution to test data is simply to take a dump of live, and off you go, then that is just plain lazy. You need to finesse the data to ensure that it will exercise the break points and boundaries. You need to ensure that you have a rich data source that has both breadth (it covers all the databases, tables and fields) and depth (it exercises all dependencies and covers all business rules).
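To make the idea of exercising boundaries concrete, here is a minimal sketch in Python. It assumes a numeric field with an inclusive minimum and maximum; the function name boundary_values and the age rule are my own illustrations, not taken from any particular system.

```python
def boundary_values(minimum, maximum):
    """Return values at, just inside and just outside an inclusive range."""
    candidates = [minimum - 1, minimum, minimum + 1,
                  maximum - 1, maximum, maximum + 1]
    # Remove duplicates (possible for very narrow ranges) while keeping order.
    return list(dict.fromkeys(candidates))


# Hypothetical business rule: customer age must be between 18 and 65 inclusive.
ages = boundary_values(18, 65)
print(ages)
```

A dump of live data is unlikely to contain a customer aged 17 or 66, which is exactly why values like these have to be added by hand or by a tool.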
Accusation number 2; You Must Be Criminal.
More and more legislation is being introduced around the world concerning the security and handling of personal data. It’s easy to think that if you are in testing, these laws don’t apply to you, and that they only matter when we go live. Again, this is wrong. If you are not working within the confines of the relevant law, you could find yourself on the wrong end of a criminal investigation.
Here in the UK, for example, personal data is covered by the Data Protection Act, which, amongst other things, states that data can only be used for the purpose for which it was collected. If you don’t have testing systems down on that list, it might well be a breach if data is used for testing. In addition, the majority of data security breaches reported in the press over here appear to be internal breaches: either staff taking data off site, or inappropriate staff having access to personal data. I have worked for an organisation that handles sensitive data. All the operators have to have a police check before they can be employed to work on the system. The company that supplies the system was using raw live data for testing, and none of the testers, who were mainly contractors, had been police checked.
We have a duty to secure the data we use, test or otherwise. If testing can be done without unaltered live data, then I would argue we should not be using raw live data. To ensure you are on the right side of the law, maybe you should have a test data security policy and process that covers:
1: Data preparation: Who can access live data and anonymise it for test?
2: Fake data library: A data library of fake names and addresses so they don’t have to be created every time and can be reused over and over.
3: Configuration Management: Ensure that the test system data is secured and treated as if it was live data.
4: Security policy: Ensure that the same security policy that applies to live systems and staff applies to test systems and staff.
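As a rough illustration of points 1 and 2 together, the Python sketch below replaces real names with entries from a small fake name library. Everything in it is an assumption for illustration: the FAKE_NAMES list, the record layout, the function names and the secret key. It uses a keyed hash so that the same real name always maps to the same fake name, which keeps records consistent across tables.

```python
import hashlib
import hmac

# Hypothetical reusable library of fake names; a real one would be far larger.
FAKE_NAMES = ["Alex Example", "Sam Sample", "Pat Placeholder", "Chris Test"]


def fake_name_for(real_name, secret):
    """Map a real name to a fake one, consistently for a given secret."""
    digest = hmac.new(secret.encode("utf-8"), real_name.encode("utf-8"),
                      hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def anonymise(records, secret):
    """Return copies of the records with the 'name' field replaced."""
    return [{**record, "name": fake_name_for(record["name"], secret)}
            for record in records]


# Hypothetical live records; the originals are left untouched.
live = [{"id": 1, "name": "Jane Real"}, {"id": 2, "name": "John Actual"}]
test_data = anonymise(live, secret="keep-this-out-of-the-test-environment")
print(test_data)
```

With a library this small, different real names will sometimes share a fake name, and replacing names alone does not make a record anonymous. Treat this as a starting point for the data preparation step, not as proof of compliance.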
Accusation number 3; You Must Be Technically Challenged.
Often as testers we just don’t understand the data, or don’t have the skill sets to create what we want, so we rely on the development team or DBAs to give us what they think we need. If we do create data, then it tends to be focused on just that needed to execute a particular test, leading to a very narrow and restricted set of data.
The days of single flat databases are long gone, and we are almost always faced with having to address hundreds or thousands of data elements, multiple databases, complex business rules, and an unlimited combination of routes and inputs. Creating a rich, anonymous data set that maintains referential integrity now needs a tool. As testers we need to be able to take a simple sample of data, analyse it, anonymise it, subset it, expand it, and randomise it. Once you have done all that, you need to be able to manage it.
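To show what subsetting while maintaining referential integrity means in the simplest case, here is a small Python sketch. It assumes one parent table (customers) and one child table (orders) held as lists of dictionaries; the table names, field names and the function subset_with_integrity are hypothetical. A real system with hundreds of related tables is where the tools earn their keep.

```python
import random


def subset_with_integrity(customers, orders, sample_size, seed=0):
    """Sample customers, then keep only the orders that reference them."""
    rng = random.Random(seed)  # seeded so the subset is repeatable
    chosen = rng.sample(customers, min(sample_size, len(customers)))
    chosen_ids = {customer["id"] for customer in chosen}
    kept_orders = [order for order in orders
                   if order["customer_id"] in chosen_ids]
    return chosen, kept_orders


# Hypothetical, already anonymised parent and child tables.
customers = [{"id": n, "name": f"Customer {n}"} for n in range(1, 6)]
orders = [{"order_id": 100 + n, "customer_id": (n % 5) + 1} for n in range(10)]

subset_customers, subset_orders = subset_with_integrity(customers, orders, 2)
print(len(subset_customers), len(subset_orders))
```

Sampling the parent table first and filtering the child table afterwards means no order is left pointing at a customer that is missing from the subset.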
The good news is that more and more tool vendors are now offering this sort of capability, and my advice to those looking for the next hot skill set in testing is to get trained up in a test data generation tool ASAP. A number exist, and some are quite costly. However, for a free accreditation training course on a great data management tool, check out DATAMAKER: http://www.grid-tools.com/accreditation.php. (No, I don’t get any money for recommending them; having met them and seen the tool, I just think it has a lot to offer.)
Accusation number 4; You Must Be Ignorant.
If you are still blindly progressing testing by taking a cut of live, slapping it on the test system and tinkering at the edges, then not only are you missing the point of testing (do the things that are most likely to find the bugs), you may well be setting your company up to be the next big ‘Security Blunder at XXX Corp’ headline. Ignorance may be bliss, but it is no defence. I can only assume that you really are ignorant of current trends around making companies account for how they handle our personal data. You need to get educated, and fast:
Take some time to look on the internet at the many good-quality, free whitepapers. Try searching on ‘Generating Test Data’, ‘Good Test Data’, ‘Test Data Security’ and ‘Test Data Management’.
Look for a conference or seminar near you in the next few months that covers this subject and sign up.
Read the IT and national press for coverage on data blunders and get familiar with the consequences.
Tony Simms is the principal consultant at Roque Consulting and can be contacted via Tony@roque.co.uk
|