Validate reporting result using sampling method

Feature points sampling method validating reports

Sampling method is the most efficient and popular data validation method, even it has little coverage with lots of risk. Here I would introduce an feature points sampling method. The reason to add “feature points” is to avoid generic random points that may not cover as much as possible for the report validation.

Feature points sampling method takes some special points and generic points to validate a reporting result, and ensure these points have most coverage of the reporting range. Feature points contains less data, and could be validated by human calculation easily. The results of these points can validate the query logic correctness. Once the query logic being validated we could add more complex data combination as generic points and finally compare the query results and report results.

Let’s take a real automation report as an example. The data come from the log of BluePrism RPA. The existing ETL process has been tested already. I am going to focus on the report result against local data to verify the report result correctness.

The report includes daily, weekly, monthly, and yearly. In this article I’m gonna use monthly report for validation. Here’re the points used in the verification.

Point StartMth EndMth LOB Process Point Note
1 2018-10 2018-10 BM hotspots feature specific lob and process_name at certain time point
2 2018-10 2018-10 BM Shipped not Activate feature different specific lob and process_name at certain time point
3 2018-10 2018-10 BM [ALL] generic same lob and time point, but all process_names
4 2018-11 2018-11 BRS Disconnection feature different specific lob and process_name and different time point
5 2018-11 2018-11 BRS Field Work feature different specific lob and process_name and different time point
6 2018-09 2018-11 [ALL] [ALL] generic all lob and process_name at different time points, more generic
7 2017-12 2018-02 [ALL] [ALL] generic cross year, more generic

You could test more different feature points and generic points, however, with the same logic behind in most situation you would get the same results.

To same space, I only list point 1 and 7 in this article.

Point 1.


select lob,PROCESS_NAME,count(loaded) sendto,count(completed) worked,1.0*count(completed)/count(loaded) [% automated]
from tbl_AutomationScorecard
	lob='BM' and 
	PROCESS_NAME='hotspots' and
	month(loaded)=10 and year(loaded)=2018
group by lob,PROCESS_NAME

Query result:

point01 query result

Report result:
point01 report

Point 7 (generic point, cross year)


421 total views, 1 views today

Author: Albert

Leave a Reply