Reporting notes 01

Working for over 10 years in IT technology, I realized that lots of experiences were lost with my memory when time passed by. So I wrote this note for keep my memory. If anyone is also interested in this matter, please contact me.
Problem: business requires an report in excel format to be automated in web application.

Final web format:
Original data format:


Key points:

  • Data must be summarized by date, week, month, year, and job name dimensions.
  • Fact table contains dozen millions of records for several years;
  • The reporting requires to rotate the data by 90 degrees to display summarized data
  • Multiple data presentations in a single column, witch includes integer number, decimal, and percentage.

Solution:

    1. Get data from data source
      OpenQuery
      for linked server does not accept variables, therefore we have to use dynamic
      query statement.
      set
      @openquery =
      SELECT  [Date],
              [Day],
              JobName,
              Volume_to_Work,
              Volume_Worked,
              Volume_RPC
      FROM OPENQUERY([S1],”
         SELECT cast(a.Date as Date) [Date],
              datename(dw,date) as [Day],
              a.JobName,
              a.Volume_to_Work,
              a.Volume_Worked,
              isnull(b.Volume_RPC,0)
      as Volume_RPC
         FROM
         (
              –VOLUME
              select BACKUPDATE
      date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED) as Volume_Worked
              from TBL_METRICS_RAPPORTVOLUME
              where JobName in (””’+
              replace(
              case @JOB_NAME_LIST when ‘All|’
      then ‘Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW’
              else @JOB_NAME_LIST end
              ,‘|’,””’,””’)
              +””’
              )
              and year(BACKUPDATE)
      = ‘
      +cast(@REPORT_YEAR as varchar)+
              group by JobName,BACKUPDATE
         ) a
         left join
         (
              –RPC
              select Backupdate,session,count(session) as Volume_RPC
              from TBL_ACCOUNTS_WORKED a
              left join ( select
      RPC,Result from TBL_REFERENCE_DIALERCODES where site = ””Internal”” ) b on a.RELEASECODE=
      b.RESULT
              where session in (””’+
              replace( case @JOB_NAME_LIST when ‘All|’ then ‘Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW’
              else @JOB_NAME_LIST end
              ,‘|’,””’,””’)
              +””’
              )
              and
      year(Backupdate) = ‘
      +cast(@REPORT_YEAR as varchar)+
              and b.RPC = 1
              group by
      Backupdate,session
         ) b on a.Date = b.Backupdate
      and a.JobName = b.session
         where datename(dw,date)
      <> ””Sunday”””
      )’
        
      create table #campaign ([Date] date,[Day] varchar(20),JobName varchar(255),Volume_to_work int,Volume_worked int,Volume_RPC int)
      insert
      #campaign exec(@openquery)
      generated openquery look like:
      SELECT
      [Date],
              [Day],
              JobName,
              Volume_to_Work,
              Volume_Worked,
              Volume_RPC
      FROM OPENQUERY([BREPP1],’
              SELECT cast(a.Date as Date) [Date],
                     datename(dw,date) as [Day],
                     a.JobName,
                     a.Volume_to_Work,
                     a.Volume_Worked,
                     isnull(b.Volume_RPC,0)
      as Volume_RPC
              FROM
              (
                      –VOLUME
                      select
      BACKUPDATE date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED)
      as Volume_Worked
                      from [Source_Dialer].[dbo].[TBL_METRICS_RAPPORTVOLUME]
                      where JobName in (”All”,”Ucm_Arics_ON_Preview”,”Ucm_Caprice_QC_Preview”,”UCM_OB_E_PREVIEW”,”UCM_OB_F_PREVIEW”
                      )
                      and year(BACKUPDATE) = 2017
                      group by JobName,BACKUPDATE
              ) a
              left join
              (
                      –RPC
                      select Backupdate,session,count(session) as Volume_RPC
                      from Source_Dialer.dbo.TBL_METRICS_ACCOUNTS_WORKED a
                      left join ( select RPC,Result from
      Source_Dialer.dbo.TBL_METRICS_REFERENCE_DIALERCODES where site = ”Internal”)
      b on cast(a.RELEASECODE as varchar)= b.RESULT
                      where session in (”All”,”Ucm_Arics_ON_Preview”,”Ucm_Caprice_QC_Preview”,”UCM_OB_E_PREVIEW”,”UCM_OB_F_PREVIEW”
                      )
                      and year(Backupdate) = 2017
                      and b.RPC = 1
                      group by Backupdate,session
              ) b on a.Date = b.Backupdate and a.JobName = b.session
              where datename(dw,date) <> ”Sunday”’
      )
    2. Get summarized data

      select
         row_number() over(order by d.[JobName],d.date) as Id
         ,d.[JobName]
         ,year(d.[Date]) [Year]
         ,right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar) [Weekday]
         ,d.[Month]
         ,datepart(wk,d.[Date]) Week
         ,d.[Date]
         ,sum(Volume_to_work) Inventory
         ,sum(Volume_worked) Worked
         ,sum(Volume_RPC) RPC
      into
      #view_campaign
      from
      #campaign c
      right join (
         select [JobName],cd.[Date],cd.DateISO,cd.DayOfWeek,cd.month
         from (select distinct [JobName] from
      #campaign) c
         full outer join dbo.tbl_dim_date cd on 1=1
         where cd.year=@PRM_REPORT_YEAR
      ) d on d.[JobName]=c.[JobName] and c.[Date]=d.[Date]
      group by
           d.[JobName]
           ,year(d.[Date])
           ,d.[Month]
           ,datepart(Wk,d.[Date])
           ,d.[Date]

           ,right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar)

    3. Summarized by week, month, year for each jobname, use rollup to get weekly, monthly total columns in final report
    4. Remove duplicated week that cross months
      –Sum the value of the week that cross months (Line38 + Line 44)
      update
      #tmp set Inventory=t.Inventory, Worked=t.Worked, rpc=t.rpc
      from #tmp v join
      (
         select JobName,Agg
               ,sum([Inventory])[Inventory]
               ,sum([Worked])[Worked]
               ,sum([RPC])[RPC]
         from #tmp
         where Agg like ‘w%’
         group by Agg,JobName
      ) t on t.Agg=v.[Agg]
      where v.agg=t.agg and v.JobName=t.JobName
      Delete (Line38) the value of the first week of the duplicated week
      delete from #tmp where agg+cast([Month] as varchar) in
      (select agg+cast(min([Month]) as varchar)
       from #tmp
       where agg like ‘w%’
       group by JobName,agg
       having count(agg)>1
      )

       

    5. Add calculated columns: CompleteRate, ReachRate
      They are the result of division, so dot not use avg to get wrong
      result
         ALTER TABLE #tmp add
      CompleteRate decimal(14,4),ReachRate decimal(14,4)
         update #tmp
      set
      CompleteRate = case [Inventory] when 0 then 0 else [Worked]*1.0/[Inventory]
      end,
         ReachRate = case [Inventory] when
      0 then 0 else [RPC]*1.0/[Inventory]
      end

         select * from #tmp

       

    6. Generate dynamic columns definition and header columns
      –@colDesc is for table header, # followed by datatype is for display formatting
      select
      @col += ‘,’ + quotename(agg)+‘ varchar(15)’
      from #tmp
      where
      [year]=@REPORT_YEAR and
      JobName=‘All’
      select
      @colDesc += ‘,[‘
      + dbo.[ufn_GetCampaignHeader](agg)+‘#string] varchar(15)’
      from #tmp
      where
      [year]=@REPORT_YEAR and
      JobName=‘All’
      select
      @col = stuff(@col,1,1,), @colDesc = stuff(@colDesc,1,1,)

      The value of the variable @col =[YTD<br>2017] varchar (15),

      [Month1]     varchar (15),
      [2017-01-01] varchar (15),
      [2017-01-02] varchar (15),
      [2017-01-03] varchar (15),
      [2017-01-04] varchar (15),
      [2017-01-05] varchar (15),
      [2017-01-06] varchar (15),
      [2017-01-07] varchar (15),
      [Wk1]       varchar (15),
      [2017-01-08] varchar (15),

       

      [2017-01-09] varchar (15),
      ……
      [2017-12-28] varchar (15),
      [2017-12-29] varchar (15),
      [2017-12-30] varchar (15),
      [Wk52]       varchar (15),
      [2017-12-31] varchar (15),

      [Wk53]       varchar (15)The value of the variable 

      @colDesc =[YTD<br>2017#string]
      varchar(15),

      [Jan#string] varchar(15),
      [Sun<br>170101#string]
      varchar(15),
      [Mon<br>170102#string]
      varchar(15),
      [Tue<br>170103#string]
      varchar(15),
      [Wed<br>170104#string]
      varchar(15),
      [Thu<br>170105#string]
      varchar(15),

       

      [Fri<br>170106#string]
      varchar(15),
      ……
      [Fri<br>171229#string]
      varchar(15),
      [Sat<br>171230#string]
      varchar(15),
      [Wk52#string]          varchar(15),
      [Sun<br>171231#string]
      varchar(15),

       

      [Wk53#string]          varchar(15) 

    7. Create temp summary table to store pivoted final data
      –table tmp_summary, add spid value in the global temp table name
      to isolate concurrent users
      For using dynamic sql query we have to use global temp table, or
      we are unable to access the temp table.
      Seq (sequence) column is for displaying data in order for each job
      name
         declare
      @tmptable varchar(1000)
         select
      @tmptable = ‘##tmp_summary_’
      + cast(@@SPID as varchar)
         exec(‘if object_id(”tempdb..’
      + @tmptable + ”’) is not null drop table ‘ +
      @tmptable)
         exec(‘create table ‘ + @tmptable + ‘ ([!Year] int,[JobName#string] varchar(max),[!Seq]
      int,[Campaign#string] varchar(max),’
      +
      @colDesc + ‘)’)
      –result:
      create table tmp_summary_51 ([Year] int,JobId
      int,Seq int,JobName varchar(max),[Year2017 Total] varchar(20),[Month1 Total]
      varchar(20),[2017-01-01] varchar(20),[2017-01-02] varchar(20),[2017-01-03]
      varchar(20),[2017-01-04] varchar(20),[2017-01-05] varchar(20),[2017-01-06]
      varchar(20),[2017-01-07] varchar(20),[Week1 Total] varchar(20),[2017-01-08]
      varchar(20),[2017-01-09]
      ……
      varchar(20),[2017-06-23]
      varchar(20),[2017-06-24] varchar(20),[Week25 Total] varchar(20),[2017-06-25]
      varchar(20),[2017-06-26] varchar(20),[2017-06-27] varchar(20),[2017-06-28]
      varchar(20),[2017-06-29] varchar(20),[2017-06-30] varchar(20),[Week26 Total]
      varchar(20))
    8. Turning table in 90 degree with pivot
      declare
      @job varchar(255)
      declare
      csr cursor FAST_FORWARD
      for select Name
      from @tbl_job
      open csr
      fetch next from csr into @job
      while @@fetch_status = 0
      begin
         –get column list
         if @columns is null
                select
      @columns = replace (@col,‘ varchar (15)’,)
        
         –rotate
         –inventory
         set @sql =‘insert into ‘ + @tmptable +
         select Year, JobName, 1 seq,
      Campaign,’
      + @columns +
         from (
                select
      JobName,JobName+” Inventory” Campaign,[year],[agg],Inventory
                from #tmp
                where JobName = ”’ +
      @job + ”’
                ) x
         pivot (
                sum(Inventory) for
      [agg] in (‘
      +@columns+‘)
         ) as pvt
         –where [year] is not null
         order by [year], JobName, seq
         ‘
         exec(@sql)   
               
        
         –worked
         set @sql =‘insert into ‘ + @tmptable +
         select Year, JobName, 2 seq,
      Campaign,’
      + @columns +
         from (
                select JobName, JobName
      + ” Worked” Campaign, [year], [agg], Worked
                from #tmp
                where JobName = ”’ +
      @job + ”’
                ) x
         pivot (
                sum(worked) for [agg]
      in (‘
      +@columns+‘)
         ) as pvt’
         exec(@sql)
        
         –% of completion
         set @sql =‘insert into ‘ + @tmptable +
         select Year, JobName,3 seq,
      Campaign,’
      + @columns +
         from (
                select JobName,”% of
      Completion” Campaign, [year], [agg], CompleteRate
                from #tmp
                where JobName = ”’ +
      @job + ”’
                ) x
         pivot (
                avg(CompleteRate) for
      [agg] in (‘
      +@columns+‘)
         ) as pvt’
         exec(@sql)
         –RPC
         set @sql =‘insert into ‘ + @tmptable +
         select Year, JobName, 4 seq,
      Campaign,’
      + @columns +
         from (
                select JobName,”RPC”
      Campaign, [year], [agg], RPC
                from #tmp
                where JobName = ”’ +
      @job + ”’
                ) x
         pivot (
                sum(RPC) for [agg] in
      (‘
      +@columns+‘)
         ) as pvt’
         exec(@sql)
        
         –ReachRate
         set @sql =‘insert into ‘ + @tmptable +
         select Year,JobName,5 seq, Campaign,’ +
      @columns +
         from (
                select
      JobName,”Reach Rate” Campaign, [year], [agg], ReachRate
                from #tmp
                where JobName = ”’ +
      @job + ”’
                ) x
         pivot (
                avg(ReachRate) for
      [agg] in (‘
      +@columns+‘)
         ) as pvt’
         exec(@sql)
        
      fetch next from csr into @job
      end
      close csr
      deallocate
      csr
      pivoted data look like this:

      dynamic pivot statement:

    9. Rendered on web pageReport
      filter:
    10. 5 temp tables are used

      #campaign              (Date, Day, JobName, Volume_to_work, Volume_worked, Volume_RPC) –stores original data
      #view_campaign  (JobName, Year, Weekday, Month, Week, Date, Inventory, Worked, RPC)  –summarized data
      #tmp                       (JobName, Year, Month, Week, Agg, Inventory, Worked, RPC)                –target aggregation data
      ##tmp_summary_spid(Year, JobName, Seq, Campaign, YTD, Jan, Sun, Mon, Tue, … )      –final output data

 

  1. Further action:Stores pivoted data in a table with ETL process to improve the performance.
Note:


 

  • the data used in this note is test date, not real.

 

 

 



Advertisements

Leave a Reply