This is an answer for an interview i appeared.The question was to optimize below query..Unfortunately their site didn’t allow special characters ,so i had to post the answer on my blog and provided link to them.I felt happy about this answer,so it thought of leaving it like this..
Few issues i could see with above query
1.Where clause is not sargable
so rewriting below clause in query like this
cast(main.c as date) >=cast(getdate()-2 as date)
instead of this
DATEDIFF(Day, main.c, getdate()-2) <=0 and DATEDIFF(Day, sub.c, getdate()-2) <=0
helps when we have an index,same is the case with test_sub table where clause
2.UDF executes per row,we can’t avoid this,so instead we have to focus on query reqrite which limits the rows
3.below indexes helps the query
create index nci_KC_c_Ic_a_b_e
on test_main(c)
include(a,b,e)
create index nci_KC_c_Ic_a_b
on test_sub(c)
include(a,b)
Even after adding this indexes,table test_main returns 4000 rows but scans all the table due to this clause
cast(SUBSTRING(e,6,5) as int)
between 1000 and 5000
adding a filtered index would be of much help,but i cant add due to its limitations,so i added computed column like below
alter table test_main
add e_col as cast(SUBSTRING(e,6,5) as int) persisted
now with this approach,i got only required rows and i am not paying cost for 50000 reads ,i am paying cost only for 4001 reads
Below are the total queries i ran
alter table dbo.test_main
add e_col as cast(SUBSTRING(e,6,5) as int) persisted
create index nci_KC_e_Ic_a_b_c
on test_main(e_Col)
include(a,b,c,e)
create index nci_KC_c_Ic_a_b
on test_sub(b)
include(a,c)
now the query
IF (SELECT sum(b) from test_main where a in (select b from test_sub where DATEDIFF(Day, c, getdate()-2) <=0 )) > 0
select main.c,
dbo.UDF_exists(main.a) as exists_ ,
e,
SUBSTRING(e,6,5)as sequence_ ,
cast(1000/sum(main.b) as decimal(18,2)) as divide_
from
test_main main
join
test_sub sub
on main.a=sub.b
where cast(main.c as date) >=cast(getdate()-2 as date)and cast(sub.c as date) >=cast(getdate()-2 as date)
and e_Col >=1000 and e_col<=5000
group by main.c,main.a,main.e,e_col
Else
select null,null,null,null
thats pretty much it