Problem 1
From CSI 702
Contents |
1. Problem 1
1.1. Description
How do I make this code run faster?
do i = 1, 1000
if (i < 100) then
a(i) = 10
else
a(i) = 20
endif
enddo
1.2. Expectation
The expectation is that if the if statement were to be removed from the loop, the code should run a bit faster. Please note that branches in loops break vector pipelines. This can be done by breaking the original code into two loop statements. In doing so, we can eliminate the need for the branching logic. It is not expected that this change will improve the order of the method from O(C*n).
1.3. Code
Original code in Fortran:
program hw1p1v1 integer j do j = 1, 50000 call algorithm enddo end
subroutine algorithm()
integer i, a(1000)
do i = 1, 1000
if (i < 100) then
a(i) = 10
else
a(i) = 20
endif
enddo
end
can also be written as
subroutine algorithm()
integer i, a(1000)
if (i < 100) then
do i = 1, 1000
a(i) = 10
enddo
else
do i = 1, 1000
a(i) = 20
enddo
endif
end
In the above subroutine, the if test is evaluated only once.
Breaking the original loop into two loops to remove the if statement in the subroutine gives:
subroutine algorithm2() integer i, a(1000) do i = 1, 99 a(i) = 10 enddo do i = 100, 1000 a(i) = 20 enddo end
Original version in C:
#include <stdlib.h>
int allocatemem(int n,int **a){
(*a) = (int*)malloc(n*sizeof(int));
return 0;
}
int makeloop(int n, int m, int* a){
int i;
for (i=0;i<n;i++){
if (i < m)
a[i] = 10;
else
a[i] = 20;
}
return 0;
}
int main(int argc,char *argv[]){
int n=atoi(argv[1]);
int m=n/10;
int *a;
allocatemem(n,&a);
makeloop(n,m,a);
free(a);
return 0;
}
Modified version in C:
#include <stdlib.h>
int allocatemem(int n,int **a){
(*a) = (int*)malloc(n*sizeof(int));
return 0;
}
int makeloop(int n, int m, int* a){
int i;
for (i=0;i<m;i++)
a[i] = 10;
for (i=m;i<n;i++)
a[i] = 20;
return 0;
}
int main(int argc, char *argv[]){
int n=atoi(argv[1]);
int m=n/10;
int *a;
allocatemem(n,&a);
makeloop(n,m,a);
free(a);
return 0;
}
1.4. Comments
The figure shows the result timing vs. the cycles quantity n of each version of the program. Time is the output of gprof for the makeloop subroutine. Both version, original and modified, were compiled with the flags -O0 and -O2. It should also be noted that execution times varied wildly depending on the number of executions of the subroutine. This is presumably due to caching from the previous run.
1.5. Contributors
H. Le
Marcelo Raschi
Zach Firth
Ash Agarwal

