Problem 7
From CSI 702
Contents |
1. Problem 7
1.1. Description
How can this code be written more efficiently?
found = .false. do i = 1, large_number if (x(i) == target_value) found = .true. enddo
1.2. Expectation
The loop continues to run until the maximum iterations designated by large_number has been reached regardless if target_value has been found earlier in the array. Inserting a statement to exit out of the loop once the target_value has been found should speed up the code.
1.3. Code
The simplest way to optimize this code is to add a break
#include <stdio.h>
#define LARGE_NUMBER 1000000
#define TARGET 500000
#define LOOP 10
/* ## inline function to be called multiple times ## */
void inline algorithm( void );
/* ## MAIN FUNCTION ## */
int main (int argc, char const *argv[])
{
int j;
for(j = 0; j < LOOP; j++) {
algorithm();
}
return 0;
}
/* ## inline function to be called multiple times ## */
void inline algorithm( void ) {
/* set up initial */
int target_value = 2; /* target value */
int x[LARGE_NUMBER]; /* array to search */
int i; /* iterative index for looping */
/* first fill the array with 1's as a test set */
for (i = 0; i < LARGE_NUMBER; i++) {
x[i] = 1;
}
/* set the target to two */
x[TARGET] = 2;
/* iterate through and if we match break out of loop */
for(i = 0; i < LARGE_NUMBER; i++){
if (x[i] == target_value) {
printf("\nFound Target! @ %d\n", i);
break;
}
}
}
However, for a small initial penalty, we can make this even faster by checking and replacing the final element of x(i) with our target.
/* ## inline function to be called multiple times ## */
void inline algorithm( void ) {
/* set up initial */
int target_value = 2; /* target value */
int x[LARGE_NUMBER]; /* array to search */
int maxsize = LARGE_NUMBER -1; /* adjusted maximum index */
int y; /* temporary value of the arrays last
index */
int i; /* iterative index for looping */
/* first fill the array with 1's as a test set */
for (i = 0; i < LARGE_NUMBER; i++) {
x[i] = 1;
}
/* set the target to two */
x[TARGET] = 2;
/* check max size */
if (x[maxsize] == target_value) {
printf("\nFound Target! @ 0\n");
}
else {
/* save our initial last value */
y = x[maxsize];
/* set our last value to our target */
x[maxsize] = 2;
/* find our target */
for(i = 0; x[i] != target_value; i++);
/* if we find the index is less than maxsize then we found what we were
looking for */
if (i < maxsize){
printf("\nFound Target! @ %d\n", i);
}
/* return our final value back to it's original state */
x[maxsize] = y;
}
}
In Fortran:
program hw1p7 integer true_value_index, k true_value_index = 25000 do k = 1, 1000 call algorithm1(true_value_index) call algorithm2(true_value_index) enddo end
Original code:
subroutine algorithm1(true_value_index) integer true_value_index, x(100000), i, j logical found do i = 1, 100000 x(i) = 1 enddo x(true_value_index) = 2 found = .false. do j = 1, 100000 if (x(i) == 2) found = .true. enddo end
Modified code with an exit clause for the loop:
subroutine algorithm2(true_value_index) integer true_value_index, x(10000), i, j logical found do i = 1, 100000 x(i) = 1 enddo x(true_value_index) = 2 found = .false. do j = 1, 100000 if (x(i) == 2) found = .true. exit enddo end
1.4. Comments
Here are the timings for 5000 loops at targets located at position 250,000, 500,000, and 750,000 with no C flag optimizations.
| 750,000 | 500,000 | 250,000 | |
|---|---|---|---|
| Unoptimized | real 31.37 / user 31.21 / sys 0.06 | real 31.41 / user 31.28 / sys 0.06 | real 31.29 / user 31.25 / sys 0.06 |
| Optimize 1 | real 27.42 / user 27.32 / sys 0.04 | real 23.81 / user 23.68 / sys 0.05 | real 20.08 / user 19.95 / sys 0.04 |
| Optimize 2 | real 27.79 / user 27.55 / sys 0.07 | real 23.81 / user 23.68 / sys 0.06 | real 20.41 / user 20.32 / sys 0.04 |
Here are the timings for 5000 loops at targets located at position 250,000, 500,000, and 750,000 with the "-O3"" C flag optimization.
| 750,000 | 500,000 | 250,000 | |
|---|---|---|---|
| Unoptimized | real 0.00 / user 0.00 / sys 0.00 | real 0.00 / user 0.00 / sys 0.00 | real 0.00 / user 0.00 / sys 0.00 |
| Optimize 1 | real 13.87 / user 12.98 / sys 0.09 | real 11.43 / user 11.36 / sys 0.03 | real 10.19 / user 10.12 / sys 0.03 |
| Optimize 2 | real 12.66 / user 12.55 / sys 0.05 | real 11.43 / user 11.36 / sys 0.03 | real 10.27 / user 10.19 / sys 0.04 |
There are a couple of issues to note. The first is that the concept of only checking the last item, instead of the iteration of the algorithm seems to do little to speed up the process, and in the unoptimized version actually slows it down. The second is that it appears the compiler can determine the location of the target as static and simply return the value. In the discussion portion the output of a series of times for different -Ox's is provided, but there is no real improvement or change from -O3. The gprof values were calculated, but showed little difference from the recorded time values but are provided as reference:
Fortran results are as follows with optimization flag set to zero and the True Value Index set to 25K and 50K respectively:
Note that the timings were very similar for 25K and 50K. Algorithm 2 was more efficient and became faster than algorithm 1 as the iterations increased.
Fortran results with True Value index set at 75K:
There was not a consistent efficiency perceived between algorithm 1 and 2. This is expected as algorithm 2 reduces to looping through the entire array as the true value index nears the end of the array.
1.5. Contributors
Jonathan Lisic, H. Le


