Matrix multiplication benchmarks are simple looping algorithms that a student can quickly program in a high-level programming language such as C++. A matrix multiplication algorithm coded by one of our students in C++ is shown in Figure 1. The algorithm's primary functionality is expressed in a single line of C++ code, line 52, which performs one (1) multiplication and one (1) addition. This line of code is executed repeatedly, looping over the rows and columns of the input matrices, a and b, to compute the elements of the resultant matrix, c [5].

IV. COMPUTATIONAL REQUIREMENTS OF THE MATRIX MULTIPLICATION ALGORITHM

Given the highly predictable, well-understood nature of the matrix multiplication algorithm, in which there are N multiplications and N-1 additions for each of the $N^2$ elements of the resultant matrix, it is possible to accurately estimate, a priori, the total number of arithmetic operations required to multiply a pair of matrices of a given size, and the corresponding number of assembly language instructions that would be executed on the processor hardware.
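As a brief worked form of this estimate, taking the resultant matrix of an $N \times N$ product to have $N^2$ elements, each needing N multiplications and N-1 additions, the total operation count can be written as:

\[
N^2 \cdot \bigl(N + (N - 1)\bigr) = 2N^3 - N^2 \approx 2N^3 \quad \text{for large } N .
\]

For example, with N = 100 the exact count is 2,000,000 - 10,000 = 1,990,000 operations, which differs from $2N^3$ by 0.5%.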
For equally dimensioned square matrices of size $N \times N$, the total number of arithmetic operations (either addition or multiplication) needed to compute the product of the matrices is approximately $2N^3$ for large values of N. Since line 52 of the C++ code contains two arithmetic operations, it needs to be executed $N^3$ times by the benchmark's looping controls to evaluate the product of two $N \times N$ matrices.

High-level C++ code cannot run directly on the microarchitecture; it must be compiled into assembly and, finally, machine code to execute natively on the computer's hardware. A good estimate of the number of assembly instructions and, hence, machine code instructions created by the compiler for line 52 of the matrix multiply C++ code on an x86 architecture processor can be obtained from an assembly language pseudocode analysis or by simply reviewing the assembly code listing actually generated by the compiler for the program.
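The operation count discussed above can be checked empirically with a small instrumented loop. The following is a minimal sketch only; it is not the student code of Figure 1, and the names `Matrix`, `OpCount`, and `multiply_counted` are hypothetical, introduced here for illustration. It assumes square matrices stored as nested `std::vector<double>` and an accumulator initialised to zero, so the multiply-add statement runs $N^3$ times and tallies exactly $2N^3$ operations (one more addition per element than the N-1 strictly required).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias: a square matrix stored row by row.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical tally of the arithmetic operations performed.
struct OpCount {
    std::size_t multiplications = 0;
    std::size_t additions = 0;
};

// Multiplies two N x N matrices with the classic triple loop and
// counts each multiplication and addition as it is executed.
OpCount multiply_counted(const Matrix& a, const Matrix& b, Matrix& c) {
    const std::size_t n = a.size();
    assert(b.size() == n);  // assumption: both inputs are N x N

    c.assign(n, std::vector<double>(n, 0.0));  // zero the result matrix
    OpCount ops;

    for (std::size_t i = 0; i < n; ++i) {          // rows of a
        for (std::size_t j = 0; j < n; ++j) {      // columns of b
            for (std::size_t k = 0; k < n; ++k) {  // inner product index
                // One multiplication and one addition per iteration,
                // analogous to the single multiply-add source line.
                c[i][j] += a[i][k] * b[k][j];
                ++ops.multiplications;
                ++ops.additions;
            }
        }
    }
    return ops;
}
```

Calling `multiply_counted` on two 2 x 2 matrices should report 8 multiplications and 8 additions, that is, $2N^3 = 16$ operations in total. To inspect the assembly that a compiler actually emits for the inner statement, the file can be compiled to an assembly listing (for example with the `-S` option of GCC or Clang); the instruction count per iteration will vary with the compiler and optimisation level.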