Identify bottlenecks in the program:
Are there areas that are disproportionately slow, or that cause parallelizable work to halt or be deferred? For example, I/O is usually much slower than computation and often slows a program down.
It may be possible to restructure the program or use a different algorithm to reduce or eliminate unnecessary slow areas.
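One simple way to look for bottlenecks is to time each stage of the program and compare the results. The following is a minimal sketch, not a definitive method. The stage names `read_input` and `compute`, and the helpers `time_stages` and `find_bottleneck`, are hypothetical and exist only for illustration. The I/O stage is simulated with `time.sleep`, which is an assumption standing in for a real read from disk or network.

```python
import time


def time_stages(stages):
    """Run each (name, func) stage once and return {name: elapsed_seconds}."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings


def find_bottleneck(timings):
    """Return the name of the stage with the largest elapsed time."""
    return max(timings, key=timings.get)


def read_input():
    # Hypothetical I/O stage: the sleep stands in for a slow read.
    time.sleep(0.05)


def compute():
    # Hypothetical CPU stage: a small amount of arithmetic.
    sum(i * i for i in range(1000))


timings = time_stages([("read_input", read_input), ("compute", compute)])
bottleneck = find_bottleneck(timings)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
print("slowest stage:", bottleneck)
```

For a real program, a profiler such as Python's standard `cProfile` module gives a per-function breakdown without manual timing code. Once the slowest stage is known, it is the first candidate for restructuring or for a different algorithm.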