The reason some instance types are rated at 2.5 EC2 Units comes down to the underlying processor and core setup that is allocated when you request that type of instance, specifically the clock speed (GHz) of the CPU. My suspicion is that the higher-cost instances run on newer CPUs, which yield the extra 0.5 units compared to the older CPUs on which the EC2 Unit was originally based.
The EC2 Unit is best explained this way:
Check out /proc/cpuinfo. On an m1.small, which has an advertised 1 ECU, it shows 1 core. The last time I checked it was an Intel Xeon E5430 @ 2.66GHz. That is not the end of it though: AWS says m1.small = 1 EC2 Unit = roughly 1-1.2GHz of processor time, so if this is correct, we'd expect never to be able to use more than about 38-45% (1.0/2.66 to 1.2/2.66) of the CPU listed in /proc/cpuinfo.
To verify that this is the case, create a simple program that does CPU-intensive work with no I/O (we're trying to isolate the CPU, so don't introduce other bottlenecks) and that stays within the RAM limit.
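A minimal sketch of such a program, in Python. The function name burn_cpu and the 60 second default run are my own choices for illustration, not anything AWS provides. It just spins on integer arithmetic, touches no files or sockets, and keeps a constant memory footprint:

```python
import time


def burn_cpu(seconds):
    """Spin on integer arithmetic for roughly `seconds` of wall clock time.

    Returns the number of loop iterations completed, which gives a rough
    figure for comparing how much work different instance types get done.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 1
    while time.monotonic() < deadline:
        # Pure arithmetic: no disk, no network, no growing data structures.
        x = (x * 1103515245 + 12345) % 2147483648
        iterations += 1
    return iterations


if __name__ == "__main__":
    wall_seconds = 60  # assumed run length; long enough to watch in top
    done = burn_cpu(wall_seconds)
    print(f"{done} iterations in {wall_seconds}s of wall clock time")
```

Start it in one terminal and watch top in another while it runs.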
If you do this and run "top" while your program is running, you'll see that %st (stolen CPU) is about 55%. top reports as stolen the share of CPU clock time that is not usable by you (in reality it's not stolen; it belongs to the other tenant on the physical machine who is paying for it, or it is reserved for a new instance request). This means that of the overall CPU, you get about 45% to use on an m1.small. 45% of 2.66GHz = about 1.2GHz, which is about what is advertised.
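If you'd rather capture the number than eyeball it in top, here is a rough sketch that samples the aggregate "cpu" line of /proc/stat twice and works out the steal percentage over the interval, then applies the same arithmetic as above. The helper names and the 5 second sampling window are my own assumptions; the field order (user, nice, system, idle, iowait, irq, softirq, steal, ...) is the standard Linux layout.

```python
import time


def parse_cpu_line(line):
    """Return (steal_ticks, total_ticks) from the aggregate 'cpu' line."""
    fields = [int(v) for v in line.split()[1:]]
    # Order: user nice system idle iowait irq softirq steal guest guest_nice
    steal = fields[7] if len(fields) > 7 else 0
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight fields are summed.
    total = sum(fields[:8])
    return steal, total


def steal_percent(before, after):
    """Steal time as a percentage of all CPU time between two samples."""
    s0, t0 = parse_cpu_line(before)
    s1, t1 = parse_cpu_line(after)
    elapsed = t1 - t0
    return 100.0 * (s1 - s0) / elapsed if elapsed else 0.0


def usable_ghz(clock_ghz, steal_pct):
    """Clock speed left over once the stolen share is taken out."""
    return clock_ghz * (1.0 - steal_pct / 100.0)


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    before = read_cpu_line()
    time.sleep(5)  # assumed sampling window
    after = read_cpu_line()
    pct = steal_percent(before, after)
    print(f"steal: {pct:.1f}%")
    print(f"usable on a 2.66GHz core: {usable_ghz(2.66, pct):.2f}GHz")
```

Run it while the CPU-bound program is busy; on an idle instance the steal figure will not tell you much.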
On instances with 2 cores at 1 EC2 Unit each, you get the added benefit of parallelism, but with only 1 EC2 Unit on each core, you'll see the same behavior described above on each core.
So, given this behavior, getting more EC2 Units per core allows a single CPU-intensive process to do more work over a given time interval (as it gets more of the core's total wall clock time). Having more cores allows you to do more things in parallel, which is better when the machine runs multiple processes.
In this way, the EC2 Unit is a way to standardize the "throttling" of your use of a given core to 1-1.2GHz. Double the throttle to 2 ECU on a given core and you get 2-2.4GHz worth of the total wall clock time.
In our real-world test, our test process, which was capable of running on only 1 CPU at a time (so multiple cores were not helpful), took 31 minutes to complete on an m1.small = 1 EC2 Unit. On a c1.medium with 2.5 ECU per core, the same program took a bit over 12 minutes. 31/2.5 = 12.4 minutes!
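The same back-of-the-envelope estimate as a tiny Python sketch. The function name is made up for illustration, and it assumes a single-threaded, purely CPU-bound job whose runtime scales inversely with the ECU available on one core, which is what we saw in this one test and nothing more general than that:

```python
def estimated_minutes(baseline_minutes, baseline_ecu, target_ecu):
    """Scale a measured runtime by the ratio of per-core ECU ratings.

    Assumes the job is single-threaded and CPU-bound, so runtime is
    inversely proportional to the ECU of the core it runs on.
    """
    return baseline_minutes * baseline_ecu / target_ecu


if __name__ == "__main__":
    # 31 minutes measured on an m1.small (1 ECU), projected to 2.5 ECU per core.
    print(f"{estimated_minutes(31, 1, 2.5):.1f} minutes")
```

Anything that does I/O or can use more than one core will not follow this simple ratio.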