As I have mentioned, both the design and the implementation affect the efficiency of a transport protocol, mainly through their impact on CPU usage. For example, transferring data at 1 Gbps requires processing about 80,000 packets per second. If per-packet processing takes too long, throughput is limited by the CPU; if packet processing suffers a burst of CPU time, incoming packets may overflow the UDP buffer.
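The packet rate quoted above can be checked with a short calculation. The sketch below assumes 1500-byte packets (a typical Ethernet MTU, not a figure stated in the text) and ignores header overhead; the function names are illustrative only.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets that must be handled each second to fill the link."""
    return link_bps / (packet_bytes * 8)


def per_packet_budget_us(link_bps: float, packet_bytes: int) -> float:
    """Average CPU time available per packet, in microseconds."""
    return 1e6 / packets_per_second(link_bps, packet_bytes)


# Assumed values: a 1 Gbps link and 1500-byte packets.
rate = packets_per_second(1e9, 1500)
budget = per_packet_budget_us(1e9, 1500)
print(f"{rate:.0f} packets/s, {budget:.1f} us per packet")
```

Under these assumptions the rate is a little over 83,000 packets per second, consistent with the rough figure of 80,000, and the average processing budget is about 12 microseconds per packet. Any processing step that regularly exceeds this budget makes the CPU the bottleneck.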
Guided by these efficiency considerations, we use several techniques in the implementation, including memory copy avoidance and timer-based acknowledging, among others.
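To illustrate why timer-based acknowledging reduces per-packet work, the following is a minimal sketch, not the actual implementation. It counts how many acknowledgments a receiver would send if one ACK covers all packets that arrived during each timer period. The function name, the 10 ms ACK interval, and the 12-microsecond packet spacing are assumptions chosen for illustration.

```python
def count_acks_timer(arrival_times_us, ack_interval_us):
    """Count ACKs sent when a periodic timer acknowledges all packets
    received since the previous ACK (times in integer microseconds,
    given in increasing order)."""
    acks = 0
    next_ack = ack_interval_us
    pending = False  # True if unacknowledged packets are waiting
    for t in arrival_times_us:
        # Fire every timer tick that expired before this arrival.
        while t >= next_ack:
            if pending:
                acks += 1
                pending = False
            next_ack += ack_interval_us
        pending = True
    if pending:
        acks += 1  # final ACK for the last batch of packets
    return acks


# Assumed workload: one packet every 12 us for one second,
# acknowledged by a 10 ms timer (both values are hypothetical).
arrivals = range(0, 1_000_000, 12)
timer_acks = count_acks_timer(arrivals, 10_000)
per_packet_acks = len(arrivals)
print(timer_acks, per_packet_acks)
```

With these assumed numbers the receiver sends 100 ACKs in one second instead of one per packet (more than 83,000), so the cost of generating and processing acknowledgments no longer grows with the packet rate.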