What follows are collected numbers showing that the performance differences, after Jason Thorpe's recent commits, are quite noticeable:
Blowfish
Before:
Doing blowfish cbc for 3s on 8 size blocks: 2891026 blowfish cbc's in 2.97s
Doing blowfish cbc for 3s on 64 size blocks: 411766 blowfish cbc's in 3.10s
Doing blowfish cbc for 3s on 256 size blocks: 104721 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 26291 blowfish cbc's in 2.98s
Doing blowfish cbc for 3s on 8192 size blocks: 3290 blowfish cbc's in 3.10s
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
blowfish cbc 7787.28k 8755.16k 8936.19k 9034.22k 8954.05k
After:
Doing blowfish cbc for 3s on 8 size blocks: 4573792 blowfish cbc's in 3.10s
Doing blowfish cbc for 3s on 64 size blocks: 713440 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 256 size blocks: 183125 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 46221 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 5787 blowfish cbc's in 3.00s
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
blowfish cbc 12156.26k 15270.96k 15626.67k 15776.77k 15802.37k
MD5
Before:
Doing md5 for 3s on 8 size blocks: 1490796 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 895849 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 410807 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 129416 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 17527 md5's in 3.00s
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 3975.46k 19111.45k 35055.53k 44173.99k 47860.39k
After:
Doing md5 for 3s on 8 size blocks: 2041410 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 1345402 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 669827 md5's in 3.10s
Doing md5 for 3s on 1024 size blocks: 221744 md5's in 2.96s
Doing md5 for 3s on 8192 size blocks: 30685 md5's in 3.00s
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 5443.76k 28701.91k 56968.68k 76711.44k 83790.51k
RMD160
Before:
Doing rmd160 for 3s on 8 size blocks: 778828 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 430214 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 182108 rmd160's in 3.00s
Doing rmd160 for 3s on 1024 size blocks: 55050 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 7339 rmd160's in 3.00s
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
rmd160 2076.87k 9177.90k 15539.88k 18790.40k 20040.36k
After:
Doing rmd160 for 3s on 8 size blocks: 1084941 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 617966 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 267381 rmd160's in 2.99s
Doing rmd160 for 3s on 1024 size blocks: 82001 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 10974 rmd160's in 3.00s
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
rmd160 2893.18k 13183.27k 22892.82k 27989.67k 29966.34k
BIGNUM routines
Before:
Doing 512 bit sign dsa's for 10s: 965 512 bit DSA signs in 9.97s
Doing 512 bit verify dsa's for 10s: 766 512 bit DSA verify in 9.93s
Doing 1024 bit sign dsa's for 10s: 276 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 217 1024 bit DSA verify in 9.93s
sign verify sign/s verify/s
dsa 512 bits 0.0103s 0.0130s 96.8 77.1
dsa 1024 bits 0.0362s 0.0458s 27.6 21.9
After:
Doing 512 bit sign dsa's for 10s: 3742 512 bit DSA signs in 9.88s
Doing 512 bit verify dsa's for 10s: 3065 512 bit DSA verify in 9.92s
Doing 1024 bit sign dsa's for 10s: 1357 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 1094 1024 bit DSA verify in 9.83s
sign verify sign/s verify/s
dsa 512 bits 0.0026s 0.0032s 378.7 309.0
dsa 1024 bits 0.0074s 0.0090s 135.8 111.3