The approaches that DynamoRIO and Pin take to
instrumentation are fundamentally different. Pin's
interface is not so different from tools that modify the
original code, such as ATOM, Dyninst, Vulcan, or Detours.
The only instrumentation option in Pin is to insert a
callout or trampoline to instrumentation code at a
particular point in the application's code stream. The
application code stream itself cannot be modified directly.
Pin does use its code cache to improve over non-code-cache
tools that modify original application code in two respects:
code discovery and transparency. The code cache design
allows for incremental dynamic discovery of code, making it
simple to examine all code that is executed, while
non-code-cache tools must statically (or at load time)
determine which code to instrument based on the application
binary and library files. The code cache also eliminates
the safety issues of modifying variable-length IA-32/AMD64
code: direct modification by inserting a 5-byte jump
instruction can overwrite an entry point midway through those
5 bytes, and it also suffers from races between multiple
threads and from complications if those 5 bytes are later
examined or modified by the application (essentially, the
displaced 5 bytes are a "miniature code cache" that incurs
all the same complications as a regular code cache).
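
The entry-point hazard described above can be illustrated with a small
sketch. This is only a toy model, not code from Pin, DynamoRIO, or any
patching tool: the function name, the instruction lengths, and the branch
target are all made up for illustration. It checks which known entry
points fall strictly inside the 5 bytes a jump patch would overwrite.

```python
JMP_REL32_LEN = 5  # an x86 `jmp rel32` is 1 opcode byte + 4 displacement bytes


def clobbered_entry_points(insn_lengths, patch_index, entry_offsets):
    """Return entry points that a 5-byte jump patch would overwrite midway.

    insn_lengths:  byte length of each instruction, in layout order
    patch_index:   index of the instruction where the jump is written
    entry_offsets: byte offsets that other code may branch to
    (All three parameters are hypothetical inputs for this sketch.)
    """
    # Compute the byte offset at which each instruction starts.
    starts = []
    offset = 0
    for length in insn_lengths:
        starts.append(offset)
        offset += length

    patch_start = starts[patch_index]
    patch_end = patch_start + JMP_REL32_LEN  # exclusive

    # An entry point strictly inside the patched range now lands in the
    # middle of the jump. The patch start itself is still a valid place to
    # enter, because the jump instruction begins there.
    return sorted(e for e in entry_offsets if patch_start < e < patch_end)


# Hypothetical layout: instructions of 2, 1, 3, and 6 bytes.
lengths = [2, 1, 3, 6]
# Suppose a branch elsewhere targets the third instruction, at offset 3.
print(clobbered_entry_points(lengths, 0, {3}))  # [3]
# Patching the 6-byte instruction at offset 6 covers no other entry point.
print(clobbered_entry_points(lengths, 3, {3}))  # []
```

A real tool would also have to find every entry point in the first place,
which is exactly the code discovery problem mentioned above.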

DynamoRIO leverages its code cache to provide a much broader
interface. DynamoRIO allows modification of the runtime
code stream of the application: modifying, inserting, or
removing individual instructions. This is in stark contrast
to Pin's observe-and-callout model. DynamoRIO's interface
is only possible with a code cache, and it takes full
advantage of the cache's power. Not only does this general
interface let DynamoRIO support uses beyond pure observation,
such as optimization or translation, but it also gives full
control over instrumentation to the tool writer. While Pin
does automatically inline and optimize simple callouts, the
client has little control or guarantee over the final
performance, and a minor change in the instrumentation
routine can have an order-of-magnitude impact on performance
if it prevents inlining.
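
As a rough illustration of the two interface styles contrasted above,
here is a toy code cache. It is a sketch under stated assumptions, not
the API of DynamoRIO or Pin: the class, the hook signature, and the
instruction strings are all hypothetical. Blocks are copied into the
cache the first time they are fetched, and a client hook decides what the
cached copy contains.

```python
class ToyCodeCache:
    """Hypothetical code cache: blocks are copied in on first execution."""

    def __init__(self, app_blocks, client_hook):
        self.app_blocks = app_blocks    # tag -> list of instruction strings
        self.client_hook = client_hook  # (tag, instructions) -> edited list
        self.cache = {}

    def fetch(self, tag):
        # Build the cached copy lazily, the first time the block is needed.
        if tag not in self.cache:
            copy = list(self.app_blocks[tag])  # original code stays untouched
            self.cache[tag] = self.client_hook(tag, copy)
        return self.cache[tag]


def callout_client(tag, instrs):
    # Callout style: add a call to an analysis routine, leave the
    # application instructions exactly as they were.
    return ["call count_block"] + instrs


def inline_client(tag, instrs):
    # Direct-edit style: insert an inline counter update and remove
    # instructions (here, nops) from the cached copy.
    kept = [i for i in instrs if i != "nop"]
    return ["inc [counter]"] + kept


app = {"bb0": ["mov eax, 1", "nop", "add eax, 2"]}
print(ToyCodeCache(app, callout_client).fetch("bb0"))
print(ToyCodeCache(app, inline_client).fetch("bb0"))
print(app["bb0"])  # the application's own code is unchanged in both cases
```

In the callout version the cost of the instrumentation depends on what
happens inside the called routine; in the direct-edit version the tool
writer chooses the exact instructions that end up in the cache.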

For a real-world example, consider the PiPA memory profiler
and cache simulator:

  Q. Zhao, I. Cutcutache, and W.F. Wong, "PiPA: Pipelined
  Profiling and Analysis on Multi-core Systems". Proceedings
  of the 2008 International Symposium on Code Generation and
  Optimization (CGO 08), pp. 185-194. Boston, MA, U.S.A.,
  Apr 2008.
  Talk slides: http://www.comp.nus.edu.sg/~ioana/files/PiPA.pps

Implemented as a DynamoRIO tool, PiPA ran 3.27x faster than
the Pin dcache tool. Implemented in Pin, the speedup was only
2.6x, because the instrumentation could not be fully
optimized.

As requested, I have gathered updated numbers for the bbcount
comparison using the new Pin release 22117, which has
improved inlining support. Several benchmarks, such as
perlbmk, crafty, and eon, improve noticeably. The harmonic
means are now 226% (a 5-percentage-point improvement from
231%), 233%, and 185%, respectively.
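
For readers checking the arithmetic: the move from 231% to 226% is 5
percentage points, which is roughly 2.2% in relative terms. The sketch
below shows both calculations, plus a plain harmonic mean. The helper
names and the two-benchmark input are hypothetical; the per-benchmark
numbers behind the means quoted above are not given in this post, and how
exactly they were combined is an assumption here.

```python
def harmonic_mean_pct(values):
    # Harmonic mean: number of values divided by the sum of reciprocals.
    return len(values) / sum(1.0 / v for v in values)


def point_and_relative_change(old_pct, new_pct):
    # Absolute change in percentage points, and the same change expressed
    # relative to the old value (in percent).
    points = old_pct - new_pct
    relative = 100.0 * points / old_pct
    return points, relative


points, relative = point_and_relative_change(231, 226)
print(points, round(relative, 2))  # 5 2.16

# Made-up per-benchmark figures, only to show the helper in use.
print(harmonic_mean_pct([200, 300]))
```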

The comment that Pin only allows you to observe is not correct. You can delete instructions and insert calls that modify application registers or insert branches. Because you can delete an instruction and replace it with a function call that modifies application registers and memory, you have full control. We think this is easier than directing DynamoRIO to delete instructions and insert other instructions; the difference is similar to that between writing C and assembly code. Pintool authors have used this to write emulators for new instructions and transactional memory, to simulate speculation, etc. See http://software.intel.com/en-us/blogs/2008/08/11/emulation-of-new-instructions/ for an example.