"The scalability of Raytrace improves with XOrder, yet table 4 shows an increase in aborts at 16 and 32 processors. Please explain." Note that not all aborts are equal. The Raytrace program contains a global "counter" variable which is read and updated by each thread. The code is essentially, "begin_transaction(); my_count = global++; end_transaction();". The reason for the improved performance is because the XOrder version of Raytrace is actually seeing more "true" conflicts. XOrder analysis has removed some of the "false" conflicts for longer running transactions. This allows for more threads to operate in parallel, which in turn, results in more aborts (from the read-modify-write behavior on the global counter). "I see unstructured transactions mainly as a way for the authors to create test cases but not really a key contribution of the work ... it’s unclear how this can be used in the future." "Why are the details of unstructured transactions relevant to the PLDI community?" It is true that our analyses needed to handle unstructured transactions in order to run the test cases. However, short of building an analysis infrastructure that handles atomic blocks (e.g., modifying the FrontC and AST's of CIL), program analysis tools that work with transactional programs must also be able handle this programming idiom. For example, many of the STM system's available today rely on the programmer to manually specify the beginning and ending of a transaction via a call to the STM runtime library. The atomic block construct is a language-based technique to reduce programmer errors. Underlying systems (either an HTM or STM's runtime library), will have low-level primitives similiar to begin_transaction() and end_transaction(). To analyze low-level code, an analysis tool has two options: 1) handle the program as is (the approach we chose) or 2) attempt to translate the program's begin_transaction() and end_transaction() statements into atomic blocks. 
Option 2 is attractive, but it will require an analysis like the one we have presented. Additionally, a transaction-aware compiler will likely attempt to "shrink" the code actually executed transactionally. For example, when translating an atomic block into its lower-level primitives, an optimizing compiler might move a begin_transaction() or end_transaction() call across a function boundary. Similarly, the compiler might duplicate one (or both) of these calls inside the branches of an if-statement. An analysis like the one we have presented is necessary to verify that the compiler optimization was performed correctly (with respect to matched begin/end_transaction statements).

"In some of the tests the number of conflicts goes up. Do you expect them to be actual or false conflicts?"

We have addressed this question earlier; however, we note that a current limitation of our simulator is that we cannot differentiate between true and false conflicts. This ability would be extremely helpful for future research.

"Do you have other XOrder experiments that can deliver a more complete picture of the performance of software transaction (as a result of conflicts and cache behavior)?"

We are currently working on the analysis of the Berkeley DB library and hope to have results soon.

"Section 6 mentions that false conflicts are particularly problematic in HTM systems because of cache-line-granularity conflict detection."

We used an HTM system for our experiments. In general, false conflicts are problematic on any TM system that uses a "block" of memory as its granularity for conflict detection. While this is common among HTM implementations, some STM implementations also use the technique (cf. McRT). An STM system that does not target memory-safe object-oriented languages will likely use such a technique, which makes XOrder applicable to those systems as well.