"The scalability of Raytrace improves with XOrder, yet table 4 shows an increase in aborts at 16 and 32 processors. Please explain." Note that not all aborts are equal. The Raytrace program contains a global "counter" variable which is read and updated by each thread. The code is essentially, "begin_transaction(); my_count = global++; end_transaction();". The reason for the improved performance is because the XOrder version of Raytrace is actually seeing more "true" conflicts. XOrder analysis has removed some of the "false" conflicts for longer running transactions. This allows for more threads to operate in parallel, which in turn, results in more aborts (from the read-modify-write behavior on the global counter). "I see unstructured transactions mainly as a way for the authors to create test cases but not really a key contribution of the work ... it’s unclear how this can be used in the future." "Why are the details of unstructured transactions relevant to the PLDI community?" It is true that our analyses needed to handle unstructured transactions in order to run the test cases. However, short of building an analysis infrastructure that handles atomic blocks (e.g., modifying the FrontC and AST's of CIL), program analysis tools that work with transactional programs must also be able handle this programming idiom. For example, many of the STM system's available today rely on the programmer to manually specify the beginning and ending of a transaction via a call to the STM runtime library. The atomic block construct is a language-based technique to reduce programmer errors. Underlying systems (either an HTM or STM's runtime library), will have low-level primitives similiar to begin_transaction() and end_transaction(). To analyze low-level code, an analysis tool has two options: 1) handle the program as is (the approach we chose) or 2) attempt to translate the program's begin_transaction() and end_transaction() statements into atomic blocks. 
Option 2 is attractive, but it will require an analysis like the one we have presented. Additionally, a transaction-aware compiler will likely attempt to "shrink" the code actually executed transactionally. For example, when translating an atomic block into its lower-level primitives, an optimizing compiler might move a begin_transaction() or end_transaction() call across a function boundary. Similarly, the compiler might duplicate one (or both) of these calls inside the branches of an if-statement. An analysis like the one we have presented is necessary to verify that the compiler optimization was performed correctly (with respect to matched begin/end_transaction statements).

"In some of the tests the number of conflicts goes up. Do you expect them to be actual or false conflicts?"

We have addressed this question earlier; however, we note that a current limitation of our simulator is that we cannot differentiate between true and false conflicts. This ability would be extremely helpful for future research.

"Do you have other XOrder experiments that can deliver a more complete picture of the performance of software transaction (as a result of conflicts and cache behavior)?"

We are currently working on the analysis of the Berkeley DB library and hope to have results soon.

"Section 6 mentions that false conflicts are particularly problematic in HTM systems because of cache-line-granularity conflict detection."

We used an HTM system for our experiments. In general, false conflicts are problematic on any TM system that uses a "block" of memory as its granularity for conflict detection. While this is common among HTM implementations, some STM implementations also use the technique (cf. McRT). An STM system that does not target memory-safe object-oriented languages will likely use such a technique, which makes XOrder applicable to those systems as well.