C* PROGRAMMING GUIDE
May 1993
Copyright (c) 1990-1993 Thinking Machines Corporation.


CHAPTER 13: COMMUNICATION WITH COMPUTATION
******************************************


This chapter discusses C* library functions that let you perform
computations on parallel values that are being transmitted. Most of
these functions use grid communication. The functions differ in these
ways:

  o  The kinds of computation that are available for each function.
     See Section 13.1.

  o  The way in which parallel variable elements are selected. For
     example, some functions let you divide the parallel variable
     elements into groups called scan classes. You can then operate on
     each scan class independently. See Section 13.2.

  o  The way in which the function reports the results of the
     computation. For example, scan provides a running total of its
     computations; spread provides only the final result.

Include the file <cscomm.h> when calling any of the functions
discussed in this chapter.


13.1  WHAT KINDS OF COMPUTATION?
--------------------------------

The scan, reduce, spread, multispread, and global functions let you
specify a combiner type that indicates the kind of computation or
combining you want carried out on the parallel data. Each of these
functions is overloaded for some subset of the combiner types listed
in Table 4.

Table 4. Combiner types.
----------------------------------------------------------------------
Combiner                        Meaning
----------------------------------------------------------------------
CMC_combiner_max        Take the largest value among the specified
                        parallel variable elements.
CMC_combiner_min        Take the smallest value among the specified
                        elements.
CMC_combiner_add        Add the values of the specified elements.
CMC_combiner_copy       Copy the values of the specified elements.
CMC_combiner_multiply   Multiply the values of the specified elements.
CMC_combiner_logior     Perform a bitwise logical inclusive OR on
                        the specified elements.
CMC_combiner_logxor     Perform a bitwise logical exclusive OR on
                        the specified elements.
CMC_combiner_logand     Perform a bitwise logical AND on the
                        specified elements.
----------------------------------------------------------------------

These combiner types are also used by the send function, which is
described in the next chapter.
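
As a rough illustration of what each combining operation computes, the
sketch below folds a short list of integers with Python stand-ins for
the combiners in Table 4. It is a serial Python model, not C* code;
the names COMBINERS and combine and the sample values are hypothetical
and chosen only for this illustration. CMC_combiner_copy is left out
because it copies values rather than combining them.

```python
from functools import reduce
import operator

# Hypothetical Python stand-ins for the combiner types in Table 4.
COMBINERS = {
    "CMC_combiner_max": max,
    "CMC_combiner_min": min,
    "CMC_combiner_add": operator.add,
    "CMC_combiner_multiply": operator.mul,
    "CMC_combiner_logior": operator.or_,
    "CMC_combiner_logxor": operator.xor,
    "CMC_combiner_logand": operator.and_,
}

def combine(name, values):
    """Fold a non-empty list of integers with the named combiner."""
    return reduce(COMBINERS[name], values)

values = [6, 3, 5]          # invented sample values
results = {name: combine(name, values) for name in COMBINERS}
```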


13.2  CHOOSING ELEMENTS
-----------------------

Several of the C* functions discussed in this chapter provide methods
for choosing the subsets of parallel variable elements on which they
are to operate. The terminology we use in referring to these subsets
of elements comes from scan, which is the most general of the
functions that use these methods.


13.2.1  The Scan Class
----------------------

Two positions belong to the same scan class if their coordinates
differ only along a specified axis. These functions use the concept of
a scan class: scan, reduce, copy_reduce, spread, copy_spread,
enumerate, rank, and multispread.

To see how scan classes work, consider the 2-dimensional shape shown
in Figure 63.

Note for users of CM-200 C*: This and other shapes in this chapter are
smaller than legal size in the CM-200 implementation of C*, so that
they are easy to visualize.

                          [ Figure Omitted ]
Figure 63. A 4-by-4 shape.

If you specify axis 0 as an argument to one of the functions listed
above, you get the scan classes shown in Figure 64. Positions [0][0],
[1][0], [2][0], and [3][0] differ only in their coordinates for axis
0; therefore, they belong to the same scan class. Position [0][1] does
not belong to this scan class, because it has a different axis 1
coordinate; it belongs to a scan class with positions [1][1], [2][1],
and [3][1].

Thus, specifying axis 0 for this shape creates four separate scan
classes, each of which is a column of positions through axis 0 in the
shape. Functions like scan operate on each of these scan classes
independently.

                          [ Figure Omitted ]
Figure 64. Scan classes for axis 0 of a 2-dimensional shape.

Specifying axis 1, on the other hand, creates four different scan
classes, each one consisting of a row of positions through axis 1 in
the shape, as shown in Figure 65.

                          [ Figure Omitted ]
Figure 65. Scan classes for axis 1 of a 2-dimensional shape.

If you have a 1-dimensional shape, there is, of course, only one axis
you can specify, and only one scan class for the shape. You can,
however, subdivide a scan class, as we discuss below.

If you have a 3-dimensional shape, specifying an axis gives you a set
of scan classes consisting of the rows of positions that cross this
axis. For example, in a 2-by-2-by-2 shape, specifying axis 0 creates
these four scan classes:

  [0][0][0] and [1][0][0]

  [0][1][0] and [1][1][0]

  [0][0][1] and [1][0][1]

  [0][1][1] and [1][1][1]
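
The grouping rule can be sketched in a few lines of serial Python
(this is a model of the definition, not C* code). The function name
scan_classes is hypothetical; positions are represented as coordinate
tuples.

```python
from itertools import product

def scan_classes(shape, axis):
    """Group the positions of `shape` into scan classes along `axis`.

    Two positions land in the same class when their coordinates
    differ only along `axis`.
    """
    classes = {}
    for pos in product(*(range(n) for n in shape)):
        # Key on every coordinate except the one along `axis`.
        key = pos[:axis] + pos[axis + 1:]
        classes.setdefault(key, []).append(pos)
    return list(classes.values())

classes_2d = scan_classes((4, 4), 0)      # the 4-by-4 shape
classes_3d = scan_classes((2, 2, 2), 0)   # the 2-by-2-by-2 shape
```

For the 4-by-4 shape this yields four classes of four positions each;
for the 2-by-2-by-2 shape it yields the four pairs listed above.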

To operate on more than one dimension in a multi-dimensional shape
(for example, on planes of positions instead of rows of positions),
you must use the multispread or copy_multispread function; these
functions are discussed in Section 13.8.


The Scan Subclass
-----------------

Only active positions participate in computations within a scan class.
The active positions within a scan class are referred to as the scan
subclass.


13.2.2  The Scan Set
--------------------

There may be times when you want a function to operate independently
on different parts of a scan subclass. The scan, enumerate, and rank
functions let you do this by subdividing a scan subclass into scan
sets.

To create scan sets, declare a bool-size parallel variable of the
shape on which the function is to operate, and initialize it to 0.
This parallel variable is referred to as the sbit; it is used as the
sbit argument to the functions listed above. Assign a 1 to an element
of this parallel variable to mark the beginning of a scan set at that
element's position. In the simplest case, the scan set for each
position starts either at the beginning of the scan subclass, or at
the nearest position below it in the scan subclass that has its sbit
set to 1.

Figure 66 shows a 1-dimensional shape divided into scan sets. In the
figure, the scan set for position 1, for example, consists of
positions 0 and 1 (the scan subclass starts at position 0, so the scan
set starts there also, even if the sbit for that position isn't set to
1). The scan set for position 7 consists of positions 5, 6, and 7,
since [5]sbit is set to 1, thus starting a new scan set.

                          [ Figure Omitted ]
Figure 66. Scan sets in a 1-dimensional shape.

Note that scan sets include only active positions; see Section 13.2.3,
however, for a more in-depth discussion of inactive positions and scan
sets.

To show how scan sets work, let's use an example in which we keep a
running total of the values in the parallel variable data (this is a
scan operation, as discussed in Section 13.3). The results are shown
in Figure 67.

                          [ Figure Omitted ]
Figure 67. An operation that provides a running total, using scan
sets.

In the example, [1]running_total contains the sum of [0]data and
[1]data, since 0 and 1 are the positions in its scan set.
[3]running_total contains only the value in [3]data, since [3]sbit is
set to 1, thus starting a new scan set in this position.
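
The simplest case can be modeled in serial Python as follows. This is
a sketch, not C* code: it assumes a 1-dimensional shape of eight
positions, all active, with the sbit set at positions 3 and 5 (as the
text describes); the data values and the name scan_set are invented
for illustration.

```python
def scan_set(position, sbit):
    """Positions in the scan set of `position` (inclusive, upward).

    Assumes a 1-dimensional shape with every position active.
    """
    start = position
    # Walk down to the nearest sbit at or below `position`, or to 0.
    while start > 0 and not sbit[start]:
        start -= 1
    return list(range(start, position + 1))

sbit = [0, 0, 0, 1, 0, 1, 0, 0]   # assumed pattern: sbit at 3 and 5
data = [1, 2, 3, 4, 5, 6, 7, 8]   # invented values
running_total = [sum(data[i] for i in scan_set(p, sbit))
                 for p in range(len(data))]
```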

You actually have more flexibility than this in how you can divide up
scan subclasses:

  o  Whether an operation is inclusive or exclusive affects the way
     scan sets are interpreted; see "Inclusive and Exclusive
     Operations," below. The example in Figure 67 shows an inclusive
     operation.

  o  There are two ways of interpreting the sbit; see Section 13.2.3.
     In particular, this affects the way scan classes are divided when
     there are inactive positions, and when an operation proceeds in a
     downward direction. The example in Figure 67 shows an operation
     that proceeds in an upward direction.


Inclusive and Exclusive Operations
----------------------------------

The way in which scan sets work when you are performing a particular
operation depends on whether the operation is inclusive or exclusive.
(NOTE: In this section, we are ignoring the effect of segment bits and
start bits; these are discussed in the next section.)

In an inclusive operation (specified by CMC_inclusive), an element
participates in the operation for its position--in other words, the scan
set for a position contains that position. As we mentioned, Figure 67
shows the results of an inclusive operation.

In an exclusive operation (specified by CMC_exclusive), the scan set
for an element does not contain the element itself--in other words, it
does not participate in the operation for its position. Figure 68
shows the results of an exclusive operation, using the same data as
that shown in Figure 67.

                          [ Figure Omitted ]
Figure 68. An exclusive operation on scan sets.

Note the difference between the two results. In the inclusive
operation, for example, [2]running_total receives the running total
for [0]data, [1]data, and [2]data; in the exclusive operation,
[2]running_total receives the running total only for [0]data and
[1]data. When there are no preceding elements in the scan set (for
example, in [3]running_total), the element receives the identity for
the operation.
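
The difference between the two modes can be sketched in serial Python
(a model, not C* code). The sketch assumes an upward running total on
a 1-dimensional shape with all positions active, and it ignores the
start-bit behavior described in the next section; the function name,
the sbit pattern, and the data values are invented for illustration.

```python
def running_total(data, sbit, inclusive):
    """Upward running sum per scan set; all positions active."""
    result = []
    total = 0                      # 0 is the identity for addition
    for value, starts_set in zip(data, sbit):
        if starts_set:
            total = 0              # a new scan set begins here
        if inclusive:
            total += value
            result.append(total)
        else:
            result.append(total)   # excludes the element itself
            total += value
    return result

sbit = [0, 0, 0, 1, 0, 1, 0, 0]   # assumed pattern: sbit at 3 and 5
data = [1, 2, 3, 4, 5, 6, 7, 8]   # invented values
inclusive_result = running_total(data, sbit, True)
exclusive_result = running_total(data, sbit, False)
```

In this model, position 2 receives 6 in the inclusive case and 3 in
the exclusive case, and position 3 receives the identity (0) in the
exclusive case because no element precedes it in its scan set.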


13.2.3  Segment Bits and Start Bits
-----------------------------------

There are two different kinds of sbits: segment bits and start bits.
Use the smode argument to the scan, enumerate, or rank function to
specify which kind of sbit you want, as discussed below.


If smode Is CMC_segment_bit
---------------------------

If the value of the smode argument is CMC_segment_bit, the sbit is
considered a segment bit, and it divides a scan subclass into
segments, as follows:

  o  An sbit element set to 1 starts a new segment, whether or not the
     element appears in an active position.

  o  The way in which the segment bit divides the scan subclass is not
     affected by the direction of the operation.

  o  Operations in one segment never affect values of elements in
     another segment.


If smode Is CMC_start_bit
-------------------------

If the value of the smode argument is CMC_start_bit, the sbit is
considered a start bit, and scan classes are divided as follows:

  o  An sbit element set to 1 divides a scan subclass only if its
     position is active.

  o  The division is affected by the direction of the operation. When
     the direction is downward, for example, the division occurs from
     the higher coordinate to the lower coordinate.

  o  When an operation is exclusive, the position whose sbit element
     is set to 1 will receive a value from the preceding scan set.

These differences between segment bits and start bits are discussed
below.


Inactive Positions
------------------

When the sbit is a segment bit, a new scan set is created, even though
the position where it starts is inactive. Figure 69 shows an example
(the scan sets displayed are for positions [2], [4], and [7]).

                          [ Figure Omitted ]
Figure 69. An inclusive operation in an upward direction
on segment-bit scan sets, with an inactive position.

Note that position [3] does not participate in the operation, even
though it starts a new scan set.

A start bit does not start a scan set if its position is inactive.
Figure 70 is an example. Note that the scan set for position [4]
begins at position [0], not at position [3], as in Figure 69.

                          [ Figure Omitted ]
Figure 70. An inclusive operation in an upward direction
on start-bit scan sets, with an inactive position.
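
The two treatments of an inactive position can be compared with a
small serial Python model (not C* code). It assumes an inclusive,
upward operation on a 1-dimensional shape, with the sbit set at
positions 3 and 5 and position 3 inactive; the strings "segment" and
"start" are stand-ins for CMC_segment_bit and CMC_start_bit, and the
function name is hypothetical.

```python
def scan_sets(sbit, active, smode):
    """Inclusive, upward scan set of every active position (1-D)."""
    sets = {}
    current = []
    for pos in range(len(sbit)):
        # A segment bit always divides; a start bit only if active.
        if sbit[pos] and (smode == "segment" or active[pos]):
            current = []
        if active[pos]:
            current = current + [pos]
            sets[pos] = current
    return sets

sbit = [0, 0, 0, 1, 0, 1, 0, 0]           # assumed sbit pattern
active = [True, True, True, False, True, True, True, True]
segment_sets = scan_sets(sbit, active, "segment")
start_sets = scan_sets(sbit, active, "start")
```

With a segment bit, the scan set for position 4 contains only position
4; with a start bit, it reaches back to position 0. In both cases the
inactive position 3 belongs to no scan set.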


The Direction of the Operation
------------------------------

When the direction of the operation is upward, it proceeds from
lower-numbered positions to higher-numbered positions along the scan
subclass. Both kinds of sbits divide the scan subclass in the same way
when the direction is upward (provided that all positions are active);
see Figure 66 for an example. You specify an upward direction with the
argument CMC_upward.

When the direction of the operation is downward (specified by the
argument CMC_downward), the operation proceeds from higher-numbered
positions to lower-numbered positions along the scan subclass. In this
case, segment bits divide the scan subclass in the same way as the
sbits shown in Figure 66; however, since the operation proceeds in a
downward direction, this means that a segment bit ends a scan set, and
the operation begins again in the position with the next lowest
coordinate. Figure 71 is an example; it shows the scan sets for
positions [0], [3], and [5].

                          [ Figure Omitted ]
Figure 71. An inclusive operation in a downward direction
on segment-bit scan sets.

Start-bit scan sets, however, follow the downward direction; in other
words, start bits start scan sets, rather than ending them. Figure 72
is an example; it shows the scan sets for positions [0], [4], and [6].

                          [ Figure Omitted ]
Figure 72. An inclusive operation in a downward direction
on start-bit scan sets.
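
The following serial Python sketch (a model, not C* code) contrasts
the two kinds of sbit for a downward, inclusive operation. It assumes
a 1-dimensional shape of eight active positions with the sbit set at
positions 3 and 5; the strings "segment" and "start" stand in for
CMC_segment_bit and CMC_start_bit, and the function name is
hypothetical.

```python
def downward_scan_sets(sbit, smode):
    """Inclusive scan sets for a downward operation (1-D, all active)."""
    sets = {}
    current = []
    for pos in range(len(sbit) - 1, -1, -1):
        if smode == "start" and sbit[pos]:
            current = []          # a start bit begins a set here
        current = current + [pos]
        sets[pos] = current
        if smode == "segment" and sbit[pos]:
            current = []          # a segment bit ends the set here
    return sets

sbit = [0, 0, 0, 1, 0, 1, 0, 0]   # assumed pattern: sbit at 3 and 5
segment_sets = downward_scan_sets(sbit, "segment")
start_sets = downward_scan_sets(sbit, "start")
```

Under these assumptions, the segment-bit scan set for position 3 is
positions 4 and 3, while the start-bit scan set for position 4 is
positions 5 and 4.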


Data from Another Scan Set
--------------------------

In exclusive operations on start-bit scan sets, the first position in
a scan set receives the result of the operation for the preceding scan
set, if there is one. Figure 73 is an example.

                          [ Figure Omitted ]
Figure 73. An exclusive operation in an upward direction
with start bits.

Compare these results with those shown in Figure 68, which assumes
that the sbit is a segment bit. [3]running_total and [5]running_total
receive the results from the preceding scan set, rather than 0.
[0]running_total still receives 0 (the identity for the operation)
because there is no preceding scan set.

What constitutes a "preceding" scan set depends on the direction of
the operation, of course. In a downward direction, scan sets with
higher-numbered coordinates along the axis precede scan sets with
lower-numbered coordinates.
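
A serial Python sketch of this behavior follows (a model, not C*
code). It assumes an upward, exclusive running total on a
1-dimensional shape with all positions active and the sbit set at
positions 3 and 5; the data values, the function name, and the
"segment"/"start" strings (stand-ins for CMC_segment_bit and
CMC_start_bit) are invented for illustration.

```python
def exclusive_running_total(data, sbit, smode):
    """Upward exclusive running sum (1-D, all positions active)."""
    result = []
    total = 0                       # identity for addition
    for pos, value in enumerate(data):
        if sbit[pos] and pos > 0:
            if smode == "start":
                # First position of the set gets the preceding
                # set's result; the new set then starts afresh.
                result.append(total)
                total = value
                continue
            total = 0   # segment bit: nothing crosses the boundary
        result.append(total)
        total += value
    return result

sbit = [0, 0, 0, 1, 0, 1, 0, 0]   # assumed pattern: sbit at 3 and 5
data = [1, 2, 3, 4, 5, 6, 7, 8]   # invented values
with_start_bits = exclusive_running_total(data, sbit, "start")
with_segment_bits = exclusive_running_total(data, sbit, "segment")
```

In this model, positions 3 and 5 receive 6 and 9 (the totals of the
preceding scan sets) with start bits, and 0 with segment bits;
position 0 receives 0 in both cases.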


13.3  THE SCAN FUNCTION
-----------------------

Use the scan function to provide running results for operations on the
scan sets you specify.

The definition of scan is:

     type:current scan (
         type:current source,
         int axis,
         CMC_combiner_t combiner,
         CMC_communication_direction_t direction,
         CMC_segment_mode_t smode,
         bool:current *sbitp,
         CMC_scan_inclusion_t inclusion);

where:

  source        is the parallel variable whose values are to be used
                in the operation. It must be of the current shape, and
                it can have any arithmetic type.

  axis          specifies the axis along which the scan class or
                classes are to be created; see Section 13.2.

  combiner      specifies the type of operation that scan is to carry
                out. Possible values are listed in Section 13.1.

  direction     specifies the direction of the operation. Possible
                values are CMC_upward and CMC_downward.

  smode         specifies whether the sbit is a segment bit or a start
                bit; see Section 13.2.3. Possible values are
                CMC_start_bit, CMC_segment_bit, and CMC_none. Specify
                CMC_none if there is no sbit.

  sbitp         is a scalar pointer to a bool-size parallel variable
                of the current shape. This parallel variable is the
                sbit, which creates scan sets for the operation.
                Specify CMC_no_field if there is no sbit.

  inclusion     specifies whether the operation is exclusive or
                inclusive; see "Inclusive and Exclusive Operations,"
                above. Possible values are CMC_exclusive and
                CMC_inclusive.

The function returns the result of the scan in a parallel variable of
the current shape and with the same type as source.

The types CMC_combiner_t, CMC_communication_direction_t,
CMC_segment_mode_t, and CMC_scan_inclusion_t are defined by the
compiler.

The scan function provides a running result of the operation you
specify on the parallel variable you specify. If you assign this
result to a parallel variable of the current shape, each element of
the parallel variable receives the running result for its position.
The operation is carried out independently for each scan set.


13.3.1  Examples
----------------

The example below adds the values of data in an upward direction and
assigns the running result to running_total; there is no sbit, and the
operation is inclusive. The results are shown in Figure 74.

     running_total = scan(data, 0, CMC_combiner_add,
         CMC_upward, CMC_none, CMC_no_field, CMC_inclusive);

                            [ Figure Omitted ]
     Figure 74. An example of the scan function with no sbit.


The next example assigns the minimum value of data in the scan set to
running_min. The direction is downward, the operation is inclusive,
and the sbit is a start bit. The results are shown in Figure 75.

     running_min = scan(data, 0, CMC_combiner_min,
         CMC_downward, CMC_start_bit, &start_bit,
         CMC_inclusive);

                            [ Figure Omitted ]
     Figure 75. An example of the scan function with a start bit and
     a downward direction.


Note that you would get a different result in this example if the sbit
were a segment bit, since segment bits and start bits behave
differently when the direction is downward.

The example below multiplies the values of data in the scan set and
assigns the product to running_product. The direction is upward, the
operation is exclusive, and the sbit is a segment bit. The results are
shown in Figure 76.

     running_product = scan(data, 0, CMC_combiner_multiply,
         CMC_upward, CMC_segment_bit,
         &segment_bit, CMC_exclusive);

                            [ Figure Omitted ]
     Figure 76. An example of the scan function using a segment bit
     and an exclusive operation.


These examples are of a 1-dimensional shape, which by definition has
only one scan class. If a shape has more than one dimension, more than
one scan class is created, and scan carries out the operation on all
scan subclasses (or scan sets, if the sbit is used) at the same time.
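
To illustrate the multi-dimensional case, the serial Python sketch
below (a model, not C* code) computes an inclusive, upward running sum
along either axis of a 2-dimensional shape. It models only
CMC_combiner_add with no sbit and all positions active; the function
name and the data values are hypothetical. The shape is represented
as a list of rows, so the first index is the axis 0 coordinate.

```python
from itertools import accumulate

def scan_add_2d(source, axis):
    """Inclusive upward running sum along `axis` of a 2-D list.

    Each scan class (a column for axis 0, a row for axis 1) is
    processed independently.
    """
    rows, cols = len(source), len(source[0])
    if axis == 1:
        return [list(accumulate(row)) for row in source]
    # axis == 0: accumulate down each column, then transpose back.
    columns = [list(accumulate(source[r][c] for r in range(rows)))
               for c in range(cols)]
    return [[columns[c][r] for c in range(cols)] for r in range(rows)]

data = [[1, 2],
        [3, 4]]                   # invented values
along_axis0 = scan_add_2d(data, 0)
along_axis1 = scan_add_2d(data, 1)
```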

The destination parallel variable can be the same as the source
parallel variable. In other words, a statement like this is legal:

     data = scan(data, 0, CMC_combiner_add, CMC_upward,
         CMC_none, CMC_no_field, CMC_inclusive);


In this case, the elements of data are overwritten with the results of
the operation.


13.4  THE REDUCE AND COPY_REDUCE FUNCTIONS
------------------------------------------


13.4.1  The reduce Function
---------------------------

Use the reduce function to put the result of an operation into a
single parallel variable element in each scan subclass.

The reduce function has this definition:

     void reduce (
         type:current *destp,
         type:current source,
         int axis,
         CMC_combiner_t combiner,
         int to_coord);

where:

  destp         is a scalar pointer to a parallel variable, of the
                current shape and of any arithmetic type. One element
                of each scan subclass of this parallel variable
                receives the result of the operation.

  source        is a parallel variable (of the current shape) whose
                values are to be used in the operation. It must be of
                the same type as the parallel variable pointed to by
                destp.

  axis          specifies the axis along which the scan class or
                classes are to be created; see Section 13.2.

  combiner      specifies the type of operation that reduce is to
                carry out. Possible values are CMC_combiner_max,
                CMC_combiner_min, CMC_combiner_add,
                CMC_combiner_logior, CMC_combiner_logxor, and
                CMC_combiner_logand.

  to_coord      specifies the coordinate of the parallel variable
                pointed to by destp that is to receive the result of
                the operation.

Note these differences between reduce and scan:

  o  reduce puts the final result of the operation into a single
     parallel variable element of the scan subclass; it does not
     produce a running result.

  o  reduce does not use scan sets; therefore, it does not have the
     arguments smode and sbitp.

  o  Copying with reduction is handled as a separate function, which
     is discussed below.

Elements of source that are at inactive positions do not participate
in the operation. If a position specified by to_coord is inactive,
that element of dest does not receive the result.

dest can be the same parallel variable as source; the result simply
overwrites the value(s) in the specified element(s).


An Example
----------

The statement below puts the maximum value of data into element 0 of
max. The results are shown in Figure 77.

     reduce(&max, data, 0, CMC_combiner_max, 0);

                            [ Figure Omitted ]
     Figure 77. An example of the reduce function.


Incidentally, this statement is virtually equivalent to this C*
statement:

     [0]max = >?= data;

But note these points:

  o  If position [0] were inactive, the assignment statement above
     would work; if you used reduce, the reduction would not take
     place.

  o  The equivalence holds only for 1-dimensional shapes. In shapes
     with more dimensions, reduce carries out its operation separately
     for each scan subclass, whereas the reduction assignment carries
     out its operation once for all elements of the parallel variable.
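
The second point can be seen in a serial Python model (not C* code) of
reduce with CMC_combiner_max on a 2-dimensional shape. The function
name, the list-of-rows representation, and the data values are
assumptions made for this sketch; what happens when every position in
a scan class is inactive is not covered by the text, so the model
simply skips that class.

```python
def reduce_max(dest, source, axis, to_coord, active):
    """Model of reduce with CMC_combiner_max on a 2-D shape.

    `dest`, `source`, and `active` are lists of rows; `dest` is
    updated in place, as it would be through the destp pointer.
    """
    rows, cols = len(source), len(source[0])
    n_classes = cols if axis == 0 else rows
    length = rows if axis == 0 else cols
    for k in range(n_classes):
        # Positions of scan class k, ordered along `axis`.
        cls = [(i, k) if axis == 0 else (k, i) for i in range(length)]
        values = [source[r][c] for r, c in cls if active[r][c]]
        r, c = cls[to_coord]
        if values and active[r][c]:
            dest[r][c] = max(values)

source = [[1, 8],
          [5, 2]]                 # invented values
all_active = [[True, True], [True, True]]
dest = [[0, 0], [0, 0]]
reduce_max(dest, source, 0, 0, all_active)

# A reduction over the whole parallel variable, for comparison.
global_max = max(v for row in source for v in row)
```

Here each column produces its own maximum in row 0 of dest, whereas
the global reduction produces a single value.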


13.4.2  The copy_reduce Function
--------------------------------

Use the copy_reduce function to copy a value from one parallel
variable element of a scan subclass to another parallel variable
element.

The definition of copy_reduce is:

     void copy_reduce (
         type:current *destp,
         type:current source,
         int axis,
         int to_coord,
         int from_coord);


The arguments are the same as for the reduce function, except that
there is a from_coord argument instead of a combiner. from_coord
specifies the element of source from which the value is to be copied.
It is copied into the to_coord element of the parallel variable
pointed to by destp for each scan subclass. If either from_coord or
to_coord specifies an inactive position, the copying does not take
place for that scan subclass.


An Example
----------

This example copies the values of elements in row 1 of data into
elements of row 0 of copy:

     copy_reduce(&copy, data, 0, 0, 1);


The results for some sample values are shown in Figure 78.

                          [ Figure Omitted ]
Figure 78. An example of the copy_reduce function.


If the example of copy_reduce shown in Figure 78 were applied to a
1-dimensional shape, it would be equivalent to:

     [0]copy = [1]data;


If position [0] were inactive, however, the results would be
different. [0]copy would get the result from [1]data if you used the
assignment statement above; it would not get the value if you used
copy_reduce.
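
The behavior described in this section can be modeled in serial Python
as follows (a sketch, not C* code). The function mirrors the argument
order of copy_reduce but adds an explicit active argument in place of
the C* context; the list-of-rows representation and the data values
are invented for illustration.

```python
def copy_reduce(dest, source, axis, to_coord, from_coord, active):
    """Model of copy_reduce on a 2-D shape; `dest` is updated in place."""
    rows, cols = len(source), len(source[0])
    n_classes = cols if axis == 0 else rows
    for k in range(n_classes):
        src = (from_coord, k) if axis == 0 else (k, from_coord)
        dst = (to_coord, k) if axis == 0 else (k, to_coord)
        # Copy only if both ends of the transfer are active.
        if active[src[0]][src[1]] and active[dst[0]][dst[1]]:
            dest[dst[0]][dst[1]] = source[src[0]][src[1]]

data = [[1, 2],
        [3, 4]]                   # invented values
copy = [[0, 0], [0, 0]]
all_active = [[True, True], [True, True]]
copy_reduce(copy, data, 0, 0, 1, all_active)   # row 1 into row 0
```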


13.5  THE SPREAD AND COPY_SPREAD FUNCTIONS
------------------------------------------


13.5.1  The spread Function
---------------------------

Use the spread function to place the result of an operation into all
the elements of a specified parallel variable in a scan subclass.

The spread function has this definition:

     type:current spread (
         type:current source,
         int axis,
         CMC_combiner_t combiner);

where:

  source        is a parallel variable (of the current shape) whose
                values are to be used in the operation. It can have
                any arithmetic type.

  axis          specifies the axis along which the scan class or
                classes are to be created; see Section 13.2.

  combiner      specifies the type of operation that spread is to
                carry out. Possible values are CMC_combiner_max,
                CMC_combiner_min, CMC_combiner_add,
                CMC_combiner_logior, CMC_combiner_logxor, and
                CMC_combiner_logand. See Section 13.1.

spread returns its result in a parallel variable of the current shape;
the parallel variable has the same type as source. This destination
parallel variable can be the same as the source parallel variable, in
which case the elements of the source parallel variable are
overwritten with the result.

The spread function "spreads" the result of an operation into all
active elements of the destination parallel variable in a scan
subclass. Like reduce, spread does not use scan sets, and it does not
have a CMC_combiner_copy operation; copying is handled by the
copy_spread function, as discussed below.

Inactive positions do not participate in the operation.


An Example
----------

The code below adds the values of the elements in data in the scan
subclasses of axis 1, and assigns the result to total. The results for
sample data are shown in Figure 79.

     total = spread (data, 1, CMC_combiner_add);

                            [ Figure Omitted ]
     Figure 79. An example of the spread function.
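
A serial Python model of this call (not C* code) is sketched below.
It handles only CMC_combiner_add on a 2-dimensional shape; the
function name, the explicit active argument, and the data values are
assumptions made for illustration, and None marks an inactive element
that receives no result.

```python
def spread_add(source, axis, active):
    """Model of spread with CMC_combiner_add on a 2-D shape."""
    rows, cols = len(source), len(source[0])
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not active[r][c]:
                continue          # inactive: receives nothing
            # Scan class of (r, c): its column for axis 0,
            # its row for axis 1.
            cls = ([(i, c) for i in range(rows)] if axis == 0
                   else [(r, j) for j in range(cols)])
            result[r][c] = sum(source[i][j] for i, j in cls
                               if active[i][j])
    return result

data = [[1, 2, 3],
        [4, 5, 6]]                # invented values
all_active = [[True] * 3 for _ in range(2)]
total = spread_add(data, 1, all_active)
```

Every element of a row receives that row's sum, because the rows are
the scan classes for axis 1.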


13.5.2  The copy_spread Function
--------------------------------

Use the copy_spread function to copy a value from an element of a
parallel variable in a scan subclass to all elements of a parallel
variable in the scan subclass.

The copy_spread function has this definition:

     type:current copy_spread (
         type:current *sourcep,
         int axis,
         int coordinate);

where:

  sourcep       is a scalar pointer to a parallel variable, one value
                of which is to be copied.

  axis          specifies the axis along which the scan class or
                classes are to be created.

  coordinate    is the coordinate along axis that specifies the source
                parallel variable element whose value is to be copied.

The function returns a parallel variable of the current shape and the
same arithmetic type as the parallel variable pointed to by sourcep,
containing the results of the operation.

If a specified element of the source parallel variable is inactive,
its value is copied. However, inactive positions of the destination
parallel variable do not receive a result.


An Example
----------

The code below copies the value from element [n][1] of data to
elements of copy in the same scan subclass along axis 1. The results
are shown in Figure 80.

     copy = copy_spread(&data, 1, 1);

                            [ Figure Omitted ]
     Figure 80. An example of the copy_spread function.


Note that, for a 1-dimensional shape, the above statement is
equivalent to this statement:

     copy = [1]data;


unless position [1] is inactive. In that case, the assignment
statement works; copy_spread, however, would not copy [1]data.


13.6  THE ENUMERATE FUNCTION
----------------------------

Use the enumerate function to place in each active element of a
parallel variable the size of its scan set. As we discuss in more
detail below, enumerate is a generalized version of the pcoord
function.

The enumerate function has this definition:

     unsigned int:current enumerate (
         int axis,
         CMC_communication_direction_t direction,
         CMC_scan_inclusion_t inclusion,
         CMC_segment_mode_t smode,
         bool:current *sbitp);


All the parameters for enumerate have the same meanings and take the
same values as the corresponding parameters for the scan function; see
Section 13.3. Like scan, enumerate lets you specify a direction, an
sbit, and whether the operation is to be exclusive or inclusive. Note,
however, that the return value is an unsigned int of the current
shape.

If you specify CMC_inclusive, enumerate includes each position in
calculating the size of the scan set for that position. If you specify
CMC_exclusive, enumerate does not include the position in calculating
the size of its scan set.

An inactive position does not receive a value and is not included in
the calculation of values for other positions; see the third example,
below.


13.6.1  Examples
----------------

The first example does an exclusive enumerate in an upward direction,
ignoring the sbit, and assigning the result to number. The results are
shown in Figure 81.

     number = enumerate(0, CMC_upward,
         CMC_exclusive, CMC_none, CMC_no_field);

                            [ Figure Omitted ]
     Figure 81. An example of the enumerate function without an sbit.


This is exactly equivalent to this use of pcoord when all positions
are active:

     number = pcoord(0);


Both functions initialize each parallel variable element to its
coordinate along the axis. The enumerate function, however, is more
versatile than pcoord. In the next example, enumerate uses the sbit as
a start bit and proceeds in a downward direction, using the inclusive
mode:

     number = enumerate(0, CMC_downward, CMC_inclusive,
         CMC_start_bit, &start_bit);


The results are shown in Figure 82.

                          [ Figure Omitted ]
Figure 82. An example of the enumerate function
with a start bit and a downward direction.


In the example below, the sbit is a segment bit, the enumerate is
exclusive, the direction is upward, and position 2 is inactive. The
results are shown in Figure 83.

     where (p1 != 9)
         number = enumerate(0, CMC_upward, CMC_exclusive,
             CMC_segment_bit, &segment_bit);

                            [ Figure Omitted ]
     Figure 83. An example of the enumerate function using a segment bit
     and an exclusive operation, with an inactive position.


Note that the inactive position is not included in the enumeration.
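
The counting rule can be modeled in serial Python as follows (a
sketch, not C* code). It covers only the upward direction on a
1-dimensional shape, with the sbit treated as a segment bit; the
function name, the sbit pattern, and the choice of None for inactive
positions are assumptions made for illustration.

```python
def enumerate_upward(active, sbit, inclusive):
    """Model of enumerate: upward, 1-D, sbit used as a segment bit."""
    result = [None] * len(active)
    count = 0
    for pos in range(len(active)):
        if sbit[pos]:
            count = 0             # segment bit: restart the count
        if active[pos]:
            result[pos] = count + 1 if inclusive else count
            count += 1
    return result

no_sbit = [0] * 6
all_active = [True] * 6
like_pcoord = enumerate_upward(all_active, no_sbit, False)

sbit = [0, 0, 0, 0, 1, 0]         # assumed pattern: sbit at 4
active = [True, True, False, True, True, True]
number = enumerate_upward(active, sbit, False)
```

With no sbit and all positions active, the exclusive result equals
each position's coordinate, as with pcoord. In the second call, the
inactive position 2 is skipped, so position 3 receives 2 rather
than 3.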


13.7  THE RANK FUNCTION
-----------------------

Use the rank function to produce a numerical ranking of the values of
parallel variable elements in a scan set.

The definition of rank is:

     unsigned int:current rank (
         type:current source,
         int axis,
         CMC_communication_direction_t direction,
         CMC_segment_mode_t smode,
         bool:current *sbitp);


The parameters for rank have the same meanings and take the same
values as the corresponding parameters for the scan function; see
Section 13.3. Like scan and enumerate, rank lets you specify a
direction and an sbit. It does not, however, let you specify that its
operation is exclusive; the operation can only be inclusive. Also,
note the behavior of rank with scan sets discussed below. Like the
enumerate function, rank returns an unsigned int of the current shape.

The rank function returns, for each active position, the rank of the
value of the specified parallel variable at that position in its scan
set. Inactive positions are not included in the determination of the
rank for other positions, and they do not receive a rank themselves.
The ranking is from 0 to n-1, where n is the total number of positions
in the scan set. The ranks are assigned as follows:

  o  When the direction is upward, the lowest value is assigned rank
     0.

  o  When the direction is downward, the highest value is assigned
     rank 0.

  o  If more than one element has the same value, their ranks are
     assigned arbitrarily within the range of ranks they represent.

  o  An sbit restarts the ranking of values within the scan set;
     however, it does not restart the values assigned to the ranks.
     This behavior is different from that of other functions. For
     example, if a scan set extends from position [4] through position
     [15], the ranks assigned within this scan set are 4 through 15,
     not 0 through 11.
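
The first three rules can be modeled in serial Python as follows (a
sketch, not C* code). It covers a 1-dimensional shape with no sbit;
the function name, the explicit active argument, and the data values
are hypothetical. Ties are broken by position here only because
Python's sort is stable; the text leaves the order of ties arbitrary.

```python
def rank(source, active, upward=True):
    """Model of rank on a 1-D shape with no sbit."""
    positions = [p for p in range(len(source)) if active[p]]
    # Upward: lowest value first. Downward: highest value first.
    ordered = sorted(positions, key=lambda p: source[p],
                     reverse=not upward)
    result = [None] * len(source)   # None: inactive, no rank
    for r, p in enumerate(ordered):
        result[p] = r
    return result

data = [4, 9, 2, 7]               # invented values
all_active = [True] * 4
upward_rank = rank(data, all_active, upward=True)
downward_rank = rank(data, all_active, upward=False)
```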


13.7.1  Examples
----------------

The first example has no sbit and ranks the values of data in an upward
direction; it assigns the ranks to data_rank. The results are shown in
Figure 84.

     data_rank = rank(data, 0,
         CMC_upward, CMC_none, CMC_no_field);

                            [ Figure Omitted ]
     Figure 84. An example of the rank function with no sbit.


In the next example, the sbit is a segment bit, the direction is
downward, and position 1 is inactive. The results are shown in Figure
85.

     where (data != 7)
         data_rank = rank(data, 0, CMC_downward,
                     CMC_segment_bit, &segment_bit);

                            [ Figure Omitted ]
     Figure 85. An example of the rank function using a segment bit
     and a downward direction, with an inactive position.


The final example uses rank along with parallel left indexing to
reorder parallel variable elements according to their rank:

     [rank(data, 0, CMC_upward, CMC_none, CMC_no_field)]sorted = data;


In this example, data sends values to sorted, using the return values
from rank as an index. The key here is to have rank operate on the
parallel variable that is doing the sending. The results are shown in
Figure 86.

                          [ Figure Omitted ]
Figure 86. Using rank as a parallel left index to reorder parallel
variable elements according to their ranks.


Note how values move in the example: [0]data, for example, has a rank
of 1; therefore, its value (4) is sent to [1]sorted.
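
The movement of values can also be modeled sequentially in ordinary C.
The sketch below is not C*; the name reorder_by_rank is hypothetical.
It assumes a 1-dimensional shape with every position active, and it
breaks ties by position so that no two elements are sent to the same
destination.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of:
       [rank(data, 0, CMC_upward, CMC_none, CMC_no_field)]sorted = data;
   Every position is assumed to be active. */
void reorder_by_rank(const int *data, size_t n, int *sorted)
{
    assert(data != NULL && sorted != NULL && data != sorted);
    for (size_t i = 0; i < n; i++) {
        size_t r = 0;                 /* upward rank of data[i] */
        for (size_t j = 0; j < n; j++) {
            if (data[j] < data[i] || (data[j] == data[i] && j < i))
                r++;
        }
        sorted[r] = data[i];          /* position i sends to position r */
    }
}
```

With the hypothetical input {4, 9, 2, 7}, element 0 (value 4) has rank
1 and is therefore written to sorted[1], as in the description above;
these values are illustrative and are not those of Figure 86.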

You can also achieve the same result using the make_send_address and
send functions along with rank; see Section 14.3.3.


13.8  THE MULTISPREAD FUNCTION
------------------------------

The multispread function is like the spread function, except that you
can use it to spread the result of an operation along more than one
axis at the same time. This is useful in shapes that have more than
two dimensions. For example, in a 3-dimensional shape, you can use
spread to spread results along any one of the dimensions; multispread
lets you spread results through entire planes of positions instead of
along a single dimension.

To see how this works, consider the simple 8-position 2-by-2-by-2
shape shown in Figure 87.

                          [ Figure Omitted ]
Figure 87. A 3-dimensional shape.

As we mentioned in Section 13.2.1, specifying axis 0 creates four scan
classes for this shape:

  [0][0][0] and [1][0][0]

  [0][1][0] and [1][1][0]

  [0][0][1] and [1][0][1]

  [0][1][1] and [1][1][1]

In each scan class, the positions differ only along axis 0. These scan
classes are shown in Figure 88.

                          [ Figure Omitted ]
Figure 88. Scan classes in a 3-dimensional shape.

For the multispread function, you can specify more than one axis along
which the positions can differ. In this case, let the positions differ
along axes 0 and 1; axis 2 is fixed. This results in two sets of
positions:

     [0][0][0]
     [1][0][0]
     [0][1][0]
     [1][1][0]

and:

     [0][0][1]
     [1][0][1]
     [0][1][1]
     [1][1][1]

Figure 89 shows these two sets of positions. The sets of positions in
which the positions are allowed to differ along more than one axis are
called hyperplanes. Scan classes are therefore a special case of
hyperplanes, in which the positions can differ along only one axis.
The multispread function operates on any kind of hyperplane.

                          [ Figure Omitted ]
Figure 89. Hyperplanes in a 3-dimensional shape.

The multispread function has this definition:

     type:current multispread (
         type:current source,
         unsigned int axis_mask,
         CMC_combiner_t combiner);


The only difference between this definition and that of spread is the
axis_mask parameter. The axis_mask parameter is a bit mask that
specifies the axes along which the positions in a hyperplane are
allowed to differ. For example, use a bit mask of 3 to specify axes 0
and 1; use 6 to specify axes 1 and 2.
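
The mask values follow from setting bit n for axis n. As a minimal
sketch in ordinary C, a helper such as the one below could build the
mask from a list of axis numbers; make_axis_mask is a hypothetical
name and is not part of the C* library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Build an axis mask by setting bit n for each axis n in axes[]. */
unsigned int make_axis_mask(const int *axes, size_t count)
{
    unsigned int mask = 0;
    for (size_t i = 0; i < count; i++) {
        /* The axis number must fit within the bits of an unsigned int. */
        assert(axes[i] >= 0 &&
               (size_t)axes[i] < sizeof(unsigned int) * CHAR_BIT);
        mask |= 1u << axes[i];
    }
    return mask;
}
```

Passing the axes {0, 1} yields 3, and passing {1, 2} yields 6, which
matches the values given above.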

The example below assumes a 3-dimensional shape like the one shown
above. In it, the values of source in the hyperplanes described by
axes 0 and 1 are added, and the results are spread to all elements of
dest in the same hyperplane.

     dest = multispread(source, 3, CMC_combiner_add);
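
To make the effect of this call concrete, the sketch below models it
sequentially in ordinary C for the 2-by-2-by-2 shape. It is not C*;
the names multispread_add_model and same_hyperplane are hypothetical,
every position is assumed to be active, and the leftmost array index
is taken to correspond to axis 0.

```c
#include <assert.h>

#define N 2   /* extent of each axis in the 2-by-2-by-2 shape */

/* Two positions share a hyperplane when they agree on every axis
   whose bit is clear in axis_mask. */
static int same_hyperplane(const int p[3], const int q[3],
                           unsigned int axis_mask)
{
    for (int axis = 0; axis < 3; axis++) {
        if (!(axis_mask & (1u << axis)) && p[axis] != q[axis])
            return 0;
    }
    return 1;
}

/* Sequential model of dest = multispread(src, axis_mask,
   CMC_combiner_add) with all positions active. */
void multispread_add_model(int src[N][N][N], unsigned int axis_mask,
                           int dest[N][N][N])
{
    assert(axis_mask < 8u);   /* only axes 0, 1, and 2 exist */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                int p[3] = {i, j, k};
                int sum = 0;
                /* Add every source value in the hyperplane of p. */
                for (int a = 0; a < N; a++)
                    for (int b = 0; b < N; b++)
                        for (int c = 0; c < N; c++) {
                            int q[3] = {a, b, c};
                            if (same_hyperplane(p, q, axis_mask))
                                sum += src[a][b][c];
                        }
                dest[i][j][k] = sum;
            }
}
```

With a mask of 3, each position receives the sum of the four source
values that share its axis-2 coordinate. With a mask of 1, the model
reduces to a spread along axis 0 alone.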


13.8.1  The copy_multispread Function
-------------------------------------

There is also a copy_multispread function, comparable to the
copy_spread function, but available for use on hyperplanes instead of
scan classes. Using copy_multispread, however, requires an
understanding of send addresses, which are discussed in the next
chapter. We therefore defer discussion of this function until Section
14.5.


13.9  THE GLOBAL FUNCTION
-------------------------

Use the global function to perform reduction operations on a parallel
variable and assign the result to a scalar variable.

The global function has this definition:

     type global (
         type:current source,
         CMC_combiner_t combiner);

where:

  source        is a parallel variable (of the current shape and any
                arithmetic type) upon whose values the reduction
                operation is to be performed.

  combiner      specifies the reduction operation. Possible values are
                CMC_combiner_max, CMC_combiner_min, CMC_combiner_add,
                CMC_combiner_logior, CMC_combiner_logxor, and
                CMC_combiner_logand; see Section 13.1 for definitions
                of these values.

The function returns a scalar value of the same type as source.

The global function provides an alternative method for performing
certain reduction operations. For example, these two statements are
equivalent (where s1 is a scalar variable and p1 is a parallel
variable of the same type):

     s1 = |= p1;

and:

     s1 = global(p1, CMC_combiner_logior);

Both do a bitwise inclusive OR of p1 and assign the result to s1.

Note that global does not have a combiner value for the reduction
assignment operator -= (negative of the sum of the parallel values).

The global function operates only on active positions.
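
The behavior of global can be modeled sequentially in ordinary C. The
sketch below is not C*; the names global_model and model_combiner_t
are hypothetical stand-ins for the library's function and combiner
type. It represents the context as an array of bool and assumes that
at least one position is active; it does not model the result that C*
produces when no positions are active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the six combiners that global accepts. */
typedef enum {
    MODEL_MAX, MODEL_MIN, MODEL_ADD,
    MODEL_LOGIOR, MODEL_LOGXOR, MODEL_LOGAND
} model_combiner_t;

/* Reduce the active elements of source to a single scalar value. */
int global_model(const int *source, const bool *active, size_t n,
                 model_combiner_t combiner)
{
    assert(source != NULL && active != NULL);
    bool seen = false;
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (!active[i])
            continue;             /* inactive positions take no part */
        int v = source[i];
        if (!seen) {              /* first active value seeds the result */
            acc = v;
            seen = true;
            continue;
        }
        switch (combiner) {
        case MODEL_MAX:    if (v > acc) acc = v; break;
        case MODEL_MIN:    if (v < acc) acc = v; break;
        case MODEL_ADD:    acc += v; break;
        case MODEL_LOGIOR: acc |= v; break;
        case MODEL_LOGXOR: acc ^= v; break;
        case MODEL_LOGAND: acc &= v; break;
        }
    }
    assert(seen);   /* this sketch requires at least one active position */
    return acc;
}
```

For the hypothetical values {5, 3, 12, 8} with position 2 inactive,
the add combiner gives 16 and the logior combiner gives 15; the value
12 does not contribute to either result.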
