GETTING STARTED IN *LISP Version 6.1, June 1991 Copyright (c) 1991 by Thinking Machines Corporation. "Example is always more Chapter 6: *Lisp Sampler--A Scan in Four Fits efficacious than precept." ********************************************* Samuel Johnson In this chapter, we'll look at four short *Lisp examples that perform simple, useful tasks. To give this chapter some continui- ty, I'll focus on one of the most unique and powerful tools of *Lisp: the scan!! operator (which we looked at briefly in Section 3.5). The scan!! operator can be used in a variety of ways, some of which may suprise you if you're new to the techniques of parallel programming. We'll see a number of uses for scan!! in the "fits" of code below, many of which can be adapted for use in your own code. Of course, I'll also be using other *Lisp opera- tions, so from these examples you'll be able to get a sense of how you can combine *Lisp operators to perform specific tasks in your programs. NOTE: Some of the examples in this chapter are rather long, so you may not want to type them in by hand. The file /cm/starlisp/interpreter/f6100/starlisp-sampler.lisp in the *Lisp source code directory contains a copy of all code ex- amples used in this chapter. Ask your systems manager or ap- plications engineer if you need help locating this file. 6.1 Fit the First: Adding Very Large Integers In *Lisp you can define integer pvars of any size, and add them in parallel: > (pref (+!! 12345678901234567890 87654321098765432109) 0) 99999999999999999999 But if you have extremely large integers to add (longer than, say, forty digits), this isn't really an efficient way to do it. There's no rule, however, saying that you have to add numbers by storing them one-per-processor. You can divide a very large in- teger up into its individual bits, and store the bits one-per- processor (see figure below). You can then use *Lisp operations to perform a parallel addition of these pvars, using all the CM processors. Processor n 13 12 11 10 9 8 7 6 5 4 3 2 1 0 _ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ |0| ... | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | |_| |___|___|___|___|___|___|___|___|___|___|___|___|___|___| Very large integer stored as a pvar with a single bit value for each processor. Let's assume that we have two large integers, stored one bit per processor, as described above. This is the same as saying that we have two integer pvars (or, to be more precise, unsigned-byte pvars) of length 1, each containing the bits of a very large binary number, one bit per processor. We'll assume that the bits of these pvars are stored in send-address order (that is, with the lowest order bit stored in the processor with a send address of 0). To add these pvars, we use the rules of binary addition: o If both source pvars contain 0 for a processor, the result will be 0 for that processor. o If one source pvar contains 1, and the other 0, the result will be 1. o If both source pvars contain 1, the result is 0 with a carry of 1. o A carry value is propagated forward, negating the result for each subsequent processor, until it reaches a processor with 0 in both sources (see figure below). Processor n 13 12 11 10 9 8 7 6 5 4 3 2 1 0 _ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ Pvar 1 |0| ... | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | |_| |___|___|___|___|___|___|___|___|___|___|___|___|___|___| _ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ Pvar 2 |0| ... | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | |_| |___|___|___|___|___|___|___|___|___|___|___|___|___|___| <-----carry-----1 <-----carry-----1 _ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ Result |0| ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | |_| |___|___|___|___|___|___|___|___|___|___|___|___|___|___| Addition example, showing the propagation of carry bits. The hardest part of the addition is keeping track of the carry values, and incorporating them into the final result. However, we can easily keep track of them by using the segmented scanning feature of scan!! that we saw in Section 3.5.1. In essence, processors where both sources are 1 and processors where both sources are 0 define "segments" within which carry values are propagated. The carry values start at pairs of 1's and end at pairs of 0's, affecting all values in between. We can use parallel boolean operations to determine where these segments be- gin and end, and then use a segmented scan!! to propagate the ef- fects of the carry bits. Here's a function that does this: > (defun very-long-add!! (bit-pvar1 bit-pvar2) (declare (type (field-pvar 1) bit-pvar1 bit-pvar2)) (*let ((zero-zero (=!! 0 bit-pvar1 bit-pvar2)) (one-one (=!! 1 bit-pvar1 bit-pvar2)) carry-segments will-receive-carry dest) (declare (type boolean-pvar zero-zero one-one) (type boolean-pvar carry-segments will-receive-carry) (type (field-pvar 1) dest)) ; Determine points at which carries start and end (*set carry-segments (or!! (zerop!! (self-address!!)) zero-zero one-one)) ; Determine processors that will be affected by a carry (*set will-receive-carry (scan!! one-one 'copy!! :segment-pvar carry-segments :include-self nil)) ; Exclude processor zero, because it can't get a carry (*setf (pref will-receive-carry 0) nil) ; Perform the addition (*set dest (if!! (or!! one-one zero-zero) ; Pairs of 1's and 0's produce 1 with carry, else 0 (if!! will-receive-carry 1 0) ; All other values will be 0 with carry, 1 otherwise (if!! will-receive-carry 0 1))) dest)) The call to scan!! is the real heart of this algorithm. The rest of the function is simply setup code to determine the exact boun- daries for the scan, and cleanup code that uses the scan result to calculate the final value for each processor. It's quite com- mon for a function that uses scan!! to have this form: setup code, the scan!! itself, then cleanup code. We'll see other exam- ples of this later on. FOR THE OBSERVANT: You'll notice that the call to scan!! has an :include-self argument of nil. The effect of this argument is to prevent each processor involved in the scan from including its own value in the result that is passed on to the next processor in the scan. Since we're doing a copy!! scan in the very- long-add!! function, this has two effects: o Processors that produce a carry bit (1/1 processors) aren't affected by it. o Processors that halt propagation of a carry bit (0/0 processors) are affected by it. The :include-self argument to scan!! is a handy tool for controlling the effects of a scan, because it effectively "shifts" the effect of the scan ahead by one processor. We'll see other examples of the use of :include-self later on in this chapter. Now, since the bit pvars we'll be using have the least signifi- cant bit stored in processor 0, we'll want a function that displays these pvars with this bit on the right, rather than on the left, as ppp would display them. Here's a definition for a function that does this, called pbp (short for print-bit-pvar): > ( defun pbp (pvar &key (length 20) title) "Print Bit Pvar" (*let ((display-pvar (if!! ( (*defvar bits (if!! ( (ppp bits :end 20) ; low-order bits on left 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 > (pbp bits :length 20) ; low-order bits on right 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 Now, to test very-long-add!!, we'll write a function that creates two bit pvars, displays them, and adds them. > (defun test-very-long-add () (let ((length1 (+ 12 (random 5))) (length2 (+ 12 (random 5)))) (*let ((bit-pvar1 0) (bit-pvar2 0)) (declare (type (field-pvar 1) bit-pvar1 bit-pvar2)) ; Store random binary numbers in the bit pvars (*when ( (test-very-long-add) Bit-pvar 1: 0 0 0 0 0 0 1 0 1 1 1 0 1 0 0 1 0 1 0 1 Bit-pvar 2: 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 1 0 1 0 Result : 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 1 1 1 1 1 > (test-very-long-add) Bit-pvar 1: 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 1 Bit-pvar 2: 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 Result : 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 Obviously, we're only adding numbers of 12 to 17 bits in these examples, but very-long-add!! can be used without modification to add numbers with thousands or even millions of bits, limited only by the number of processors that you are currently using. FOR THE CURIOUS: If you'd like to get a closer look at the inner workings of very-long-add!!, you can use the *Lisp function ppp!! to display the value of any pvar expression without affecting the value that expression returns. Unlike ppp, which prints a pvar and then returns nil, the ppp!! function prints a pvar and then returns the same pvar as its value. For more information, see the entry for ppp!! in the *Lisp Diction- ary. 6.2 Fit the Second: A Segmented news!! Function In Chapter 3 we looked at router and grid communication, and I pointed out that grid communication is faster because of its re- gularity: all values move in the same direction across the grid. The scan!! function is also preferable to using the router, for the same reason: it's faster and more efficient. But scanning is much more flexible than grid communication. You can use scan!! to write data-movement operations that have the speed of grid com- munication operations, yet at the same time allow you to rear- range the values of your pvars in router-like ways. For example, the grid communication operator news!! doesn't have a :segment-pvar argument like scan!!, but you can use segmented scan!! calls to perform news!!-like shifts of data in a segmented manner. As a specific example, let's take the pvar returned by (self- address!!) as our "data" pvar: > (ppp (self-address!!) :end 20 :format "~2D ") 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 and suppose that we have a segment pvar with the following values (for clarity, I've used the :format argument of ppp to replace nil values with dots): > (ppp segment-pvar :end 20 :format " ~:[.~;T~] ") T T . . T . . . T . . . T . . . T . . . We can use a segmented scan!! to "rotate" each segment of the self-address!! pvar one position to the right, producing: 0 3 1 2 7 4 5 6 11 8 9 10 15 12 13 14 19 16 17 18 Here's how to do it: > (defun segmented-news!! (pvar segment-pvar) (*let (end-segment-pvar result temp) ; Define a second segment pvar that has T's at ; the _end_ of the segments defined by segment-pvar (*set end-segment-pvar (scan!! segment-pvar 'copy!! :direction :backward :segment-pvar t!! :include-self nil)) ; Last active processor in end-segment-pvar must be T (*setf (pref end-segment-pvar (*max (self-address!!))) T) ; use scan!! to shift pvar values forward one position (*set result (scan!! pvar 'copy!! :segment-pvar t!! :include-self nil)) ; use a backward scan to copy the last value ; of each segment back to the start of the segment (*set temp (scan!! pvar 'copy!! :segment-pvar end-segment-pvar :include-self t :direction :backward)) ; combine the copied last elements from temp pvar ; with the elements in the result pvar (*when segment-pvar (*set result temp)) ; return the resulting shifted pvar result)) There are three scan!! calls in this function, each of which per- forms a particular task. In order, the scans are: o A "backward" copy!! scan with a segment pvar of t!! and :include-self nil. This shifts the contents of segment-pvar "backwards" by one element, producing a segment pvar that contains t for every processor that ends a segment. (Note that we also have to use *setf to make sure this pvar has a t in the "last" active processor.) o A "forward" copy!! scan with a segment pvar of t!! and :include-self nil. This shifts each value of the supplied pvar argument "forward" by one step. o A "backward" copy!! scan using the end-of-segment pvar we created, to copy the last value of each segment back to the first location in the segment. Notice that this time :include-self is t, to keep copied values from crossing seg- ment boundaries. The result, after we use *when and *set to combine things, is a copy of pvar with its contents rotated forward by one element within each segment, as defined by segment-pvar. Here's an example of this function in action, using the example shown at the beginning of this section: > (*defvar segment-pvar (zerop!! (mod!! (self-address!!) 4))) SEGMENT-PVAR > (*setf (pref segment-pvar 1) T) ; set processor 1 as well > (ppp (self-address!!) :end 20 :format ~2D ) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 > (ppp segment-pvar :end 20 :format ~:[.~;T~] ) T T . . T . . . T . . . T . . . T . . . > (ppp (segmented-news!! (self-address!!) segment-pvar) :end 20 :format ~2D ) 0 3 1 2 7 4 5 6 11 8 9 10 15 12 13 14 19 16 17 18 FOR PERFECTIONISTS: As defined above, the segmented-news!! function calculates an end-segment-pvar every time it is called. Calling segmented-news!! repeatedly (for instance, within a loop) would be very inefficient. A better method is to rewrite this function to accept an end-segment-pvar argument, and then precalculate the end-segment pvar before calling segmented-news!!. And even if you're not using scans within a loop, precomputing the start-segment and end-segment pvars is a very practical method for using scans, because (as shown in the segmented-news!! example) it makes it easy for you to use both "forward" and "backward" scanning in your functions. 6.3 Fit the Third: Using the Router Efficiently Router communication, while slower on average than grid communi- cation, is nevertheless a very flexible way to move data between processors. There are many cases in which it might seem at first that using router communication is highly inefficient, yet by a suitable rearrangement of data even these extreme cases can be handled effectively. As an example, consider the "many collisions" get operation. This happens when the address pvar argument of a pref!! call causes many processors to request a value from a single address, produc- ing many "collisions" of requests. The router has built-in hardware for handling these collisions effectively, but when the number of requests is very large the hardware algorithm can run out of memory. In these cases, a slower software algorithm is used instead to perform the get operation. By using a *defstruct pvar, a sort and a scan!! or two, we can simulate this algorithm, and see what steps the router has to take to handle a "many collisions" get. The *defstruct pvar is used to hold information about the data transfers that we make: > (*defstruct gmc-data (fetch-address nil :type fixnum) (originating-address nil :type fixnum)) For each processor, the fetch-address slot of this pvar records the address from which the processor is requesting a value, and the originating-address slot records the self address of the pro- cessor itself, so that the data can be sent back to it. The function we write will need to do the following: o Create a gmc-data pvar containing the fetch-add ress and originating-address for each active processor. o Sort the values of this pvar by fetch-address, to gather to- gether requests for data from the same fetch-address, and at the same time "pack" the requests together into low-address processors so that there are no gaps between them. o Find the first processor in each group of similar requests, and do a "get" (pref!!) operation to retrieve values for just those processors. o Do a scan!! to copy the retrieved values to all other pro- cessors in each group of requests. o Finally, do a "send" (*pset) operation to deliver the re- trieved values to the processors that requested them. The entire algorithm can be written as a single function: > (defun pref!!-many-collisions (data address) (let ((number-of-active-processors (*sum 1))) (*let ((sort-data (make-gmc-data!! :fetch-address address :originating-address (self-address!!)))) ; Sort sort-data pvar by fetch-address and ; pack into low-address processors so it is contiguous (*pset :no-collisions sort-data sort-data (rank!! (gmc-data-fetch-address!! sort-data) '<=!!)) ; Select processors that contain data after the sort (*all (*when ( (*defvar collision-addresses (floor!! (self-address!!) 6)) COLLISION-ADDRESSES > (ppp collision-addresses :end 28) 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 > (ppp (pref!!-many-collisions (*!! (self-address!!) 3) collision-addresses) :end 28) 0 0 0 0 0 0 3 3 3 3 3 3 6 6 6 6 6 6 9 9 9 9 9 9 12 12 12 12 There are two scan!! calls in pref!!-many-collisions, each having a different set of arguments: o The first scan!! call does a forward max!! scan of the sorted sort-data values, with :include-self nil. This causes each processor to compare its own value with the maximum value of all the preceding processors. Only processors that start a segment of similarly addressed requests will see a difference between these values, so this step quickly lo- cates the first processor in each segment. o The second scan!! call is a forward copy!! scan using the segment pvar derived from the first scan!!. The first pro- cessor in each segment has retrieved a value, so this scan!! operation copies that value to the other processors of the segment. Notice how in this example we're using one type of scan!! call to determine the bounds for another type of scan!! call. It's the multi-faceted nature of scan!! that makes this possible. Notice also the combination of *pset and rank!! at the beginning of the pref!!-many-collisions function. This is an important ex- ample of a pair of *Lisp functions working in tandem to perform more than one operation. The rank!! function provides a ranking of the fetch-address values in the sort-data pvar, and *pset uses this ranking to sort the values in ascending order. The important point to notice, however, is that some processors might be deselected when pref!!-many-collisions function is called. When rank!! does its ranking, these processors are simply ignored. The result is that *pset "packs" all the active values of sort-data into processors with low addresses, eliminating any "gaps" caused by deselected processors. FOR THE CURIOUS: This "packing" of pvar values is a useful trick for elim- inating gaps in your data set. We can rewrite the *pset- rank!! combination used in pref!!-many-collisions to "pack" the values of a pvar without changing their order, as in: (*pset :no-collisions pvar pvar (rank!! (self-address!!) '<=!!)) The expression (rank!! (self-address!!) '<=!!) is an impor- tant enough idiom that there is a *Lisp function that does the same thing: enumerate!!. So this "pack" operation can be written more efficiently as: (*pset :no-collisions pvar pvar (enumerate!!)) And, by the way, it shouldn't suprise you to learn that scan!! can be used to write a simple version of enumerate!!: (defun enumerate!! () (scan!! 1 '+!! :include-self nil)) 6.4 Fit the Fourth: Creating a "List" Pvar While *Lisp does not have a direct parallel analogue of the list data structure, you can use *Lisp operations to define a parallel data structure that has much of the flexibility of lists. This example uses VP sets, router communication, and scanning to de- fine a parallel data structure with an arbitrary number of ele- ments per processor, which I call a "list" pvar on account of its flexible nature. (Lisp purists will most likely prefer some other name.) This is accomplished by defining a new VP set that has sufficient processors to hold all the elements of the list pvar, and by de- fining the list pvar in such a way that it is divided into seg- ments of elements, one segment for each processor in the original VP set. In this way, each processor in the original VP set is as- signed a "list" of elements for its use, of any required length. NOTE: This section introduces some *Lisp functions and variables that we haven't looked at in previous sections. In particu- lar, this section introduces: *minimum-size-for-vp-set* allocate-processors-for-vp-set deallocate-processors-for-vp-set next-power-of-two->= There are definitions and examples of each of these opera- tions in the *Lisp Dictionary. Here's the definition of the VP set: > (def-vp-set list-vp-set nil :*defvars ((segment-pvar nil) (elements nil))) It is defined as a flexible VP set, meaning that its size and shape are determined at run time. It has two associated pvars, an elements pvar that will contains the elements of the lists we de- fine, and a segment-pvar (as used in the scan!! examples above) that will indicate with a t value the beginning of each segment of list elements. We'll also want a pair of permanent pvars that describe the start location and length of the segment of elements assigned to each processor: > (*defvar *list-start-processor*) > (*defvar *number-of-elements*) Now, here is the macro that actually defines and allocates the list-vp-set for the duration of a body of *Lisp code: > (defmacro allocate-list-pvar (length value &body body) `(let ((total-processors-required (*sum ,length))) ;;; Allocate processors in list-vp-set for list elements (allocate-processors-for-vp-set list-vp-set (list (max *minimum-size-for-vp-set* (next-power-of-two->=="> total-processors-required)))) (*with-vp-set list-vp-set (*set segment-pvar nil elements nil)) ;;; Get send addresses of elements in that list-vp-set ;;; that will contain the first element of each segment (*let ((*list-start-processor* (scan!! ,length '+!! :include-self nil)) (*number-of-elements* ,length)) ;;; For processors that have requested a non-zero number ;;; of elements, send the initial values to the first ;;; element of the corresponding segments. (*when (plusp!! ,length) (*pset :no-collisions ,value elements *list-start-processor* :notify segment-pvar)) ;;; Use scan!! to copy the initial value to all elements ;;; in each segment. (*with-vp-set list-vp-set (*set elements (scan!! elements 'copy!! :segment-pvar segment-pvar))) ;;; Evaluate body forms with list-vp-set defined (progn ,@body ) ;;; Deallocate the list-vp-set so that it can be reused. (deallocate-processors-for-vp-set list-vp-set)))) Notice that the allocate-list-pvar macro simply defines the new VP set, but does not select it; the body of code enclosed by the macro is free to select the list-vp-set whenever it is needed. As an example of what you can do with a "list" pvar once you've defined it, here's a function that prints out the first n lists in the list-vp-set: > (defun print-lists (n) (dotimes (i n) (format t "~&( ") (let ((start-address (pref *list-start-processor* i)) (length (pref *number-of-elements* i))) (*with-vp-set list-vp-set (dotimes (i length) (format t "~D " (pref elements (+ start-address i)))))) (format t ")"))) Finally, here's a test function that allocates the list-vp-set with a random set of list lengths, calls ppp to display the first n lengths it assigned, and then calls print-lists to display the first n lists in the list-vp-set: > (defun test-lists (n) (*let ((lengths (random!! 8))) (ppp lengths :end n) (allocate-list-pvar lengths (self-address!!) (print-lists n)))) Here's what the output from test-lists looks like: > (test-lists 6) 4 3 5 4 3 2 ( 0 0 0 0 ) ( 1 1 1 ) ( 2 2 2 2 2 ) ( 3 3 3 3 ) ( 4 4 4 ) ( 5 5 ) > (test-lists 6) 2 5 5 1 0 4 ( 0 0 ) ( 1 1 1 1 1 ) ( 2 2 2 2 2 ) ( 3 ) ( ) ( 5 5 5 5 ) While it's not a simple matter to dynamically change the size of the list assigned to each processor, the "list" pvar has many of the advantages of the list data structure in Common Lisp. There are many applications where this ability to assign a "list" of elements to each original processor comes in handy. For exam- ple, in simulating the behavior of subatomic particles, a list pvar can be used to keep track of the particles that result from collisions or decay. Another example is in parallel graphics programs, where a number of polygons must be rendered (displayed on a screen). Each po- lygon can be broken down into triangles that are easy to render, but each polygon may produce a different number of triangles. A list pvar can be used to hold the set of triangles derived from each polygon, and then the entire list pvar can be passed to a simple triangle-rendering algorithm for display. 6.5 Summary: There's More Than One Way to Scan a Pvar In this chapter, we've seen that: o You can use scan!! to add extremely large integers. o You can use the segmented version of scan!! to define other segmented operations. o You can use scan!! along with *defstruct pvars, ranking tools, and communication operators to increase the efficien- cy of your data transfer operations. o You can use scan!! along with the existing parallel data structures of *Lisp to define new data structures that have the kind of flexibility that you need. In short, there are many interesting and unusual things you can do with *Lisp: the examples in this chapter are but a representa- tive sample. The best way to learn about these unusual tricks is by creating them yourself. Decide what kind of operation you wish to perform, and then combine *Lisp operators to implement it.