RFC0684: Commentary on procedure calling as a network protocol

Download in PDF format Download in text format






Network Working Group
RFC #684
NIC #32252
April 15,1975



    A Commentary on Procedure Calling as a Network Protocol

                        Richard Schantz

                           BBN-TENEX



Preface_______

This RFC is being issued as a first step in an attempt to  stimulate
a dialog on some issues in designing a distributed computing system.
In particular, it considers the approach taken in a design set forth
in  RFC #674, commonly known as the "Procedure Call Protocol" (PCP).
In the present document, the concentration is on what we believe  to
be the shortcomings of such a design approach.

Note at the outset that this is not the first time we are  providing
a critical commentary on PCP.  During the earlier PCP design stages,
we met with the PCP designers for  a  brief  period,  and  suggested
several  changes,  many  of  which became part of PCP Version 2.  We
hasten to add, however, that the nature of  those  suggestions  stem
from  an entirely different point of view than those presented here.
Our original suggestions, and also some subsequent ones, were mainly
addressing  details  of implementation.  In this note the concern is
more with the concepts underlying the PCP design than with  the  PCP
implementation.

This note is being  distributed  because  we  feel  that  it  raises
certain  issues  which  have not been adequately addressed yet.  The
PCP designers are to  be  congratulated  for  providing  a  detailed
written  description  of  their  ideas,  thereby  creating a natural
starting  point  for  a  discussion  of  distributed  system  design
concepts.  It is the intent of this note to stimulate an interaction
among individuals involved with distributed computing,  which  could
perhaps  result in systems whose designs don't preclude their use in
projects  other  than  the  one  for  which  they  were   originally
conceived.

The ideas  expressed  in  this  RFC  have  benefited  from  numerous
discussions with Bob Thomas, BBN-TENEX, who shares the point of view
taken.

          A COMMENTARY on PROCEDURE CALLING         Page   2



Introduction____________


     While the Procedure Call Protocol (PCP) and its use within  the
National  Software  Works (NSW) context attacks many of the problems
associated with integrating independent computing systems to  handle
a  distributed  computation,  it  is  our  feeling  that  its design
contains flaws which should prevent its widespread use, and  in  our
view,  limit  its overall utility.  We are not voicing our objection
to the use of PCP, in its current  definition,  as  the  base  level
implementation  vehicle for the NSW project.  It is already too late
for any such objection, and PCP may, in fact, be very effective  for
the  NSW  implementation,  since they are proceeding in parallel and
have probably influenced each other.   Rather,  we  are  voicing  an
objection  to  the  "PCP philosophy", in the hope of preventing this
type of protocol from becoming the  de-facto  network  standard  for
distributed  computation,  and in the hope of influencing the future
direction of this and similar efforts.

     Some of the objectionable aspects of PCP, it can be argued, are
differences  of  individual  preference, and philosophers have often
indicated that you cannot argue about  tastes.   We  have  tried  to
avoid  such  arguments in this document.  Rather, we consider PCP in
light  of  our  experience  in   developing   distributed   systems.
Considered  in  this  way,  we  feel  that  PCP  and  its underlying
philosophy have flaws which  make  it  inappropriate  as  a  general
purpose protocol and virtual programming system for the construction
of distributed software systems.  It is  our  opinion  that  PCP  is
probably  complete  in  the  sense that one can probably do anything
that is required using its primitives.  A key  issue  then,  is  not
whether this function or that function can be supported.  Rather, to
us an important question is how easy it is to do  the  things  which
experience has indicated are important to distributed computing.  In
addition, a programming discipline dedicated to network applications
should  pay  particular  attention  to  coercing its users away from
actions which systems programming in general and network programming
in particular have shown to be pitfalls in system implementation.


A Point of View_ _____ __ ____


     At the outset, we fully support the aspects of the  PCP  design
effort  that  have  gone  into  systematizing  the  interaction  and
agreements between distributed  elements  to  support  inter-machine
computing.   This  includes  the  definition of the various types of
replies, the  standardization  of  the  data  structure  format  for
inter-machine  exchange,  and  the process creation primitives which
extend the machine boundaries.  Such notions are basic and  must  be
part  of any distributed system definition.  Our main concern is not
with these efforts.

          A COMMENTARY on PROCEDURE CALLING         Page   3



     Rather, we take exception to PCP's underlying premise: that the
procedure  calling  discipline  is  the  starting point for building
multi-computer systems.  This premise leads to a model which  has  a
central  point  for the entire algorithm control, rather than a more
natural (in network situations) distributed control accomplished  by
cooperating   independent   entities   interacting   through  common
communication paths.  While the procedure call may be an appropriate
basis  for  certain  applications,  we  believe  that it can neither
directly  nor  accurately  model  the   interactions   and   control
structures that occur in many distributed multi-computer systems.

     Much of what follows may seem to be a pedagogic  argument,  and
PCP supporters may take the position of "who cares what you call it,
its doing the same thing".  Our reply is that it is  very  important
to achieve a clear and concise model of distributed computation, and
while the PCP  model  does  not  require  "poor  implementation"  of
distributed  systems, neither does it make "good implementation" any
easier, nor does it prohibit ill-advised programming  practices.   A
model  stressing the dynamic interconnection of somewhat independent
computing  entities,  we  feel,  adheres  more  to  the  notions  of
defensive  programming,  which  we  have  found to be fundamental to
building usable multi-machine implementations.

     The rest of this RFC discusses what we feel to be some  of  the
shortcomings of a procedure call protocol.


Limitations of Procedure Calling Across Machines___________ __ _________ _______ ______ ________


     First and foremost, it is our contention that procedure calling
should  not  be  the  basis for multi-machine interactions.  We feel
that a request and reply protocol along  with  suitably  manipulated
communication paths between processes forms a model better suited to
the situation  in  which  the  network  places  us.   In  a  network
environment  one has autonomous computing entities which have agreed
on their cooperation, rather than a master process forcing execution
of a certain body of code to fulfill its computing needs.  In such a
configuration, actions required of a process are  best  accommodated
indirectly (by request) rather than directly (by procedure call), in
order to maintain the integrity of the constituent processes.

     Procedure calling is most  often  a  very  primitive  operation
whose   implementation   often   requires   only  a  single  machine
instruction.  In addition, it is usually true that procedure calling
is  usually  not  within  the  domain of the operating system.  [The
Multics intersegment procedure  calling  mechanism  may  present  an
exception  to  this,  until  linkage is complete.  In the remote PCP
case, however, linkage  can  never  be  complete  in  the  sense  of
supporting  a  fast transfer of control between modules].  Processes
and communication paths between processes, however,  are  undeniably
operating   system   constructs.   In  an  environment  where  local
procedure calling was "cheap", it would be ill-advised to  blur  the

          A COMMENTARY on PROCEDURE CALLING         Page   4



distinction  between  a local (inexpensive in time and effort) and a
remote procedure call, which obviously  requires  a  great  deal  of
effort  by  the "PCP system", if not by the PCP user.  It also seems
to  be  the  case  that  the  cost  of  blurring  the   local/remote
distinction  at  the  procedure call level will be found in the more
frequent use of a less efficient local procedure calling  mechanism.
Interprocess communication, on the other hand, (at least with regard
to stream or  message  oriented  channels  and  not  just  interrupt
signals)   is  generally  regarded  as  having  a  significant  cost
associated with it.   Message  sending  is  always  an  interprocess
action,  and  requires  system intervention always.  There is not as
substantial a difference between the IPC of local processes and  the
IPC  of  remote  processes,  as  between  local and remote procedure
calling.  PCP is suggestive of a model in which processes exist that
span machine boundaries to provide inter-machine subroutine calling.
Yet the PCP documentation has not advocated the notion of a  process
that  spans  machine  boundaries,  and  rightfully  so  since such a
creation would cause innumerable problems.  Since procedure  calling
is more suitable as an intra-process notion, it seems to be a better
idea to take the interprocess communication framework and extend  it
to  have  a uniform interpretation locally and remotely, rather than
to extend the procedure calling model.  It is  also  our  contention
that  a  model  which relies on procedure calling for its basis does
not take into account the special nature of the network environment,
and  that  such  an  environment  can  be more suitably handled in a
message passing model.  Furthermore, we feel that programming  as  a
whole,  even  purely  local computing, will benefit from paying more
attention to such areas as reliability and  robustness,  which  have
been  brought to the forefront through experience with an oftentimes
unreliable network and  collection  of  hosts.   An  IPC  model,  by
emphasizing  the  connections  between  disjoint processes, seems to
reinforce the idea that distributed  computing  is  accomplished  by
joining  separate entities, and that defensive programming and error
handling techniques are appropriate.  Since PCP is,  we  think,  for
distributed  system  builders,  and  not  for  the end user (e.g. an
RSEXEC user), avoiding  the  network,  interconnection  issues,  and
relative  costs, may be counter-productive if the goal is to achieve
usable network systems.

     In a similar vein, the entire notion of inter-machine procedure
calling  underlies  a model which in effect has extended the address
space of a single process.  That is, there  is  a  single  locus  of
algorithm   control   (although   perhaps  not  a  single  locus  of
execution).  While this model may well serve the needs of a  "local"
computation  where  the  parts  are  strongly  bound  together,  our
experience in building working distributed  systems  has  shown  the
utility of a model which has multiple loci of control and execution.
In such a model, it is through agreements on the method and type  of
information  interchange  and synchronization, that a computation is
carried out, rather than at the  singular  direction  of  a  central
entity.   In  a model that has distributed control and execution, we
feel a process will be in a better position to naturally  cope  with
the many vagaries that necessarily arise in a network environment.

          A COMMENTARY on PROCEDURE CALLING         Page   5



     The  unmistakable  trend  in  systems  programming  is   toward
inviolable    (protected)    process    structures   with   external
synchronization as a means of coping with  complex  debugging  tasks
and  the  difficulty of making system changes.  This trend is better
supported, we feel, by a message passing rather  than  a  procedural
model of computation.  Furthermore, we feel that network programming
techniques should be applied to local computation, not the other way
around.


Some Particulars____ ___________


     In the following list, we try to be more specific with  respect
to  particular situations where we think the PCP concept may be weak
as the basis for a network programming system.  For  some  of  these
examples to be meaningful, the reader should be fairly familiar with
the PCP documents issued as RFC 674.

       1.  Recovery  from  component  malfunction  may  be  very
    difficult  to  handle  by  a process that is not the central
    control (i.e.  a  process  which  is  being  manipulated  by
    having  its  procedures  executed).   Is the situation where
    there is network trouble, for example, to be  modeled  by  a
    forced procedure call to some error recovery routine?  It is
    precisely such situations where distributed  control  serves
    as  a  better  model.   Consider  the  act of introducing an
    inferior to another acquaintance and then supplying the  new
    handle  as a parameter of a subsequent procedure call in the
    inferior.  The inferior's blind  use  of  the  parameter  to
    interact with the other process illustrates the manipulative
    aspects of a superior.  The inferior never really  is  aware
    of  a new communication path to a new process.  The inferior
    environment (as maintained by the  PCP  "system")  has  been
    changed  by the superior, with no active notification of the
    inferior.  Certainly this makes user  coded  error  recovery
    somewhat awkward.

       2.  Such process manipulation may at  times  violate  the
    principles  of  modular programming.  In this vein, it seems
    beneficial to be able to debug separately the  pieces  of  a
    computation  and then worry only about their synchronization
    to achieve a totally  debugged  system.   With  PCP  in  its
    fullest sense, the danger of error propagation seems greater
    because of the power of a process to cause execution  of  an
    arbitrary  procedure  and  to  read/write remote data stores
    without the active participation of the remote process.

       3.  Can we assume a proper initialization sequence if our
    procedures   are  called  remotely?   Must  every  procedure
    contain the code to check  for  the  propriety  and  correct
    sequencing of the call? A model in which each remote process
    is  an  active  computing  element  seems  better  able   to

          A COMMENTARY on PROCEDURE CALLING         Page   6



    conveniently apply protective standards to the code and data
    it encompasses.

       4.  PCP doesn't model long term parallel  activity  in  a
    convenient   fashion,  as  is  required  to  handle  various
    asynchronous producer/consumer process  relationships.   The
    synchronization  is  geared  more  to  a one-to-one call and
    return, rather than to the asynchronous nature and  multiple
    returns  for  a single request, as exhibited by many network
    services.  In addition, low priority, preemptable background
    tasks  are  hard  (impossible?) to model in a procedure call
    environment.

       5.  Communication  paths  are  not  treated  as  abstract
    objects  which are independent from the actual entities they
    connect, and hence they cannot be utilized  in  some  useful
    ways (e.g. to carry non PCP messages).  Also with respect to
    treating communication paths as objects, there is no concept
    of  passing  a  communication  path  to  an  inferior (or an
    acquaintance), without having to create a  new  "connection"
    (whether  or  not  this turns out to be a physical channel).
    The ability to pass communication paths is often  useful  in
    subcontracting  requests  to inferior processes.  To do this
    within PCP requires the cooperation of the  calling  process
    (i.e. to  use  the new connection handle), which again seems
    to  violate  the  concepts  of  modular  programming.    The
    alternative  approach  in  PCP is to have the superior relay
    the subsequent communications to its created  inferior,  but
    the  effort involved would probably prohibit the use of this
    technique for subcontracting.

       6.  PCP seems too complicated to be used for the type  of
    processing  which  requires  periodic but short (i.e.  a few
    words  exchanged)  interactions.    An   example   of   such
    interactions  is  the  way the TIP uses the TENEX accounting
    servers (see RFC #672).  Furthermore, PCP is  probably  much
    too  complex  for  implementation  on a small host.  In that
    regard, there does not seem to be a definition of what might
    constitute a minimum implementation for a host/process which
    did/could not handle all of what has been developed.

       7.  In the PCP model, it may become awkward  or  resource
    consuming  for  a service program to do such things as queue
    operations for execution at a later time (persistence) or at
    a  more opportune time (priority servicing mechanism).  Such
    implementations may require dummy returns  and  modification
    of   the   controlling   fork  concept,  or  maintenance  of
    processing forks over long periods of inactivity.

       8.  It is not  always  true  that  a  process  connecting
    (splicing)  to  a  service  should  be able to influence the
    service process environment in any direct way.   How  can  a
    service process in PCP prevent a malicious user fom splicing

          A COMMENTARY on PROCEDURE CALLING         Page   7



    to it and then introducing it  to  an  arbitrary  number  of
    processes,  thereby  overflowing  the  table  space  in that
    process.  All of that could  have  been  done  without  ever
    executing  a  single instruction of user written code.  This
    difficulty is a consequence of the PCP notion of having  one
    process  manipulate  the  environment of another without its
    active participation in such actions.

       9.   Doesn't  the  fact  that  the  network  PCP  process
    implementation  is so much neater than the TENEX PCP process
    implementation (since  TENEX  doesn't  have  a  general  IPC
    facility)  suggest  that  message  passing and communication
    facilities supported by the "system" provides a sound  basis
    for  multi-process  implementations,  and  that perhaps such
    facilities  should   be   primitively   available   to   the
    distributed system builders who will use PCP?

       10.   There  is  a  question  of  whether   PCP   is   an
    implementation virtual machine (language), or an application
    virtual machine (language).  That is, is PCP intended to  be
    used   to   implement   systems   which  manage  distributed
    resources, or as an end  product  which  makes  the  network
    resources  themselves  easier  to  use  for  the  every day,
    ordinary  programmer  (e.g.   makes   the   network   itself
    transparent  to  users).   One  gets  the  feeling  that the
    designers had both goals, and that neither one is completely
    satisfied.   If  the  former  goal is taken, we believe that
    most of the  complexities  (e.g.   network  trouble,  broken
    connections,   etc.)   and  possibilities  (e.g.   redundant
    implementation,  broadcast   request,   etc.)   of   network
    implementations  are  not  provided for adequately.  In this
    view,  the  NSW  framework  (Works  manager,  FE)   is   the
    distributed  system  that  utilizes  the  PCP implementation
    language.  We do not see how the use of PCP in this  context
    provides   for   either  an  extra-reliable  system  through
    component redundancy,  or  a  persistent  system  which  can
    tolerate  temporary malfunctions.  If one subscribes to this
    view, then it doesn't seem right that the objects  that  run
    under the created system (i.e.  the tools that run under the
    PCP implemented Front End, Works Manager, and  TBH  monitor)
    should  also  be  aware of or use PCP.  If one considers the
    latter goal, that PCP implements a  virtual  machine  to  be
    presented   to   all   programmers  for  making  distributed
    resources easy to use, then it is clear that  PCP  with  its
    manifest  concern  for  object location does not provide for
    the desireable properties of network transparency.

Our conclusion is that procedure  calling  is  not  the  appropriate
basis  for distributed multi-computer systems because it can neither
directly nor accurately model  the  network  environment.   The  PCP
virtual  programming  system may be inadequate for implementing many
distributed  systems  because  the  complexities  and  possibilities
unique to the network environment are not provided for at this basic

          A COMMENTARY on PROCEDURE CALLING         Page   8



level.