<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc0000.dtd">

<?rfc iprnotified="no" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc toc="yes" ?>
<?rfc compact="no"?>
<?rfc subcompact="yes"?>

<rfc ipr="full3978" docName="draft-myklebust-nfsv4-byte-range-delegations-00">
	<front>
		<title abbrev="NFSv4 byte range delegations">
			Network File System (NFS) version 4 byte range delegations
		</title>
		<author initials="T." surname="Myklebust"
			fullname="Trond Myklebust">
			<organization>
				Network Appliance, Inc.
			</organization>
			<address>
				<postal>
					<street>535 W. William St., Suite 3100</street>
					<city>Ann Arbor</city> <region>MI</region>
					<code>48103</code>
					<country>US</country>
				</postal>
				<phone>+1 734-764-5207</phone>
				<email>Trond.Myklebust@netapp.com</email>
			</address>
		</author>
		<author initials="J." surname="Fields"
			fullname="J. Bruce Fields">
			<organization abbrev="CITI">
				U. of Michigan Center for Information Technology Integration
			</organization>
			<address>
				<postal>
					<street>535 W. William St., Suite 3100</street>
					<city>Ann Arbor</city> <region>MI</region>
					<code>48103</code>
					<country>US</country>
				</postal>
				<email>bfields@citi.umich.edu</email>
			</address>
		</author>
		<author initials="W.A." surname="Adamson"
			fullname="William A. Adamson">
			<organization abbrev="CITI">
				U. of Michigan Center for Information Technology Integration
			</organization>
			<address>
				<postal>
					<street>535 W. William St., Suite 3100</street>
					<city>Ann Arbor</city> <region>MI</region>
					<code>48103</code>
					<country>US</country>
				</postal>
				<email>andros@citi.umich.edu</email>
			</address>
		</author>
		<author initials="P." surname="Honeyman"
			fullname="Peter Honeyman">
			<organization abbrev="CITI">
				U. of Michigan Center for Information Technology Integration
			</organization>
			<address>
				<postal>
					<street>535 W. William St., Suite 3100</street>
					<city>Ann Arbor</city> <region>MI</region>
					<code>48103</code>
					<country>US</country>
				</postal>
				<email>honey@citi.umich.edu</email>
			</address>
		</author>
		<date month="June" year="2005" />
		<area>Transport</area>
		<workgroup>Network File System Version 4</workgroup>
		<keyword>RFC</keyword>
		<keyword>Request for Comments</keyword>
		<keyword>NFSv4</keyword>
		<keyword>Network File System</keyword>
		<keyword>Byte range delegations</keyword>
		<abstract>
			<t>
				This document describes a
				set of extensions to the NFS version 4 protocol
				that enable the client to cache file data when
				caching conflicts prevent the server from
				handing out a file delegation.
			</t>
			<t>
				The proposed extensions enable the caching of
				only those specific byte ranges of data which
				the user application is reading or writing.
			</t>
			<t>
				As in the case of full delegations, a callback
				mechanism enables the server to request that
				the client flush cached data when a caching
				conflict occurs.
			</t>
		</abstract>
		<note title="Keywords">
			<t>
				The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
				"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
				and "OPTIONAL" in this document are to be interpreted as
				described in <xref target="RFC2119"/>.
			</t>
		</note>
	</front>
	<middle>
		<section title="Introduction">
			<section title="File caching in NFS versions 2 and 3">
				<t>
					The NFS protocol versions 2 and 3 do not offer any
					caching guarantees to clients. The most commonly
					implemented caching model is the so-called
					close-to-open model, which relies on user
					applications providing their own assurances of
					exclusive access to file data. In this model,
					the clients limit themselves to checking cache
					consistency when the user opens and closes the file.
					In the case where the NLM locking extensions are
					implemented, checks are also performed upon taking
					and releasing advisory locks.
				</t>
			</section>
			<section title="File caching in NFS version 4">
				<t>
					With the introduction of delegations, NFS
					version 4 <xref target="RFC3530"/> strengthens
					file caching guarantees at the protocol
					level under limited circumstances that mirror
					those under which the close-to-open model is valid.
				</t>
				<t>
					When the client opens a file for reading,
					the server is permitted to offer a "file read delegation"
					after having determined that no other clients have been
					granted write access. This is a guarantee that the file
					data and meta-data will not change until the client gives
					up the delegation. A file read delegation also gives
					the client the opportunity to cache byte range read
					locks and READ open share locks.
				</t>
				<t>
					When the client opens a file with
					READ or WRITE share semantics, and the server
					determines that the client is the exclusive
					user of that file, it may offer a
					"file write delegation". In doing so it
					guarantees that no other client may read or
					modify the file until the delegation is
					returned. A write delegation also enables the
					caching of all byte range locks and open
					share locks.
				</t>
				<t>
					The key difference in functionality between a
					file delegation and a lock lies in the fact that the
					server is able to recall the delegation at any
					time by means of a callback channel. When
					a delegation is recalled, the client is
					expected to flush its cache, establish its cached locks
					on the server, and return the delegation, and to do
					all this as quickly as possible.
					If the server notes that the client has failed to
					return the delegation within a grace time of 1 lease
					period, then the server may unilaterally revoke the
					delegation.
				</t>
			</section>
			<section title="Motivation for extending the NFSv4 delegation model">
				<t>
					Problems arise when multiple clients wish to
					access the file, and one (or more) has it open for writing.
					Delegations are ruled out for this case, so unless an
					application uses byte range locking, a client is unable
					to tell whether cached data is valid. Perforce, clients
					fall back to not caching data or checking cache validity
					frequently, increasing the I/O burden on the server.
				</t>
				<t>
					One long-standing problem that the NFSv4 delegation model therefore
					fails to solve is that of providing cache consistency guarantees
					as strong as those provided by local file-systems.
					This failure has a broad impact, e.g. it interferes with porting
					applications from a single machine environment to a cluster of
					machines that share files with NFS.
				</t>
				<t>
					Among the applications that require stronger caching semantics than
					NFSv4 provides are those that use shared memory mapped
					files for synchronisation and communication between processes on
					different clients but do no supplementary locking. Another example
					is shared append-only files such as logs.
				</t>
				<t>
					Even applications that use byte range locking for synchronisation
					are affected. Unless a check of the change attribute shows that
					no one has written anywhere in the file,
					a client may be forced to ignore otherwise valid cached data.
				</t>
			</section>
		</section>
		<section title="Description of the proposed caching model">
			<t>
				Except for the special case of the size attribute, this
				document does not address the issue of
				file meta-data consistency.
			</t>
			<t>
				The proposed model resembles that of file delegations in that the client
				can register with the server to receive synchronous notification
				of changes to locks and cached data. It also provides synchronisation
				guarantees between writers by allowing them to request temporary
				exclusive access to byte ranges of the file.
			</t>
			<t>
				The model is required to operate consistently in a mixed environment
				in which some clients may be using older versions of the NFS
				protocol together with uncached I/O. To the older clients, those
				that are using byte range delegations should appear to behave as if
				they too are using uncached I/O.
			</t>
			<section title="File data">
				<section title="Read delegations">
					<t>
						A server that grants a read delegation on a byte range
						guarantees that no other client may change the
						data or acquire a write-lock in the covered region
						until the delegation is released. Note that a SETATTR
						that modifies the size of a file effectively changes
						the data in the region between the old and new sizes.
					</t>
					<t>
						The client may request a read delegation on a byte
						range using the DELEG_RANGE operation with
						a lock type argument of READ_LT or READW_LT. In the case
						where the READ_LT argument is used, the DELEG_RANGE call
						should fail without triggering a recall if another client
						holds a write delegation for that range. Clients can use
						this mechanism in order to issue speculative
						requests that might fail, e.g. read-ahead requests.
						The server MUST, however, initiate the
						recall of any conflicting write delegation when the
						READW_LT variant is used, whether or not the request
						is granted.
					</t>
					<t>
						In the proposed model, if a current delegation
						stateid has been set using a previous DELEG_PUT_STATEID or
						DELEG_RANGE operation, then a READ request implicitly requests
						a read delegation on the byte range covered by its arguments.
						In this case, the server should treat the READ request as
						if it has been immediately preceded by a DELEG_RANGE call with
						a READW_LT argument.
					</t>
					<t>
						A server MUST refuse to grant a read delegation on a range that
						would overlap with a write delegation held by another client.
						In order to allow the caching of byte range locks, the server
						MUST also refuse to grant a read delegation for
						a range that overlaps with a WRITE lock
						held by another client.
					</t>
					<t>
						If another client attempts to write into the region
						covered by the delegation, the server should initiate
						an immediate recall. It may then optionally return
						an error of NFS4ERR_DELAY to the write request.
					</t>
				</section>
				<section title="Write delegations">
					<t>
						A server that grants a write delegation on a byte range
						guarantees that no other client may change the data in that
						region until the delegation has been released. In addition,
						it guarantees that no other client may read data or hold
						a read delegation in that region until the write delegation
						has been downgraded or released.
					</t>
					<t>
						The client may request a write delegation on a byte
						range using the DELEG_RANGE operation with
						a lock type argument of WRITE_LT or WRITEW_LT. In the case
						where the WRITE_LT argument is used, the DELEG_RANGE call
						should fail without triggering a recall if another client
						holds a read or write delegation for that range.
						The server MUST, however, initiate the
						recall of any conflicting read or write delegation when the
						WRITEW_LT variant is used.
					</t>
					<t>
						A server MUST refuse to grant a write delegation that
						would overlap with a read or write delegation held by another
						client.
						In order to allow the caching of byte range locks, the server
						MUST also refuse to grant a write delegation for
						a range that overlaps with a READ or WRITE lock
						held by another client.
					</t>
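					<t>
						The refusal rules for read and write delegations can be
						summarised in a few lines. The following is only a sketch, not
						part of the protocol: the names Held, overlaps and
						blocking_state are hypothetical, ranges are assumed to be
						finite and non-empty, and delegations and byte range locks
						are kept in one list because the same read/write
						compatibility rule applies to both.
					</t>
					<figure><artwork><![CDATA[
```python
from dataclasses import dataclass

READ = "read"
WRITE = "write"


@dataclass(frozen=True)
class Held:
    """A delegation or byte range lock currently held on the file."""
    client: str
    kind: str      # READ or WRITE
    offset: int
    length: int    # assumed finite and > 0 in this sketch


def overlaps(a_off, a_len, b_off, b_len):
    """True if the two half-open byte ranges share at least one byte."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def blocking_state(client, kind, offset, length, held):
    """Return the entries held by other clients that forbid the grant."""
    blockers = []
    for h in held:
        if h.client == client:
            continue              # state held by the requester never blocks
        if not overlaps(offset, length, h.offset, h.length):
            continue
        # Read/read is the only compatible combination between clients.
        if kind == WRITE or h.kind == WRITE:
            blockers.append(h)
    return blockers
```
]]></artwork></figure>
					<t>
						An empty result means that nothing held by another client
						stands in the way of the grant; the server remains free to
						refuse the request for other reasons.
					</t>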
					<t>
						To avoid lock starvation for write
						delegations, the server is encouraged to implement the same
						queueing scheme that is described for byte range locks in
						Section 8.4 of <xref target="RFC3530"/>.
					</t>
				</section>
			</section>
			<section title="Upgrading and downgrading byte ranges">
				<t>
					In the proposed model, a client may request to upgrade a read
					delegation to a write delegation at any time using the DELEG_RANGE
					operation. If successful, the upgrade must be performed atomically
					by the server so that the client that requested the upgrade can
					keep any cached data.
				</t>
				<t>
					Similarly, a client that is holding a write delegation on a byte
					range may, once it is done flushing out any dirty data,
					request that the server atomically downgrade it to a read delegation
					using the DELEG_DOWNGRADE operation. It is expected that clients
					will take advantage of this as part of a COMMIT compound to
					obviate recalls.
				</t>
			</section>
			<section title="File truncation and extension">
				<t>
					Changes to the file size MUST trigger a recall of all byte range
					delegations held by other clients in the region between the old
					and new end of file.
				</t>
				<t>
					A useful consequence of this rule is that a client wishing to
					be notified of changes to the size attribute may achieve
					this by requesting a read or write delegation that covers the
					2-byte range starting at the offset (size - 1).
				</t>
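				<t>
					To see why a 2-byte range at offset (size - 1) is sufficient,
					consider the sketch below. It is an illustration only: the
					function names are hypothetical, and the handling of an empty
					file (watching byte 0) is an assumption that this document
					does not specify.
				</t>
				<figure><artwork><![CDATA[
```python
def size_watch_range(size):
    """(offset, length) of a delegation that observes size changes."""
    if size == 0:
        return (0, 1)        # assumption: an empty file has no last byte
    return (size - 1, 2)     # the last byte plus the first byte past EOF


def changed_region(old_size, new_size):
    """(offset, length) of the region between old and new end of file."""
    low, high = sorted((old_size, new_size))
    return (low, high - low)


def triggers_recall(watch, old_size, new_size):
    """True if the size change touches the watched range."""
    c_off, c_len = changed_region(old_size, new_size)
    if c_len == 0:
        return False         # the size did not change
    w_off, w_len = watch
    return c_off < w_off + w_len and w_off < c_off + c_len
```
]]></artwork></figure>
				<t>
					An extension always touches the byte at offset size, and a
					truncation always touches the byte at offset (size - 1), so
					either change overlaps the watched range.
				</t>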
				<t>
					If a client holds a write delegation in the region of the
					end of file marker, then it is guaranteed that no other
					clients can append to the file until the client holding
					the write delegation has finished writing out its modifications
					and released the delegation in that region.
				</t>
			</section>
			<section title="Byte range locks">
				<t>
					A client holding a write delegation may cache read or write byte
					range lock requests, provided they are fully included in the
					range covered by the write delegation.
				</t>
				<t>
					A client holding a read delegation may cache read byte range
					lock requests provided they are fully included in the region
					covered by the read delegation.
				</t>
				<t>
					If a delegation is recalled or downgraded, the client is
					responsible for establishing any cached locks on the server as
					part of the process of recovery.
				</t>
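				<t>
					The client-side test for whether a lock request may be cached
					follows directly from the rules above. A minimal sketch,
					assuming finite ranges and using the hypothetical name
					can_cache_lock:
				</t>
				<figure><artwork><![CDATA[
```python
READ = "read"
WRITE = "write"


def can_cache_lock(deleg_kind, deleg_off, deleg_len,
                   lock_kind, lock_off, lock_len):
    """True if the lock may be cached under the given delegation."""
    # The lock must be fully included in the delegated range.
    contained = (deleg_off <= lock_off and
                 lock_off + lock_len <= deleg_off + deleg_len)
    if not contained:
        return False
    if deleg_kind == WRITE:
        return lock_kind in (READ, WRITE)
    # A read delegation only allows read locks to be cached.
    return lock_kind == READ
```
]]></artwork></figure>
				<t>
					A lock that fails this test has to be sent to the server in
					the usual way.
				</t>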
			</section>
		</section>
		<section title="Stateids and byte range delegations">
			<t>
				One of the goals of the delegation model is
				to allow clients to cache data without having
				to tie that delegation to a particular open
				stateid. Although the DELEG_OPEN operation uses an
				open stateid and sequence to guarantee only-once
				semantics, the resulting stateid is not considered to be
				associated with this particular open stateid.
			</t>
			<t>
				To allow it to be reused with other open stateids, therefore,
				the byte range delegation stateid does not carry any share or
				lock information. A client holding
				a write delegation on a particular byte range
				has no guarantee that the share reservations
				on that file allow write access.
			</t>
			<section title="The current delegation stateid">
				<t>
					To allow the server to check that
					a given operation does not violate
					the requested caching semantics, we add the
					notion of a "current delegation stateid".
				</t>
				<t>
					Rather than replacing the usual open
					stateid argument, the current delegation stateid
					is set in a separate operation that precedes
					the READ, WRITE, or SETATTR operation that it
					protects. It is set either implicitly
					using a DELEG_RANGE operation, or by using the dedicated
					operation DELEG_PUT_STATEID. The current delegation
					stateid is automatically cleared by any operation that
					changes the current filehandle. It
					may also be cleared by explicitly calling DELEG_PUT_STATEID
					with a special stateid argument consisting of all zeros.
				</t>
				<t>
					If set, the current delegation stateid applies to all
					subsequent READ, WRITE and SETATTR operations within the
					same COMPOUND. The server is required to check the current
					delegation stateid in addition to the READ/WRITE/SETATTR's
					stateid argument, and should return NFS4ERR_OLD_STATEID if
					either stateid has been superseded due to a state change.
					This may, for instance, occur in the case of a race with
					another DELEG_DOWNGRADE or DELEG_RELEASE request on the
					same file.
				</t>
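				<t>
					The lifetime of the current delegation stateid within one
					COMPOUND can be traced with a small state machine. The sketch
					below is illustrative only: the function name and the tuple
					encoding of operations are hypothetical, and the list of
					operations that change the current filehandle is an assumption
					drawn from <xref target="RFC3530"/> rather than from this
					document.
				</t>
				<figure><artwork><![CDATA[
```python
ZERO_STATEID = bytes(16)   # stateid4: 4-byte seqid + 12 opaque bytes

# Assumption: operations that replace the current filehandle.
FH_CHANGING_OPS = {"PUTFH", "PUTROOTFH", "PUTPUBFH", "LOOKUP",
                   "LOOKUPP", "RESTOREFH", "OPEN", "CREATE"}
# Operations that set the current delegation stateid on success.
SETTING_OPS = {"DELEG_OPEN", "DELEG_RANGE", "DELEG_PUT_STATEID"}
PROTECTED_OPS = {"READ", "WRITE", "SETATTR"}


def trace_compound(ops):
    """ops is a list of (name, delegation_stateid_or_None) pairs.

    Returns, for each protected operation, the current delegation
    stateid that the server would check alongside the operation's
    own stateid argument (None if no delegation stateid is set).
    """
    current = None
    checked = []
    for name, stateid in ops:
        if name in FH_CHANGING_OPS:
            current = None                  # cleared automatically
        elif name in SETTING_OPS:
            # The all-zeros stateid clears the value explicitly.
            current = None if stateid == ZERO_STATEID else stateid
        elif name in PROTECTED_OPS:
            checked.append((name, current))
    return checked
```
]]></artwork></figure>
				<t>
					For example, a READ that follows PUTFH and DELEG_PUT_STATEID
					is checked against the delegation stateid, whereas a READ that
					follows a later LOOKUP in the same COMPOUND is not.
				</t>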
			</section>
		</section>
		<section title="Callback model">
			<section title="Revocation">
				<t>
					Servers are permitted to recall a byte range delegation
					at any time and for any reason. Typical scenarios that
					trigger such a recall include:
				</t>
				<list style="symbols">
					<t>
						Resolving a caching conflict due to a request from 
						another client. Operations that may require a recall
						of the byte range delegation include READ, WRITE,
						LOCK, LOCKT, SETATTR, OPEN or DELEG_RANGE.
					</t>
					<t>
						Another client's read patterns trigger speculative
						read-ahead on the server.
					</t>
					<t>
						The amount of delegation state being managed by the server
						grows too large, triggering a reclaim of resources.
					</t>
				</list>
				<t>
					There are two ways for a server to recall a byte range delegation:
				</t>
				<list style="symbols">
					<t>
						As for file delegations, the server can use CB_RECALL to
						request that a client flush all writes and locks affected
						by the delegation, and return the delegation using the DELEGRETURN
						operation. If the client later wishes to re-establish
						a delegation, then it must first call DELEG_OPEN
						to obtain a new delegation stateid.
					</t>
					<t>
						The new CB_RECALL_RANGE gives the server finer-grained control over
						which region of the file is recalled.
						CB_RECALL_RANGE also allows the server to request a downgrade
						rather than a full recall of a region that holds cached
						writes. By requesting a downgrade, the server signals
						that the client may convert its write delegations into read
						delegations after it has finished flushing the cached writes
						to disk.
					</t>
				</list>
				<t>
					Clients that request byte range delegations MUST be able to handle
					both CB_RECALL and CB_RECALL_RANGE recall requests.
				</t>
			</section>
			<section title="Client recovery from a recalled byte range delegation">
				<t>
					When the server recalls a byte range or part of a byte range
					that has been delegated, the client recovery
					process is very similar to that of file delegation:
				</t>
				<list style="symbols">
					<t>
						If the client holds a read delegation on the recalled byte range,
						then it should recover any cached byte range read locks and
						mark the read cache as invalid.
					</t>
					<t>
						If a write delegation is held on all or part of the byte range
						being recalled, then the client should recover any cached
						read or write locks, flush out all pending writes,
						and mark the read cache as invalid.
					</t>
				</list>
				<t>
					The recovery process ends when the client returns the delegation
					on the recalled range using either the DELEG_RELEASE or
					DELEGRETURN operations.
				</t>
				<t>
					If the server requests a downgrade of a write delegation, then
					the client may choose to use DELEG_DOWNGRADE instead
					of returning the entire delegation. If it does so, it
					need not mark the read cache as invalid on that range.
				</t>
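				<t>
					The recovery sequence described in this section can be written
					down as an ordered list of client actions. This is a sketch
					under the assumption that the steps are performed in the order
					given; the function name and the step labels are hypothetical.
				</t>
				<figure><artwork><![CDATA[
```python
READ = "read"
WRITE = "write"


def recovery_steps(held_kind, downgrade_requested=False,
                   use_downgrade=False):
    """Ordered client actions for a recalled byte range."""
    steps = []
    if held_kind == WRITE:
        steps.append("recover cached read and write locks")
        steps.append("flush pending writes")
    else:
        steps.append("recover cached read locks")

    # A downgrade is only possible for a write delegation, and only
    # when the server asked for a downgrade rather than a recall.
    if held_kind == WRITE and downgrade_requested and use_downgrade:
        steps.append("DELEG_DOWNGRADE")   # read cache stays valid
    else:
        steps.append("invalidate read cache")
        steps.append("DELEG_RELEASE or DELEGRETURN")
    return steps
```
]]></artwork></figure>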
			</section>
			<section title="Client recovery from a recalled file delegation">
				<t>
					If the server recalls a file write delegation, then the client
					may request read or write byte range delegations
					as part of the usual process of recovering cached locks and
					flushing out writes.
				</t>
				<t>
					The server is under no obligation to honour these requests,
					but it may choose to do so in order to allow the client
					to continue to cache read data or writes that are not
					causing any immediate cache consistency conflicts.
				</t>
				<t>
					Likewise, when the server recalls a file read
					delegation, the client may issue requests for byte range
					read delegations during the recovery phase.
				</t>
			</section>
			<section title="Use of CB_GETATTR for querying the size attribute">
				<t>
					If a client holds a write delegation that extends across
					the end of file, then it may cache SETATTR or WRITE
					operations that will cause the size attribute to change.
					Rather than recall the delegation when a second client
					attempts to query the size attribute, the server MAY
					choose to send a CB_GETATTR callback to the
					client holding the delegation in order to determine the
					true file size.
				</t>
				<t>
					Note that the server MUST NOT issue a CB_GETATTR query for
					any attributes other than size.
				</t>
			</section>
		</section>
		<section title="Crash recovery">
			<t>
				As usual under NFS, the recovery of byte range delegations after a crash
				is driven by clients.
			</t>
			<section title="Client reboot scenario">
				<t>
					If the client reboots and then issues the standard calls to SETCLIENTID
					and SETCLIENTID_CONFIRM, the server is expected to clear
					the byte range delegations as part of the usual operation
					of breaking the lease state owned by the previous incarnation of
					the client.
				</t>
			</section>
			<section title="Server reboot scenario">
				<t>
					The client discovers a server reboot in the usual fashion by
					receiving an NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID error.
					If the server supports a grace period, the client may then
					attempt to recover byte range delegations as part of the
					normal process of state recovery.
				</t>
				<t>
					During the grace period, the client recovers the byte range
					delegation by issuing requests with the reclaim flag set to
					true. In the usual fashion, the server guarantees that the file
					will not change by rejecting any conflicting non-reclaim
					delegation, locking, OPEN, READ, WRITE, and SETATTR requests.
				</t>
			</section>
			<section title="Network partition">
				<t>
					If a network partition causes the client to fail to renew its leases
					within the usual lease expiration period, the server MAY choose to
					hold the byte range delegation on behalf of the client until a
					conflict forces a revocation. Once the delegation has been revoked, the server should
					return NFS4ERR_EXPIRED in response to any attempts to use the
					delegation.
				</t>
				<t>
					If the client sees that the change attribute on the file has
					not been modified, it may attempt to re-establish its byte range
					delegations by requesting a DELEG_OPEN, and then replaying the
					DELEG_RANGE requests to the server. The client should also
					revalidate its cache using the change attribute
					after recovery is complete, to make sure that the
					cache is still valid.
				</t>
				<t>
					The reader is referred to the section "Revocation Recovery for
					Write Open Delegation" in <xref target="RFC3530"/> for a
					discussion on how to deal with cached writes in regions where
					recovery of the byte range delegation has failed.
				</t>
			</section>
		</section>
		<section title="New client operations">
			<section title="DELEG_OPEN - request new byte-range delegation stateid">
				<figure><artwork>
SYNOPSIS

  (cfh), open_seqid, open_stateid, deleg_seqid -> stateid, delegation

ARGUMENT

  struct DELEG_OPEN4args {
          /* CURRENT_FH: opened file */
          seqid4             open_seqid;
          stateid4           open_stateid;
          seqid4             deleg_seqid;
  };

RESULT

  struct DELEG_OPEN4resok {
          stateid4           stateid;     /* byte range delegation */
          open_delegation4   delegation;  /* open delegation */
  };

  union DELEG_OPEN4res switch (nfsstat4 status) {
   case NFS4_OK:
           /* CURRENT_STATEID: Stateid for byte range delegation */
           DELEG_OPEN4resok   resok4;
   default:
           void;
  };
				</artwork></figure>
				<t>
					DESCRIPTION
				</t>
				<t>
					DELEG_OPEN requests a byte-range delegation 
					stateid for a given file. The open stateid and
					sequence id are used to ensure only-once semantics in
					the absence of sessions <xref target="draft-ietf-nfsv4-sess-01"/>.
					The delegation sequence identifier should be
					initialised to zero upon the first call to
					DELEG_OPEN for a given file and each time the user
					gives up the byte range delegation stateid.
				</t>
				<t>
					If the client attempts to call DELEG_OPEN using
					the special stateids consisting of all zero bits or
					all one bits, the server should deny the request
					using the error NFS4ERR_OPENMODE.
				</t>
				<t>
					The server is also required to deny this request with a
					NFS4ERR_CB_PATH_DOWN if the callback path cannot be
					established.
				</t>
				<t>
					On success, the current filehandle retains its value.
					The current delegation stateid is replaced with the
					stateid corresponding to the byte range delegation.
				</t>
				<t>
					IMPLEMENTATION
				</t>
				<t>
					The client gives up the byte range delegation stateid using
					the DELEGRETURN operation.
				</t>
				<t>
					At any given time there should be at most one byte-range
					delegation stateid in existence per (file, client) pair.
					A client is permitted to send multiple DELEG_OPEN requests;
					however, the server should then reply with the same stateid.
				</t>
				<t>
					The server may additionally choose to grant the client an
					ordinary file delegation.
				</t>
				<t>
					ERRORS
				</t>
				<list style="empty">
					<t>NFS4ERR_ACCESS</t>
					<t>NFS4ERR_ADMIN_REVOKED</t>
					<t>NFS4ERR_BADHANDLE</t>
					<t>NFS4ERR_BAD_SEQID</t>
					<t>NFS4ERR_BAD_STATEID</t>
					<t>NFS4ERR_BADXDR</t>
					<t>NFS4ERR_CB_PATH_DOWN</t>
					<t>NFS4ERR_DELAY</t>
					<t>NFS4ERR_DENIED</t>
					<t>NFS4ERR_EXPIRED</t>
					<t>NFS4ERR_FHEXPIRED</t>
					<t>NFS4ERR_ISDIR</t>
					<t>NFS4ERR_LEASE_MOVED</t>
					<t>NFS4ERR_MOVED</t>
					<t>NFS4ERR_NOFILEHANDLE</t>
					<t>NFS4ERR_NOTSUPP</t>
					<t>NFS4ERR_OLD_STATEID</t>
					<t>NFS4ERR_OPENMODE</t>
					<t>NFS4ERR_RESOURCE</t>
					<t>NFS4ERR_SERVERFAULT</t>
					<t>NFS4ERR_STALE</t>
					<t>NFS4ERR_STALE_CLIENTID</t>
					<t>NFS4ERR_STALE_STATEID</t>
				</list>
			</section>
			<section title="DELEG_RANGE - extend delegation to cover a byte range">
				<figure><artwork>
SYNOPSIS

  (cfh), locktype, reclaim, stateid, offset, length ->
  (cstateid), offset, length, recall

ARGUMENT

  struct DELEG_RANGE4args {
          /* CURRENT_FH: file */
          nfs_lock_type4     locktype;
          bool               reclaim;
          stateid4           stateid;
          offset4            offset;
          length4            length;
  };

RESULT

  enum delegreturn4 {
         NORECALL            = 0,
         DOWNGRADE           = 1,
         RECALL              = 2
  };

  struct DELEG_RANGE4resok {
          offset4            offset;
          length4            length;
          delegreturn4       recall;
  };

  union DELEG_RANGE4res switch (nfsstat4 status) {
    case NFS4_OK:
          DELEG_RANGE4resok  resok4;
    default:
          void;
  };
				</artwork></figure>
				<t>
					DESCRIPTION
				</t>
				<t>
					The DELEG_RANGE operation requests a delegation for
					the byte range specified by the offset and length
					parameters. The locktype specifies the type of caching
					semantics that are requested. A reclaim request is
					signalled by setting the reclaim parameter to TRUE.
				</t>
				<t>
					If the locktype is set to READ_LT or WRITE_LT, and
					another client holds a conflicting delegation, the
					server should return NFS4ERR_DENIED. If, however, the
					locktype is either READW_LT or WRITEW_LT, the
					server should initiate a recall of all conflicting
					delegations prior to returning NFS4ERR_DENIED.
				</t>
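				<t>
					The difference between the two families of lock types can be
					sketched as follows. The function name is hypothetical, and
					the list of conflicting delegations is assumed to have been
					computed by the server beforehand.
				</t>
				<figure><artwork><![CDATA[
```python
NFS4_OK = "NFS4_OK"
NFS4ERR_DENIED = "NFS4ERR_DENIED"

NON_WAITING_TYPES = {"READ_LT", "WRITE_LT"}
WAITING_TYPES = {"READW_LT", "WRITEW_LT"}


def deleg_range_outcome(locktype, conflicting):
    """Return (status, delegations_to_recall) for a DELEG_RANGE call.

    `conflicting` lists the delegations held by other clients that
    conflict with the requested range and lock type.
    """
    if locktype not in NON_WAITING_TYPES | WAITING_TYPES:
        raise ValueError("unknown lock type: %r" % (locktype,))
    if not conflicting:
        return NFS4_OK, []
    if locktype in WAITING_TYPES:
        # Recalls are initiated before the denial is returned.
        return NFS4ERR_DENIED, list(conflicting)
    # Non-waiting variants fail without disturbing other clients.
    return NFS4ERR_DENIED, []
```
]]></artwork></figure>
				<t>
					A client that receives NFS4ERR_DENIED for a READW_LT or
					WRITEW_LT request can therefore expect a later retry to stand
					a better chance of success.
				</t>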
				<t>
					If a client requests a locktype of WRITE_LT or
					WRITEW_LT on a region for which it already holds
					a read delegation, then the server should attempt
					to atomically upgrade the existing delegation.
					A server that does not support atomic upgrades or
					downgrades of the byte range delegation should
					return NFS4ERR_LOCK_NOTSUPP.
				</t>
				<t>
					On success, the server returns the range covered
					by the delegation. Note that the server may choose
					to extend the range requested by the client in order
					to decrease the administrative burden by merging
					noncontiguous delegation ranges. It MUST NOT,
					however, return a range that is smaller than that
					requested by the client.
				</t>
				<t>
					The "recall" flag is an optimisation that
					can be used by the server to notify the client that
					a conflicting request is already queued. If this
					flag is set to DOWNGRADE, then the client
					should downgrade the write delegation to a read
					delegation. If it is set to RECALL, then the client
					should release the delegation.
				</t>
				<t>
					On success the current filehandle retains its value,
					and the current delegation stateid is set to the new
					value.
				</t>
				<t>
					IMPLEMENTATION
				</t>
				<t>
					DELEG_RANGE may be called on a given stateid as many
					times as desired.  The server may represent the resulting
					set of covered bytes internally as a list of noncontiguous
					byte ranges.  Or, if it prefers, it may choose a
					simpler representation, for example a single range
					covering all of the bytes ever requested. A server
					is free to reject DELEG_RANGE requests and to recall
					delegations for any reason, so at worst, this might cause
					the server to deny requests (or recall
					delegations) more often than is strictly necessary.
				</t>
				<t>
					The READW_LT and WRITEW_LT lock types cause the server
					to recall any conflicting delegations from other
					clients. A client will want to use these variants in
					situations where strong cache consistency guarantees
					are needed.
				</t>
				<t>
					A length field with all bits one extends the
					delegation through the end of file, regardless of how
					long the file actually is.
				</t>
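				<t>
					A client may wish to verify that the range returned by the
					server is no smaller than the one it requested, taking the
					all-ones length into account. The following sketch shows one
					way to do so; the names TO_EOF, range_end and covers are
					hypothetical.
				</t>
				<figure><artwork><![CDATA[
```python
TO_EOF = 2**64 - 1   # a length4 value with all bits set to one


def range_end(offset, length):
    """Exclusive end offset, or None if the range runs to end of file."""
    return None if length == TO_EOF else offset + length


def covers(granted, requested):
    """True if the granted (offset, length) includes the requested one."""
    g_off, g_len = granted
    r_off, r_len = requested
    if g_off > r_off:
        return False
    g_end = range_end(g_off, g_len)
    if g_end is None:
        return True               # granted through end of file
    r_end = range_end(r_off, r_len)
    # A bounded grant can never cover a request that runs to EOF.
    return r_end is not None and r_end <= g_end
```
]]></artwork></figure>
				<t>
					A reply for which this check fails indicates a server that
					does not follow the rule that the returned range must include
					the requested range.
				</t>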
				<t>
					If mandatory file locking is on for the file, and
					if a lockowner on a client other than the one from
					which this DELEG_RANGE request originated holds
					a conflicting lock, then the server should return
					NFS4ERR_LOCKED.
				</t>
				<t>
					ERRORS
				</t>
				<list style="empty">
					<t>NFS4ERR_ACCESS</t>
					<t>NFS4ERR_ADMIN_REVOKED</t>
					<t>NFS4ERR_BADHANDLE</t>
					<t>NFS4ERR_BAD_RANGE</t>
					<t>NFS4ERR_BAD_STATEID</t>
					<t>NFS4ERR_BADXDR</t>
					<t>NFS4ERR_DELAY</t>
					<t>NFS4ERR_DENIED</t> 
					<t>NFS4ERR_EXPIRED</t>
					<t>NFS4ERR_FHEXPIRED</t>
					<t>NFS4ERR_GRACE</t>
					<t>NFS4ERR_INVAL</t>
					<t>NFS4ERR_ISDIR</t>
					<t>NFS4ERR_LEASE_MOVED</t>
					<t>NFS4ERR_LOCKED</t>
					<t>NFS4ERR_LOCK_NOTSUPP</t>
					<t>NFS4ERR_MOVED</t>
					<t>NFS4ERR_NOFILEHANDLE</t>
					<t>NFS4ERR_NO_GRACE</t>
					<t>NFS4ERR_NOTSUPP</t>
					<t>NFS4ERR_OLD_STATEID</t>
					<t>NFS4ERR_RECLAIM_BAD</t>
					<t>NFS4ERR_RECLAIM_CONFLICT</t>
					<t>NFS4ERR_RESOURCE</t>
					<t>NFS4ERR_SERVERFAULT</t>
					<t>NFS4ERR_STALE</t>
					<t>NFS4ERR_STALE_STATEID</t>
				</list>
			</section>
			<section title="DELEG_DOWNGRADE - downgrades a write delegation on a byte range">
				<figure><artwork>
SYNOPSIS

  (cfh), stateid, deleg_seqid, offset, length -> stateid, recall

ARGUMENT

  struct DELEG_DOWNGRADE4args {
          /* CURRENT_FH: file */
          stateid4           stateid;
          seqid4             deleg_seqid;
          offset4            offset;
          length4            length;
  };

RESULT

  struct DELEG_DOWNGRADE4resok {
          stateid4           stateid;
          bool               recall;
  };

  union DELEG_DOWNGRADE4res switch (nfsstat4 status) {
    case NFS4_OK:
            DELEG_DOWNGRADE4resok resok;
    default:
            void;
  };
				</artwork></figure>
				<t>
					DESCRIPTION
				</t>
				<t>
					DELEG_DOWNGRADE is used by the client
					to downgrade all write delegations held over
					a given byte range and convert them into read
					delegations.
				</t>
				<t>
					The server may piggyback a request to have the
					client release the delegation onto the reply
					by setting the "recall" flag to true.
				</t>
				<t>
					On success the current filehandle retains its value,
					and the current delegation stateid is set to the new
					value.
				</t>
				<t>
					If the client holds no write delegations in
					the range (offset, length), then the server should treat
					this operation as a no-op and simply return NFS4_OK.
				</t>
				<t>
					If the server is unable to atomically convert the
					existing write delegations into read delegations, then
					the request should fail with the error
					NFS4ERR_LOCK_NOTSUPP.
				</t>
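				<t>
					The following sketch illustrates the downgrade
					semantics under the assumption that a server tracks
					delegations as non-overlapping (start, end, kind)
					segments with an exclusive end, and that it is free to
					split a segment at the boundaries of the requested
					range. The data layout and the function name are
					hypothetical, and the atomicity requirement (and hence
					NFS4ERR_LOCK_NOTSUPP) is not modelled.
				</t>
				<figure><artwork><![CDATA[
```python
import math

EOF = math.inf  # stands in for a length with all bits set to one


def downgrade(delegations, start, end):
    """Return a new list in which every write segment inside [start, end)
    has become a read segment. The input list is left unmodified."""
    result = []
    for d_start, d_end, kind in delegations:
        lo, hi = max(d_start, start), min(d_end, end)
        if kind != "write" or lo >= hi:
            result.append((d_start, d_end, kind))  # not affected
            continue
        if d_start < lo:
            result.append((d_start, lo, "write"))  # part before the range
        result.append((lo, hi, "read"))            # downgraded part
        if hi < d_end:
            result.append((hi, d_end, "write"))    # part after the range
    return result
```
]]></artwork></figure>
				<t>
					When no write delegation intersects the range, the
					function returns an equal list, which corresponds to
					the no-op case described above.
				</t>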
				<t>
					ERRORS
				</t>
				<list style="empty">
					<t>NFS4ERR_ADMIN_REVOKED</t>
					<t>NFS4ERR_BADHANDLE</t>
					<t>NFS4ERR_BAD_RANGE</t>
					<t>NFS4ERR_BAD_STATEID</t>
					<t>NFS4ERR_BADXDR</t>
					<t>NFS4ERR_DELAY</t>
					<t>NFS4ERR_EXPIRED</t>
					<t>NFS4ERR_FHEXPIRED</t>
					<t>NFS4ERR_GRACE</t>
					<t>NFS4ERR_INVAL</t>
					<t>NFS4ERR_ISDIR</t>
					<t>NFS4ERR_LEASE_MOVED</t>
					<t>NFS4ERR_LOCK_NOTSUPP</t>
					<t>NFS4ERR_MOVED</t>
					<t>NFS4ERR_NOFILEHANDLE</t>
					<t>NFS4ERR_NOTSUPP</t>
					<t>NFS4ERR_OLD_STATEID</t>
					<t>NFS4ERR_RESOURCE</t>
					<t>NFS4ERR_SERVERFAULT</t>
					<t>NFS4ERR_STALE</t>
					<t>NFS4ERR_STALE_STATEID</t>
				</list>
			</section>
			<section title="DELEG_RELEASE - release a delegation on a byte range">
				<figure><artwork>
SYNOPSIS

  (cfh), stateid, deleg_seqid, offset, length -> stateid

ARGUMENT

  struct DELEG_RELEASE4args {
          /* CURRENT_FH: file */
          stateid4           stateid;
          seqid4             deleg_seqid;
          offset4            offset;
          length4            length;
  };

RESULT

  struct DELEG_RELEASE4resok {
          stateid4           stateid;
  };

  union DELEG_RELEASE4res switch (nfsstat4 status) {
    case NFS4_OK:
            DELEG_RELEASE4resok resok;
    default:
            void;
  };
				</artwork></figure>
				<t>
					DESCRIPTION
				</t>
				<t>
					The DELEG_RELEASE operation notifies the server that
					the client is no longer caching any data in the specified
					range, and returns any byte range delegations that may
					be held in that range.
				</t>
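				<t>
					As a rough sketch, and assuming the same hypothetical
					representation of delegations as non-overlapping
					(start, end, kind) segments with an exclusive end,
					releasing a range amounts to subtracting it from every
					segment it intersects. The function name release is
					illustrative only.
				</t>
				<figure><artwork><![CDATA[
```python
import math

EOF = math.inf  # stands in for a length with all bits set to one


def release(delegations, start, end):
    """Return the delegations that remain after [start, end) is returned."""
    result = []
    for d_start, d_end, kind in delegations:
        if d_end <= start or end <= d_start:
            result.append((d_start, d_end, kind))  # no intersection
            continue
        if d_start < start:
            result.append((d_start, start, kind))  # keep the leading part
        if end < d_end:
            result.append((end, d_end, kind))      # keep the trailing part
    return result
```
]]></artwork></figure>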
				<t>
					ERRORS
				</t>
				<list style="empty">
					<t>NFS4ERR_ADMIN_REVOKED</t>
					<t>NFS4ERR_BADHANDLE</t>
					<t>NFS4ERR_BAD_RANGE</t>
					<t>NFS4ERR_BAD_STATEID</t>
					<t>NFS4ERR_BADXDR</t>
					<t>NFS4ERR_DELAY</t>
					<t>NFS4ERR_EXPIRED</t>
					<t>NFS4ERR_FHEXPIRED</t>
					<t>NFS4ERR_INVAL</t>
					<t>NFS4ERR_ISDIR</t>
					<t>NFS4ERR_LEASE_MOVED</t>
					<t>NFS4ERR_MOVED</t>
					<t>NFS4ERR_NOFILEHANDLE</t>
					<t>NFS4ERR_NOTSUPP</t>
					<t>NFS4ERR_OLD_STATEID</t>
					<t>NFS4ERR_RESOURCE</t>
					<t>NFS4ERR_SERVERFAULT</t>
					<t>NFS4ERR_STALE</t>
					<t>NFS4ERR_STALE_STATEID</t>
				</list>
			</section>
			<section title="DELEG_PUT_STATEID - set the current delegation stateid">
				<figure><artwork>
SYNOPSIS

  (cfh), stateid -> (cstateid)

ARGUMENT

  struct DELEG_PUT_STATEID4args {
          /* CURRENT_FH: file */
          stateid4           stateid;
  };

RESULT

  struct DELEG_PUT_STATEID4res {
          nfsstat4           status;
  };
				</artwork></figure>
				<t>
					DESCRIPTION
				</t>
				<t>
					The DELEG_PUT_STATEID operation is used by the
					client to set the current delegation stateid.
				</t>
				<t>
					If the client specifies the special stateid
					consisting of all zeros, then the server is
					expected to clear the current delegation
					stateid.
				</t>
				<t>
					IMPLEMENTATION
				</t>
				<t>
					This operation is used to apply a
					byte range delegation to any subsequent
					READ or WRITE requests within the same
					COMPOUND.
				</t>
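				<t>
					A minimal sketch of the per-COMPOUND bookkeeping
					follows. It assumes a stateid is carried as the 16
					opaque bytes of the RFC 3530 stateid4 encoding; the
					CompoundState class and the function name are
					hypothetical, and validation of the stateid against
					the current filehandle is omitted.
				</t>
				<figure><artwork><![CDATA[
```python
ZERO_STATEID = bytes(16)  # the special stateid consisting of all zeros


class CompoundState:
    """Per-COMPOUND state kept by the server while it evaluates operations."""

    def __init__(self):
        self.current_deleg_stateid = None


def deleg_put_stateid(state, stateid):
    """Set, or clear, the current delegation stateid for this COMPOUND."""
    if stateid == ZERO_STATEID:
        state.current_deleg_stateid = None  # all zeros clears the value
    else:
        state.current_deleg_stateid = stateid
    return "NFS4_OK"
```
]]></artwork></figure>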
				<t>
					ERRORS
				</t>
				<list style="empty">
					<t>NFS4ERR_ADMIN_REVOKED</t>
					<t>NFS4ERR_BADHANDLE</t>
					<t>NFS4ERR_BAD_STATEID</t>
					<t>NFS4ERR_BADXDR</t>
					<t>NFS4ERR_DELAY</t>
					<t>NFS4ERR_EXPIRED</t>
					<t>NFS4ERR_FHEXPIRED</t>
					<t>NFS4ERR_ISDIR</t>
					<t>NFS4ERR_LEASE_MOVED</t>
					<t>NFS4ERR_MOVED</t>
					<t>NFS4ERR_NOFILEHANDLE</t>
					<t>NFS4ERR_OLD_STATEID</t>
					<t>NFS4ERR_RESOURCE</t>
					<t>NFS4ERR_SERVERFAULT</t>
					<t>NFS4ERR_STALE_STATEID</t>
				</list>
			</section>
		</section>
		<section title="New callback operations">
			<section title="CB_RECALL_RANGE - recall a byte range delegation">
				<figure><artwork>
SYNOPSIS

  stateid, offset, length, downgrade, truncate, fh -> ()

ARGUMENT

  struct CB_RECALL_RANGE4args {
          stateid4           stateid;
          offset4            offset;
          length4            length;
          bool               downgrade;
          bool               truncate;
          nfs_fh4            fh;
  };

RESULT

  struct CB_RECALL_RANGE4res {
          nfsstat4           status;
  };
				</artwork></figure>
				<t>
					DESCRIPTION
				</t>
				<t>
					The CB_RECALL_RANGE operation is used to compel
					a client to relinquish a delegated byte range and
					return it to the server.
				</t>
				<t>
					IMPLEMENTATION
				</t>
				<t>
					The downgrade flag is used by the server to inform the
					client about the nature of the caching conflict that
					triggered the callback. If set, it indicates that
					it would suffice to resolve the conflict if the client
					were to downgrade all write delegations in the range
					to read delegations.
				</t>
				<t>
					If the downgrade flag is not set, the client MUST
					prepare to release all delegations in the specified
					range.
				</t>
				<t>
					The truncate flag is used to inform the client that
					the byte range being recalled is about to be
					truncated as a result of an incoming SETATTR or OPEN.
					The client may use this information to discard any
					queued writes that may otherwise have had to be
					transferred to disk.
				</t>
				<t>
					If a race causes the client to believe that it
					is not holding any delegations in the range specified
					by the server and there are no outstanding requests
					for this range, then it may signal this to the server
					using the error NFS4ERR_BAD_RANGE.
					This may, for instance, be the case
					if the server's CB_RECALL_RANGE call raced with a
					DELEG_RELEASE from the client.
				</t>
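				<t>
					The client-side decision described above could be
					sketched as follows. This is an illustration under
					stated assumptions: delegations are (start, end, kind)
					tuples with an exclusive end, the step names
					"flush_dirty_data" and "discard_queued_writes" are
					hypothetical labels for client-internal work, and the
					client is assumed to downgrade whenever the server
					indicates that a downgrade suffices.
				</t>
				<figure><artwork><![CDATA[
```python
def handle_cb_recall_range(held, start, end, downgrade, truncate,
                           requests_outstanding=False):
    """Return (status, plan) for a CB_RECALL_RANGE covering [start, end).

    plan lists the steps the client intends to take, in order.
    """
    overlapping = [d for d in held if d[0] < end and start < d[1]]
    if not overlapping and not requests_outstanding:
        # For example, the recall raced with our own DELEG_RELEASE.
        return "NFS4ERR_BAD_RANGE", []
    plan = []
    if truncate:
        # The range is about to be truncated, so queued writes may be dropped.
        plan.append("discard_queued_writes")
    else:
        plan.append("flush_dirty_data")
    # Downgrading is enough only when the server says so.
    plan.append("DELEG_DOWNGRADE" if downgrade else "DELEG_RELEASE")
    return "NFS4_OK", plan
```
]]></artwork></figure>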
				<t>
					ERRORS
				</t>
				<list style="empty">
					<t>NFS4ERR_BADHANDLE</t>
					<t>NFS4ERR_BAD_STATEID</t>
					<t>NFS4ERR_BADXDR</t>
					<t>NFS4ERR_BAD_RANGE</t>
					<t>NFS4ERR_RESOURCE</t>
					<t>NFS4ERR_SERVERFAULT</t>
				</list>
			</section>
		</section>
	</middle>

	<back>
		<references>
			<reference anchor="RFC3530">
				<front>
					<title>
						Network File System (NFS) version 4 Protocol
					</title>
					<author initials="S." surname="Shepler" fullname="Spencer Shepler">
						<organization>
							Sun Microsystems, Inc.
						</organization>
					</author>
				</front>
				<seriesInfo name="RFC" value="3530" />
			</reference>
			<reference anchor="RFC2119">
				<front>
					<title>
						Key words for use in RFCs to Indicate Requirement Levels
					</title>
					<author initials="S." surname="Bradner" fullname="S. Bradner">
						<organization abbrev="Harvard University">
							Harvard University
						</organization>
					</author>
				</front>
				<seriesInfo name="RFC" value="2119" />
			</reference>
			<reference anchor="draft-ietf-nfsv4-sess-01">
				<front>
					<title>
						NFSv4 Session Extensions
					</title>
					<author initials="T." surname="Talpey" fullname="Tom Talpey">
						<organization>
							Network Appliance, Inc.
						</organization>
					</author>
					<author initials="J." surname="Bauman" fullname="Jon Bauman">
						<organization>
							University of Michigan
						</organization>
					</author>
				</front>
			</reference>
		</references>
	</back>
</rfc>
