Monday, August 27, 2012

How NFS4 became stateful

Since version 4, the Network File System has been a stateful protocol. To avoid regressions relative to the earlier, stateless versions, a number of methods for recovering from client or server failure had to be adopted.

During client operation the server may store up to three different types of state. Creating any of them involves the client sending an opaque value called an owner, which should be generated according to rules that depend on the state type. Sequence numbers are usually used as well to ensure at-most-once semantics, as described in more detail in my previous post. If state creation succeeds, the server returns a stateid value (or a clientid in the case of client state), which acts as a shorthand reference for the original, usually long, owner value.
  • client state - the "root" state; obtaining it is necessary in order to create or reclaim any other state on the server. It is used to identify a client instance. The owner value has to remain unchanged after a client reboot. How client state is used to recover after a client reboot is described more thoroughly in the following section of this post.
  • open state - obtained when a client opens a file. The same open owner may be used simultaneously for many open files. The NFS4 specification compares an open owner to a file descriptor that may be shared among multiple processes.
  • lock state - obtained when a client creates a lock. The same lock owner may be able to upgrade its own locks, if the server supports such an operation. To achieve POSIX-like behavior, the lock owner should be generated from the process identifier.
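
The relationship between owners and the short identifiers returned by the server can be sketched as a toy lookup table. This is only an illustration, not how any real NFS server is implemented: the class ToyStateTable, its method names and the byte strings used as owners and file handles are all made up for this example, and sequence numbers are left out.

```python
import itertools


class ToyStateTable:
    """Toy server-side table: long opaque owners in, short ids out (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._clients = {}  # client owner -> clientid
        self._states = {}   # (clientid, kind, owner, filehandle) -> stateid

    def set_client(self, owner: bytes) -> int:
        # Client state is the "root": the same owner always maps to the same clientid.
        if owner not in self._clients:
            self._clients[owner] = next(self._ids)
        return self._clients[owner]

    def create_state(self, clientid: int, kind: str, owner: bytes, filehandle: bytes) -> int:
        # Open and lock state can only be created under an existing client state.
        if clientid not in self._clients.values():
            raise KeyError("unknown clientid")
        if kind not in ("open", "lock"):
            raise ValueError("kind must be 'open' or 'lock'")
        key = (clientid, kind, owner, filehandle)
        if key not in self._states:
            self._states[key] = next(self._ids)
        return self._states[key]


table = ToyStateTable()
cid = table.set_client(b"client.example.org/instance-1")
# One open owner used for two files yields two distinct stateids.
sid_a = table.create_state(cid, "open", b"open-owner-1", b"fh-a")
sid_b = table.create_state(cid, "open", b"open-owner-1", b"fh-b")
```

After the first exchange the client only needs to send the short id, while the server can still find the full owner it belongs to.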

Client crash recovery

Recovery from a client crash is usually quite straightforward. Clients are obliged to periodically renew all leases (i.e. state) they hold on the server. This can be done either by issuing any request that contains a stateid value or by sending a special RENEW request, which renews all leases held by the client.

The server decides how often leases must be renewed. The choice is a trade-off between more network traffic and slower recovery from client or server failure. A lease time of 90 seconds appears to be a common default.
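
The lease bookkeeping on the server side can be pictured roughly as follows. This is a minimal sketch under my own assumptions: LeaseTable and its methods are hypothetical names, time is passed in explicitly as a number of seconds, and a single timestamp per client stands in for all the state that client holds.

```python
class LeaseTable:
    """Toy lease tracker: one renewal timestamp per clientid (hypothetical)."""

    def __init__(self, lease_time: float = 90.0):
        self.lease_time = lease_time
        self._last_renewal = {}  # clientid -> time of the last renewal, in seconds

    def renew(self, clientid: int, now: float) -> None:
        # Called for RENEW and for any request carrying one of the client's stateids.
        self._last_renewal[clientid] = now

    def is_expired(self, clientid: int, now: float) -> bool:
        last = self._last_renewal.get(clientid)
        return last is None or now - last > self.lease_time


leases = LeaseTable(lease_time=90.0)
leases.renew(clientid=1, now=0.0)
still_valid = not leases.is_expired(1, now=60.0)  # renewed 60 s ago
leases.renew(1, now=60.0)
expired_later = leases.is_expired(1, now=151.0)   # 91 s since the last renewal
```

A shorter lease_time means the server notices a dead client sooner, at the cost of more renewal traffic.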

When a client reboots and reconnects, it sends a SETCLIENTID request with the same owner value as its previous instance and a new verifier. Such a request tells the server that the client has rebooted and that all leases it held can be released immediately.
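
The reboot check boils down to comparing the verifier stored for an owner with the one in the new request. The sketch below is a simplification: ClientRegistry and its methods are hypothetical, state is represented by plain strings, and the confirmation step that follows SETCLIENTID in the real protocol is skipped.

```python
class ClientRegistry:
    """Toy model of how a server could detect a client reboot (hypothetical)."""

    def __init__(self):
        self._verifier = {}  # owner -> verifier of the current client instance
        self._state = {}     # owner -> set of state names held by that instance

    def setclientid(self, owner: bytes, verifier: bytes) -> bool:
        """Register a client instance; return True if a reboot was detected."""
        rebooted = owner in self._verifier and self._verifier[owner] != verifier
        if rebooted:
            # Same owner, new verifier: the old instance is gone, drop its state.
            self._state[owner] = set()
        self._verifier[owner] = verifier
        self._state.setdefault(owner, set())
        return rebooted

    def add_state(self, owner: bytes, name: str) -> None:
        self._state[owner].add(name)

    def held_state(self, owner: bytes) -> set:
        return set(self._state.get(owner, ()))


registry = ClientRegistry()
first = registry.setclientid(b"client-a", b"boot-1")
registry.add_state(b"client-a", "open:/export/file")
second = registry.setclientid(b"client-a", b"boot-2")  # same owner, new verifier
```

This is also why the owner has to survive a reboot while the verifier must not: only that combination lets the server tell a restarted client from a new one.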

Server crash recovery

When the server reboots, the client learns about it no later than one lease time afterwards. Either a request containing a stateid value or a RENEW operation will return an error code (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID, respectively) indicating that the server has rebooted and that clients need to reclaim their leases.

In order to prevent conflicts between clients reclaiming old leases and clients trying to acquire new ones, the server enters a so-called grace period after reboot, during which no new leases can be acquired. The grace period is no shorter than the lease validity time, so every client will attempt to renew its leases at least once during this period and will eventually be notified about the reboot. Moreover, old leases may be reclaimed only during the grace period, which avoids possible race conditions.
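
The grace period rule can be written down as a small decision function. This is a sketch, not server code: check_state_request and its parameters are hypothetical, and the status values are plain strings named after the NFS4ERR_GRACE and NFS4ERR_NO_GRACE errors from RFC 3530 rather than the numeric codes used on the wire.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_GRACE = "NFS4ERR_GRACE"        # new state requested during the grace period
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"  # reclaim attempted outside the grace period


def check_state_request(now: float, boot_time: float,
                        grace_period: float, is_reclaim: bool) -> str:
    """Decide whether a state-creating request is allowed at time `now`."""
    in_grace = now - boot_time < grace_period
    if in_grace:
        # Only clients recovering state from before the reboot may proceed.
        return NFS4_OK if is_reclaim else NFS4ERR_GRACE
    # After the grace period only new state may be created.
    return NFS4ERR_NO_GRACE if is_reclaim else NFS4_OK


# Server booted at t=0 with a 90 second grace period.
early_reclaim = check_state_request(now=30.0, boot_time=0.0, grace_period=90.0, is_reclaim=True)
early_open = check_state_request(now=30.0, boot_time=0.0, grace_period=90.0, is_reclaim=False)
late_reclaim = check_state_request(now=120.0, boot_time=0.0, grace_period=90.0, is_reclaim=True)
```

Because reclaims and new requests are never accepted at the same time, a new lock cannot slip in before its previous holder has had a chance to reclaim it.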

Crash recovery and open delegations

Lease reclamation also happens when a client reboots while holding an open delegation. In that case issuing a SETCLIENTID does not release the state tied to open delegations, since there may be cached writes that need to be synchronized with the server. The client then reclaims its previously held delegation, just as it does with any other lease after a server reboot.

Open delegations also introduce another problem. Since they rely on RPC callbacks, it is possible for the callback path to break. In that case the server waits for a RENEW operation and responds with the error NFS4ERR_CB_PATH_DOWN. For the client this error code means that, although all leases were successfully renewed, the callback path is broken and all delegations have to be returned as soon as possible.
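
On the client side, the reaction to this error could look roughly like the sketch below. The function handle_renew_result and the strings used as delegation handles are assumptions made for illustration; a real client would also flush cached writes before returning a delegation.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_CB_PATH_DOWN = "NFS4ERR_CB_PATH_DOWN"


def handle_renew_result(status: str, delegations: list) -> list:
    """Return the delegations the client should hand back after a RENEW reply."""
    if status == NFS4ERR_CB_PATH_DOWN:
        # Leases were renewed, but the server cannot reach the client to recall
        # delegations, so every one of them has to be returned.
        return list(delegations)
    return []


held = ["deleg:/export/a", "deleg:/export/b"]
to_return_ok = handle_renew_result(NFS4_OK, held)
to_return_down = handle_renew_result(NFS4ERR_CB_PATH_DOWN, held)
```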

Useful links

  • RFC 3530 - Network File System (NFS) version 4 Protocol