PNFS File-based Stateid Distribution Design
From Linux NFS
Updated Stateid Distribution Information
This page discusses a new export op on the MDS, change_state, that will revoke stateids cached on the DS. For reference, some background info is available at the bottom.
Issue 1: Do we need it?
We need some mechanism to ensure correctness of the share reservations and stateids on the DS. E.g., if a share reservation is upgraded, then the access/deny bits on the DS need to be updated to ensure that I/O is rejected properly.
Issue 2: What should it do?
We need to ensure that both stateids and share reservations are updated properly. Ideally, it would update the stateid on each DS with the latest info, but synchronous state updates from the MDS to the DSs on every change would be prohibitively slow. So I suggest that it simply deletes state on the DSs, allowing each DS to retrieve the updated state info at its leisure. (In the future we could asynchronously update/add stateids.)
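The delete-then-refetch idea can be sketched as a small model. Everything here (the `Mds` and `Ds` classes, `set_access`, `allows`, the string stateids) is hypothetical and only illustrates the flow; it is not the nfsd code.

```python
# Hypothetical model: names and structure are illustrative, not the nfsd code.
ACCESS_READ = 1
ACCESS_WRITE = 2


class Mds:
    """Metadata server: holds the authoritative access bits per stateid."""

    def __init__(self):
        self.state = {}    # stateid -> access bits
        self.servers = []  # data servers that may cache state

    def set_access(self, stateid, access):
        # Change the share reservation, then delete (not update) cached copies.
        self.state[stateid] = access
        for ds in self.servers:
            ds.revoke(stateid)


class Ds:
    """Data server: caches state and refetches lazily after a revoke."""

    def __init__(self, mds):
        self.mds = mds
        self.cache = {}
        mds.servers.append(self)

    def revoke(self, stateid):
        self.cache.pop(stateid, None)

    def allows(self, stateid, wanted):
        if stateid not in self.cache:
            # Lazy retrieval: ask the MDS only when I/O actually arrives.
            self.cache[stateid] = self.mds.state.get(stateid, 0)
        return (self.cache[stateid] & wanted) == wanted


mds = Mds()
ds = Ds(mds)
mds.set_access("open-1", ACCESS_READ | ACCESS_WRITE)
write_before = ds.allows("open-1", ACCESS_WRITE)  # fetched from the MDS
mds.set_access("open-1", ACCESS_READ)             # downgrade revokes the cache
write_after = ds.allows("open-1", ACCESS_WRITE)   # refetched, now rejected
```

Without the `revoke` call in `set_access`, the DS would keep honoring the stale write access after the downgrade.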
Issue 3: When do we call it?
Updating the DSs every time the stateid changes would be prohibitively slow. In addition, it is often unnecessary, since a stateid could change 3 times before a single I/O is sent to the DS (e.g., open->deleg->lock). The spec says that the DS must allow/reject I/O which the MDS would allow/reject. This means that on OPEN/LOCK/LOCKU, we can continue to let the DS retrieve state laissez-faire. But on CLOSE, OPEN_DOWNGRADE and a 2nd OPEN that upgrades the share reservation, the MDS must call update_stateid to synchronously revoke the state.
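As a rough sketch of the rule above, the decision could be written as a single predicate. The operation names and the `needs_sync_revoke` helper are made up for illustration, and share reservations are modeled as plain (access, deny) bit pairs.

```python
# Hypothetical operation names; the real server dispatches on NFSv4 op codes.
LAZY_OPS = {"open", "lock", "unlock"}
REVOKE_OPS = {"close", "open_downgrade"}


def needs_sync_revoke(op, old_bits=None, new_bits=None):
    """Return True when the MDS must revoke DS state before replying.

    old_bits and new_bits are (access, deny) pairs for an existing open;
    old_bits is None for a first open.
    """
    if op in REVOKE_OPS:
        return True
    if op == "open" and old_bits is not None and new_bits is not None:
        old_access, old_deny = old_bits
        new_access, new_deny = new_bits
        # A second open that adds access or deny bits is an upgrade.
        adds_access = (old_access | new_access) != old_access
        adds_deny = (old_deny | new_deny) != old_deny
        return adds_access or adds_deny
    if op in LAZY_OPS:
        return False
    raise ValueError(f"unknown operation: {op}")
```

For example, `needs_sync_revoke("open", (1, 0), (3, 0))` reports an upgrade from read to read/write, while a plain `"lock"` leaves the DS to fetch state on its own.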
Issue 4: Arguments
Some suggestions are:
- stateid (boot, gen, file, owner)
- Share reservation (access/deny bits)
- Revoke/Update flag
- inode (may be necessary since deleg id doesn't have a fileid)
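One possible shape for these arguments, purely as an illustration: the type and field names below (`ChangeStateArgs`, `Action`, `Stateid`) are assumptions and do not come from the existing export operations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    REVOKE = 0
    UPDATE = 1


@dataclass(frozen=True)
class Stateid:
    boot: int
    gen: int
    file: int
    owner: int


@dataclass(frozen=True)
class ChangeStateArgs:
    stateid: Stateid
    access: int                  # share reservation access bits
    deny: int                    # share reservation deny bits
    action: Action               # revoke or update
    inode: Optional[int] = None  # for stateids that carry no fileid


args = ChangeStateArgs(
    stateid=Stateid(boot=1, gen=3, file=42, owner=7),
    access=1,
    deny=0,
    action=Action.REVOKE,
    inode=42,
)
```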
Issue 5: Cleaning up old stateids on DS
Since several stateids may be cached for a single file, e.g., open->lock->open, *old* stateids currently stick around until nfsd exits. In order to be more proactive and memory conscious, we can use the laundromat thread along with an lru list and limit the number of stateids.
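A minimal sketch of the LRU idea, assuming a fixed cap on cached stateids and a periodic cleanup pass. The `StateidLru` class and its methods are hypothetical; the real laundromat thread works on kernel lists rather than a Python dictionary.

```python
from collections import OrderedDict


class StateidLru:
    """Tracks cached stateids in least-recently-used order."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def use(self, stateid):
        # Record the stateid and mark it as the most recently used.
        self.entries[stateid] = True
        self.entries.move_to_end(stateid)

    def laundromat(self):
        # Periodic pass: drop the oldest entries until we are under the cap.
        evicted = []
        while len(self.entries) > self.max_entries:
            stateid, _ = self.entries.popitem(last=False)
            evicted.append(stateid)
        return evicted


lru = StateidLru(max_entries=2)
for sid in ("open-1", "lock-1", "open-2"):  # open->lock->open on one file
    lru.use(sid)
lru.use("lock-1")           # I/O arrives with the lock stateid
evicted = lru.laundromat()  # the oldest unused stateid is dropped
```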
So in summary, we would add code so that the MDS would revoke stateids on DSs at close/open_{down/up} to ensure share reservations are properly checked. In addition, we would have the laundromat thread clean up old stateids on the DSs.
Background Information
Current Code
a) Laissez-faire retrieval of stateids on DS from the MDS
- Export functions: DS: get_state, MDS: cb_get_state
- If a stateid changes, e.g., open->deleg, the new stateid is retrieved and the old one is left alone. Old stateids are not cleaned up by cb_change_state and are only cleaned up when nfsd exits.
b) Ability for the file system to update/revoke the state on the DS (right now, we only revoke)
- Export function: DS: cb_change_state
Issues
a) New export op on the MDS, change_state, that the MDS can use to revoke/update the stateid on the DS at certain points. We definitely don't want to do this every time the stateid changes, since this would be far too often and unnecessary: stateids could change 3 times before a single I/O is sent to the DS (e.g., open->deleg->lock).
b) Once a stateid changes from one type to another, there is currently no way in the Linux implementation to link the old and new stateids together. How do we identify all the stateids in the change_state export op? Do we need to pass an array to change_state, or do we need multiple calls, or is there a simpler identifier?
c) Section 13.9.1 says: "The stateid sent to the data server MUST be sent with the seqid set to zero, indicating the most current version of that stateid, rather than indicating a specific non-zero seqid value."
In the Linux code, the seqid maps to the si_generation field of the stateid. We need to ensure that the DS never compares the si_generation field.
For background info, currently, the MDS bumps the seqid field (si_generation) via update_stateid in the following cases:
- open upgrade/downgrade
- open confirm
- close (just for return args, but it is then released and we would need to call the change_state export op)
- lock
- unlock
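The requirement in c) that the DS never compares si_generation can be sketched as a lookup key that leaves the generation out. The tuple layout (boot, gen, file, owner) follows the suggestion under Issue 4; the `cache_key` helper and the dictionary cache are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical stateid layout: (boot, gen, file, owner).
Stateid = namedtuple("Stateid", ["boot", "gen", "file", "owner"])


def cache_key(stateid):
    # The generation (seqid) is deliberately left out of the key.
    return (stateid.boot, stateid.file, stateid.owner)


cache = {}
cached = Stateid(boot=1, gen=4, file=42, owner=7)  # as fetched from the MDS
cache[cache_key(cached)] = "state"

from_client = cached._replace(gen=0)  # clients send seqid zero to the DS
hit = cache.get(cache_key(from_client))
miss = cache.get(cache_key(cached._replace(owner=8)))
```

Here the client's stateid still matches the cached entry even though the MDS has bumped the generation to 4.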
d) At all costs, I would like to avoid synchronous pushes of stateids to the DSs every time the stateid changes. One option would be to asynchronously push the stateid from the MDS to the DS, and simply resolve the timing issue when the DS is asking the MDS. Under the banner of KISS, let's avoid this until we determine that laissez-faire stateid retrieval is causing problems.
Dean Hildebrand