Server Side Copy
From Linux NFS
The server-side copy feature provides a mechanism for the NFS client to perform a file copy on the server without the data traveling back and forth over the network. Without this feature, an NFS client copies data from one location to another by reading the data from the server over the network and then writing it back over the network to the server. With server-side copy, the client instead instructs the server to copy the data locally, avoiding the unnecessary round trip.
The main use case is virtual machine migration between servers operating over NFS. Another is copying large files from one directory on a server to a different directory on the same server.
The application calling the copyfile() system call is in charge of opening and closing file descriptors before and after the copy call. If a file can't be opened, it can't be copied.
Data type reference
typedef uint64_t length4;
typedef uint64_t offset4;

const COPY4_GUARDED  = 0x00000001;
const COPY4_METADATA = 0x00000002;

struct write_response4 {
        stateid4        wr_callback_id<1>;
        count4          wr_count;
        stable_how4     wr_committed;
        verifier4       wr_writeverf;
};
Argument
struct COPY4args {
        /* SAVED_FH: source file */
        /* CURRENT_FH: destination file or directory */
        stateid4        ca_src_stateid;
        stateid4        ca_dst_stateid;
        offset4         ca_src_offset;
        offset4         ca_dst_offset;
        length4         ca_count;
        uint32_t        ca_flags;
        component4      ca_destination;
        netloc4         ca_source_server<>;
};
Result
union COPY4res switch (nfsstat4 cr_status) {
case NFS4_OK:
        write_response4 resok4;
default:
        length4         cr_bytes_copied;
};
Copy range system call
ssize_t vfs_copy_range(struct file *file_in, loff_t pos_in,
                       struct file *file_out, loff_t pos_out,
                       size_t count);

struct file_operations {
        <snip>
        ssize_t (*copy_range)(struct file *, loff_t,
                              struct file *, loff_t, size_t);
};
My modifications
Rather than returning with an error if the filesystem doesn't support the copy_range file operation, instead fall back on do_splice_direct() to copy the data.
Also do the fallback if the filesystem returns -ENOTSUPP to the VFS, so NFS v4 and v4.1 can use the copy operation too without having to include "../internal.h" and export a VFS function to modules.
Synchronous copy
Synchronous copy is significantly easier to implement, so it is a good milestone for implementing the entire copy operation. Later patches can expand on the sync code to make the copy asynchronous.
Client
- Enable the copy_range operation for v4.2, return -ENOTSUPP for v4 and v4.1.
- Prefer a lock stateid if the file has one, otherwise use the open id.
- Send the compound:
SEQUENCE
PUTFH   /* source */
SAVEFH
PUTFH   /* destination */
COPY
- The server-to-server case does not need to be handled: it may be removed from the RFC, and vfs_copy_range() returns an error if the two files are on different superblocks.
- The server will tell us the number of bytes copied, return this to the vfs_copy_range() function.
Server
- OP_COPY op_flags should mimic the flags set in OP_WRITE.
- Use nfs4_preprocess_stateid_op() to find the files associated with both the CURRENT_FH and the SAVED_FH.
- Call the vfs_copy_range() function with arguments provided by the client.
- Only copy the first 1 GB (1073741824 bytes) of the requested range to avoid holding an RPC slot for too long.
- Call vfs_fsync_range() after the data is copied and set stable_how to NFS_FILE_SYNC.
- Return an empty stateid list to the client to show that the copy was done synchronously.
Asynchronous copy
Asynchronous copy will free up RPC slots, since the server can do the copy on its own time and simply notify the client once it's done. To be spec compliant, the OFFLOAD_STATUS and OFFLOAD_ABORT operations also need to be implemented, but since I mostly want to prepare the client for this case, the server patch may be submitted at a later time.
Client
- Keep a list of offloads that we know are in flight. Use a spinlock to protect list access.
- Use a struct completion to put the thread to sleep until the callback comes in if we detect an async copy.
- Watch for the OP_CB_OFFLOAD callback from the server.
- Match the callback stateid to a stateid on the offload waitlist.
- Call complete() on the completion struct.
- Return bytes_copied from the callback data and not the COPY reply data.
Server
- Remove the 1GB copy cap.
- Schedule the copy to run later using a work struct.
- Need to allocate a new structure to pass to the deferred function, so the main RPC thread can return and its resources be freed while the copy runs.
- Need to initialize a new stateid to represent the copy and return it to the client so it knows to expect the callback.
- Call CB_OFFLOAD after completing the copy.
- Free the stateid during nfsd4_cb_offload_release().
Userspace test program
My test program is similar to cp. Run it as `nfscopy.py file1 file2` to make a copy of file1 named file2. If the entire file cannot be copied in one call, the copy system call is issued again with the range set to the remaining data.
#!/usr/bin/python
import sys
import os
from ctypes import *

libc = CDLL("libc.so.6", use_errno=True)
SYS_COPY_RANGE = 314

def copyfile(f_in, f_out):
    LOFFP = POINTER(c_longlong)      # loff_t is 64 bits
    size = os.fstat(f_in.fileno()).st_size
    copied = 0
    while size != 0:
        pos = c_longlong(copied)
        offset = cast(addressof(pos), LOFFP)
        print("Offset: %s size: %s" % (copied, size))
        ret = libc.syscall(c_int(SYS_COPY_RANGE),
                           c_int(f_in.fileno()), offset,
                           c_int(f_out.fileno()), offset,
                           c_size_t(size))
        print("SYS_COPY_RANGE returns: %s" % ret)
        if ret <= 0:
            return ret
        copied = copied + ret
        size = size - ret
        print("Total: %s" % copied)
    return copied

if len(sys.argv) != 3:
    print("Usage: " + sys.argv[0] + " src dst")
    sys.exit(1)
if not os.path.exists(sys.argv[1]):
    print("ERROR: " + sys.argv[1] + " does not exist :(")
    sys.exit(1)

f_in = open(sys.argv[1], "r")
f_out = open(sys.argv[2], "w")
copyfile(f_in, f_out)
f_in.close()
f_out.close()