Inside a container, deleting a directory fails:
bash-4.2# pwd
/home/zxcdn/ottcache/tomcat
bash-4.2# uname -a
Linux 3516b6c97679 3.10.0-327.22.2.el7.x86_64 #1 SMP Fri Sep 29 15:13:08 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
bash-4.2# whoami
root
bash-4.2# ls -alrt bin
total 8
drwxr-xr-x. 1 root root 4096 Dec 3 02:49 .
drwxr-xr-x. 1 root root 4096 Dec 4 02:28 ..
bash-4.2# rm -rf bin
bash-4.2# ls -i
33012 bin
bash-4.2# rm -rf bin
bash-4.2# ls -i
33012 bin
Relevant Docker version information:
[root@host-80-80-34-255 caq]# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2 ---------- storage driver
 Backing Filesystem: extfs -------- backing filesystem
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 5eda6f6fd0c2884c2c8e78a6e7119e8d0ecedb77 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: Carrier Grade Server Linux 5
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 2
Total Memory: 3.703 GiB
Name: host-80-80-34-255
ID: 4CV6:Y3Q4:NYGV:PABH:VG42:3CN7:CKET:SEIV:4SYF:63PI:HYAB:AZR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
 0.0.0.0/0
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)
The empty directory cannot be deleted. Tracing rm with strace shows the following error:
fcntl(3, F_GETFL) = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
unlinkat(AT_FDCWD, "bin", AT_REMOVEDIR) = -1 EINVAL (Invalid argument)
lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
So it is unlinkat that returns the error. Next, I traced it with kernel probes. The call stack is:
Returning from: 0xffffffff811ed500 : vfs_rename+0x0/0x790 [kernel]
Returning to : 0xffffffffa039860b : ovl_do_rename+0x3b/0xa0 [overlay]
 0xffffffffa0398e4e : ovl_clear_empty+0x27e/0x2e0 [overlay]
 0xffffffffa0398f28 : ovl_check_empty_and_clear+0x78/0x90 [overlay]
 0xffffffffa039999c : ovl_do_remove+0x1ec/0x470 [overlay]
 0xffffffffa0399c36 : ovl_rmdir+0x16/0x20 [overlay]
 0xffffffff811ec738 : vfs_rmdir+0xa8/0x100 [kernel]
 0xffffffff811f16d5 : do_rmdir+0x1a5/0x200 [kernel]
 0xffffffff811f28b5 : SyS_unlinkat+0x25/0x40 [kernel]
 0xffffffff81649909 : system_call_fastpath+0x16/0x1b [kernel]
This confirms that vfs_rename is where the error comes from. To pin it down, I placed a probe on a specific line number:
probe kernel.statement("vfs_rename@namei.c:4122")
{
    p_my = @cast($old_dir, "struct inode")->i_op;
    iflags = @cast($old_dir, "struct inode")->i_flags;
    printf("line 4122 flags=%u,rename2=%x,iflags=%u\r\n", $flags, @cast(p_my, "struct inode_operations_wrapper")->rename2, iflags);
    print_backtrace();
}
The corresponding kernel source:
int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
               struct inode *new_dir, struct dentry *new_dentry,
               struct inode **delegated_inode, unsigned int flags)
{
    ...
    rename2 = get_rename2_iop(old_dir);       /* line 4118 */
    if (!old_dir->i_op->rename && !rename2)
        return -EPERM;
    if (flags && !rename2)                    /* line 4122 */
        return -EINVAL;
    ...
}
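At first I read rename2 directly, by casting i_op to the wrapper structure as in the probe, and found that it was not NULL, so in theory the branch at line 4122 should not have been taken. Only after 谈虎 carefully walked through the code did we find that rename2 comes back NULL from get_rename2_iop, because the following condition is hit: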
static inline const struct inode_operations_wrapper *get_iop_wrapper(struct inode *inode, unsigned version)
{
    const struct inode_operations_wrapper *wrapper;

    if (!IS_IOPS_WRAPPER(inode))    /* this is the condition that ultimately takes effect */
        return NULL;
    wrapper = container_of(inode->i_op, const struct inode_operations_wrapper, ops);
    if (wrapper->version < version)
        return NULL;
    return wrapper;
}

static inline iop_rename2_t get_rename2_iop(struct inode *inode)
{
    const struct inode_operations_wrapper *wrapper = get_iop_wrapper(inode, 0);

    return wrapper ? wrapper->rename2 : NULL;
}
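
The gap between the raw cast in the probe and the kernel's own lookup can be modelled in user space. This is only a sketch under stated assumptions: the structures are cut down to the fields that matter, `S_IOPS_WRAPPER_DEMO` is a made-up flag value standing in for whatever `IS_IOPS_WRAPPER` tests, and `demo_rename`, `read_rename2_by_cast`, and `check_rename` are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>   /* offsetof, NULL */

/* Simplified stand-ins for the kernel types; layout is illustrative only. */
typedef int (*iop_rename2_t)(void);
struct inode_operations { int (*rename)(void); };
struct inode_operations_wrapper {
    struct inode_operations ops;
    unsigned version;
    iop_rename2_t rename2;
};
struct inode {
    unsigned i_flags;
    const struct inode_operations *i_op;
};

#define S_IOPS_WRAPPER_DEMO 0x1u  /* hypothetical flag bit, not the kernel's value */
#define container_of(ptr, type, member) \
    ((const type *)((const char *)(ptr) - offsetof(type, member)))

int demo_rename(void) { return 0; }  /* placeholder operation */

/* Mirrors get_iop_wrapper: the wrapper is only trusted if the inode flag is set. */
const struct inode_operations_wrapper *get_iop_wrapper(const struct inode *inode,
                                                       unsigned version)
{
    assert(inode != NULL);
    if (!(inode->i_flags & S_IOPS_WRAPPER_DEMO))
        return NULL;
    const struct inode_operations_wrapper *wrapper =
        container_of(inode->i_op, struct inode_operations_wrapper, ops);
    if (wrapper->version < version)
        return NULL;
    return wrapper;
}

iop_rename2_t get_rename2_iop(const struct inode *inode)
{
    const struct inode_operations_wrapper *wrapper = get_iop_wrapper(inode, 0);
    return wrapper ? wrapper->rename2 : NULL;
}

/* What the probe did: cast i_op straight to the wrapper and ignore i_flags. */
iop_rename2_t read_rename2_by_cast(const struct inode *inode)
{
    return ((const struct inode_operations_wrapper *)inode->i_op)->rename2;
}

/* The two checks quoted from vfs_rename, lines 4118 to 4122. */
int check_rename(const struct inode *old_dir, unsigned flags)
{
    iop_rename2_t rename2 = get_rename2_iop(old_dir);
    if (!old_dir->i_op->rename && !rename2)
        return -EPERM;
    if (flags && !rename2)
        return -EINVAL;
    return 0;
}
```

In this model, an inode whose flag bit is clear still yields a non-NULL pointer from `read_rename2_by_cast`, while `get_rename2_iop` returns NULL and `check_rename` with nonzero flags returns -EINVAL. That is the same combination the probe output and the strace result showed together.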
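It appears that the overlay storage driver in this kernel version has some compatibility problems with an ext4 backing filesystem. We later resolved the issue by switching to device-mapper.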