canonical-kubernetes localhost: kubelet cannot check disk space #48

Open
adrien-f opened this issue Feb 18, 2017 · 11 comments

@adrien-f

adrien-f commented Feb 18, 2017

Greetings,

Conjured Canonical Kubernetes on localhost Ubuntu 16.04.2 with default settings.

I'm here again with another issue: while poking around on the workers, I've noticed that kubelet cannot check disk space, complaining that the zfs binary is not found. This is not critical in itself, but it means that Heapster is not recording node/pod stats.
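For reference, here is a sketch of installing the tools on every worker in one shot through Juju; the exact invocation depends on the Juju version:

juju run --application kubernetes-worker "apt-get install -y zfsutils-linux"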

After installing zfsutils-linux manually on the workers, here are the errors I'm getting:

Feb 18 15:38:00 juju-f96834-9 kubelet[1505]: E0218 15:38:00.530263    1505 kubelet.go:1634] Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": failed to find information for the filesystem labeled "docker-images"
Feb 18 15:38:00 juju-f96834-9 kubelet[1505]: E0218 15:38:00.530288    1505 kubelet.go:1642] Failed to check if disk space is available on the root partition: failed to get fs info for "root": did not find fs info for dir: /var/lib/kubelet
Feb 18 15:38:05 juju-f96834-9 kubelet[1505]: E0218 15:38:05.043160    1505 handler.go:246] HTTP InternalServerError serving /stats/summary: Internal Error: failed RootFsInfo: did not find fs info for dir: /var/lib/kubelet
Feb 18 15:38:08 juju-f96834-9 kubelet[1505]: E0218 15:38:08.915959    1505 fs.go:333] Stat fs failed. Error: exit status 1: "/sbin/zfs zfs get -Hp all lxd/containers/juju-f96834-9" => /dev/zfs and /proc/self/mounts are required.
Feb 18 15:38:08 juju-f96834-9 kubelet[1505]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.

I noticed that /dev/zfs does not exist on the workers, so I tried adding it:

lxc config device add juju-f96834-9 /dev/zfs unix-block path=/dev/zfs
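To confirm the device node showed up inside the container, something like:

lxc exec juju-f96834-9 -- ls -l /dev/zfs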

But back in the container, strace shows:

root@juju-f96834-9:~# strace zfs get -Hp all lxd/containers/juju-f96834-9
[snip]
access("/sys/module/zfs", F_OK)         = 0
access("/sys/module/zfs", F_OK)         = 0
open("/dev/zfs", O_RDWR)                = -1 ENXIO (No such device or address)
write(2, "The ZFS modules are not loaded.\n"..., 87The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
) = 87
exit_group(1)                           = ?
+++ exited with 1 +++

I guess it may have something to do with unprivileged containers and the host's ZFS? Let me know if you need more information, thanks again!

@adam-stokes

So we do modify the LXD profile to allow certain kernel modules into the containers. Do we just need the zfs modules loaded? I can easily add those.

@adrien-f

Can I do anything to help you debug this issue?

@adam-stokes

You can try updating the profile:

lxc profile edit juju-f96834-9

and add the necessary kernel modules to linux.kernel_modules. If you also need access to /proc, make sure raw.lxc in that same profile looks like:

  raw.lxc: |
    lxc.aa_profile=unconfined
    lxc.mount.auto=proc:rw sys:rw

Let me know how that goes, and if it works I can update the profile accordingly.
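If editing the YAML by hand is awkward, the same change can presumably be made non-interactively, along these lines (a sketch; <profile> is a placeholder for the actual profile name):

lxc profile set <profile> linux.kernel_modules ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,zfs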

@adam-stokes

Here's the list of modules loaded by default when installing zfsutils-linux (lsmod output):

zfs                  2813952  3
zunicode              331776  1 zfs
zcommon                57344  1 zfs
znvpair                90112  2 zfs,zcommon
spl                   102400  3 zfs,zcommon,znvpair
zavl                   16384  1 zfs
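On the host, loading just the top-level module should pull in the rest as dependencies, e.g. (sketch):

sudo modprobe zfs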

@adrien-f

It seems those profile settings were already there:

$ lxc profile edit juju-conjure-up-canonical-kubernetes-0e8
config:
  boot.autostart: "true"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
  raw.lxc: |
    lxc.aa_profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /dev/null
    type: disk
  aadisable1:
    path: /sys/module/apparmor/parameters/enabled
    source: /dev/null
    type: disk
  root:
    path: /
    pool: lxd
    type: disk
name: juju-conjure-up-canonical-kubernetes-0e8
used_by:
- /1.0/containers/juju-f96834-0
- /1.0/containers/juju-f96834-1
- /1.0/containers/juju-f96834-2
- /1.0/containers/juju-f96834-3
- /1.0/containers/juju-f96834-4
- /1.0/containers/juju-f96834-5
- /1.0/containers/juju-f96834-7
- /1.0/containers/juju-f96834-8
- /1.0/containers/juju-f96834-9
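To read just that key without opening the editor, something like:

lxc profile get juju-conjure-up-canonical-kubernetes-0e8 linux.kernel_modules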

@adam-stokes

The zfs modules aren't in that list, though. Can you add those?

@adrien-f

adrien-f commented Feb 27, 2017

Oh, my bad. Here it is:

config:
  boot.autostart: "true"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,zfs,zunicode,zcommon,znvpair,spl,zavl
  raw.lxc: |
    lxc.aa_profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /dev/null
    type: disk
  aadisable1:
    path: /sys/module/apparmor/parameters/enabled
    source: /dev/null
    type: disk
  root:
    path: /
    pool: lxd
    type: disk
name: juju-conjure-up-canonical-kubernetes-0e8
used_by:
- /1.0/containers/juju-f96834-0
- /1.0/containers/juju-f96834-1
- /1.0/containers/juju-f96834-2
- /1.0/containers/juju-f96834-3
- /1.0/containers/juju-f96834-4
- /1.0/containers/juju-f96834-5
- /1.0/containers/juju-f96834-7
- /1.0/containers/juju-f96834-8
- /1.0/containers/juju-f96834-9

I've removed the /dev/zfs device I manually added and rebooted.
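The removal was along these lines (device name as given in the earlier add command):

lxc config device remove juju-f96834-9 /dev/zfs

Here's the syslog after the reboot: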

Feb 27 15:15:13 juju-f96834-9 systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-import-scan.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Starting Import ZFS pools by device scanning...
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/systemd-udev-settle.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 cloud-init[56]: Cloud-init v. 0.7.8 running 'init-local' at Mon, 27 Feb 2017 15:14:53 +0000. Up 7.0 seconds.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Started Initial cloud-init job (pre-networking).
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Reached target Network (Pre).
Feb 27 15:15:13 juju-f96834-9 zpool[410]: /dev/zfs and /proc/self/mounts are required.
Feb 27 15:15:13 juju-f96834-9 zpool[410]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-import-scan.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to start Import ZFS pools by device scanning.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-import-scan.service: Unit entered failed state.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-import-scan.service: Failed with result 'exit-code'.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-mount.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Starting Mount ZFS filesystems...
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/cloud-init-local.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Time has been changed
Feb 27 15:15:13 juju-f96834-9 zfs[419]: /dev/zfs and /proc/self/mounts are required.
Feb 27 15:15:13 juju-f96834-9 zfs[419]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to start Mount ZFS filesystems.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Dependency failed for ZFS startup target.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs.target: Job zfs.target/start failed with result 'dependency'.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-mount.service: Unit entered failed state.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Reached target Local File Systems.

If I mount the host's /dev/zfs (I'm not sure it's even a good idea):

Feb 27 15:20:31 juju-f96834-9 systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-import-scan.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Starting Import ZFS pools by device scanning...
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/cloud-init-local.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/systemd-udev-settle.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 zpool[403]: The ZFS modules are not loaded.
Feb 27 15:20:31 juju-f96834-9 zpool[403]: Try running '/sbin/modprobe zfs' as root to load them.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-import-scan.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to start Import ZFS pools by device scanning.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-import-scan.service: Unit entered failed state.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-import-scan.service: Failed with result 'exit-code'.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-mount.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Starting Mount ZFS filesystems...
Feb 27 15:20:31 juju-f96834-9 zfs[404]: The ZFS modules are not loaded.
Feb 27 15:20:31 juju-f96834-9 zfs[404]: Try running '/sbin/modprobe zfs' as root to load them.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to start Mount ZFS filesystems.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Dependency failed for ZFS startup target.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs.target: Job zfs.target/start failed with result 'dependency'.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-mount.service: Unit entered failed state.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-mount.service: Failed with result 'exit-code'.

I'm still getting the same errors from kubelet. Let me know if this helps!

@adam-stokes

@stgraber do you know what it takes to get zfs loaded inside the container?

@stgraber

ZFS doesn't support any kind of namespacing, so you absolutely DO NOT want it to work from inside a container.

If /dev/zfs is available with write access inside the container and you tweak things so that the tools work, what you'll see is the HOST view of ZFS. All the mountpoints listed will be the host mount points and any volume creation/removal will affect the host, not the container.

I think a better question here is why does kubelet need the zfs commands to check disk space?

@adrien-f

adrien-f commented Feb 27, 2017

I've been looking into it and arrived at the google/cadvisor project, which kubelet uses to gather stats:

https://github.com/google/cadvisor/blob/ba33b5a25bfd1a4e627093ef080872cad627e028/fs/fs.go#L322

I will go and raise an issue with them. In the meantime, I guess I could use LXD with another storage backend. Thank you very much for your help 👍
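For anyone else hitting this in the meantime, pointing the local LXD at a non-ZFS storage backend before deploying should avoid this class of errors, something like (a sketch, assuming LXD 2.x; flags vary between versions):

sudo lxd init --auto --storage-backend dir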

@adam-stokes

@adrien-f Thanks for the report. Let us know if we can be of further help.
