Need to support nvidia legacy driver #16

Open
jabl opened this issue Jun 4, 2018 · 3 comments

Comments

@jabl
Contributor

jabl commented Jun 4, 2018

On some of our (FGI-era) GPU nodes dmesg says:

[15280.304871] NVRM: The NVIDIA Tesla M2070 GPU installed in this system is
NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
NVRM:  visit http://www.nvidia.com/object/unix.html for more
NVRM:  information.  The 396.26 NVIDIA driver will ignore
NVRM:  this GPU.  Continuing probe...

We need to figure out how to support these nodes. Perhaps pinning nvidia-kmod to an older version is enough?
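
To find which nodes are affected without reading the logs by hand, the legacy branch can be pulled out of the NVRM notice quoted above. This is a minimal sketch, assuming the notice keeps the wording shown in this dmesg output; the function name `legacy_branch` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the legacy
# driver branch (e.g. "390") named in an NVRM "Legacy drivers" notice.
# Prints nothing when no such notice is present.
# Assumption: the notice wording matches the dmesg excerpt in this issue.
legacy_branch() {
    sed -n 's/.*supported through the NVIDIA \([0-9][0-9]*\)\.xx Legacy drivers.*/\1/p'
}

# Usage on a live node (needs permission to read the kernel log):
#   dmesg | legacy_branch
```

An empty result means the running driver did not report a legacy-only GPU, at least not in the part of the log that is still in the kernel ring buffer.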

@VilleS1
Contributor

VilleS1 commented Jun 4, 2018

elrepo has previously released, for example, an nvidia 340xx driver package which, once installed, stays on a compatible version for the rest of the system's life. For some reason a 390xx driver has not been released.
http://elrepo.org/linux/elrepo/el7/x86_64/RPMS/

@VilleS1
Contributor

VilleS1 commented Sep 6, 2018

The elrepo 390xx driver is in elrepo-testing, but I don't think it is compatible with this setup.

One possibility is to erase nvidia stuff:
yum erase cuda-drivers xorg-x11-drv-nvidia xorg-x11-drv-nvidia-devel xorg-x11-drv-nvidia-gl xorg-x11-drv-nvidia-libs cuda nvidia-kmod

Then install:
yum install cuda-drivers-390.30-1.x86_64 xorg-x11-drv-nvidia-390.30-1.el7.x86_64 xorg-x11-drv-nvidia-devel-390.30-1.el7.x86_64 xorg-x11-drv-nvidia-gl-390.30-1.el7.x86_64 xorg-x11-drv-nvidia-libs-390.30-1.el7.x86_64 cuda-9.1.85-1.x86_64 cuda-9-1-9.1.85-1.x86_64 cuda-demo-suite-9-1-9.1.85-1.x86_64 cuda-runtime-9-1-9.1.85-1.x86_64 nvidia-kmod-390.30

Then install versionlock plugin:
yum install yum-plugin-versionlock

And lock it:
yum versionlock cuda-drivers xorg-x11-drv-nvidia xorg-x11-drv-nvidia-devel xorg-x11-drv-nvidia-gl xorg-x11-drv-nvidia-libs cuda nvidia-kmod
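
After the lock is in place, a quick check that every driver package really sits on the intended version can catch a partial downgrade. This is a minimal sketch, assuming package names arrive one per line in the `name-version-release.arch` form that `rpm -q` prints; the function name `off_version` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read package names (name-version-release.arch, one
# per line) on stdin and print those that are NOT at the given version.
off_version() {
    want="$1"
    # -F: treat the pattern as a fixed string, -v: keep non-matching lines.
    # grep exits 1 when it prints nothing, which here means "all good",
    # so that exit status is deliberately ignored.
    grep -vF -- "-${want}-" || true
}

# Usage on a node (driver packages only; cuda has its own version number):
#   rpm -q nvidia-kmod xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs | off_version 390.30
```

Any line it prints is a package that is either on another version or not installed at all. `yum versionlock list` shows the locks themselves.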

@jabl
Contributor Author

jabl commented Sep 6, 2018

Yeah, in the end what we did was to put this in the group_vars for the affected nodes:

kickstart_extra_post_commands: |
  ...
  # for older systems with an NVIDIA card, pin the CUDA version to 9.1
  yum -y install yum-plugin-versionlock libibverbs
  echo "1:nvidia-kmod-390.30-2.el7.*" >> /etc/yum/pluginconf.d/versionlock.list
  echo "1:xorg-x11-drv-nvidia-390.30-1.el7.*" >> /etc/yum/pluginconf.d/versionlock.list
  echo "1:xorg-x11-drv-nvidia-libs-390.30-1.el7.*" >> /etc/yum/pluginconf.d/versionlock.list
  echo "1:xorg-x11-drv-nvidia-devel-390.30-1.el7.*" >> /etc/yum/pluginconf.d/versionlock.list
  echo "1:xorg-x11-drv-nvidia-gl-390.30-1.el7.*" >> /etc/yum/pluginconf.d/versionlock.list
  echo "0:cuda-drivers-390.30-1.*" >> /etc/yum/pluginconf.d/versionlock.list
  echo "0:cuda-9.1.85-1.*" >> /etc/yum/pluginconf.d/versionlock.list
  # install the kmod now so /dev/nvidia0 is found and no extra reboot is needed later
  if lspci|egrep -q '(M2090|M2070)'; then rpm -ivh http://10.10.254.20/nvidia-kmod-390.30-2.el7.x86_64.rpm; fi

Kludgy maybe, but got the job done.
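
One thing to watch if the same commands ever run outside a one-shot kickstart: plain `>>` appends a duplicate entry on every run. A minimal sketch of an append that skips lines already present, assuming the same versionlock.list path as above; the function name `add_lock` and the `LOCKFILE` variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append a versionlock entry only if that exact line
# is not already in the list file, so repeated runs add no duplicates.
# LOCKFILE defaults to the path used in this thread; override it to test.
LOCKFILE="${LOCKFILE:-/etc/yum/pluginconf.d/versionlock.list}"

add_lock() {
    entry="$1"
    # -x: match the whole line, -F: fixed string (entries contain . and *).
    # A missing file counts as "not present", so the entry gets appended.
    grep -qxF -- "$entry" "$LOCKFILE" 2>/dev/null \
        || printf '%s\n' "$entry" >> "$LOCKFILE"
}

# Usage, with entries taken from the snippet above:
#   add_lock "1:nvidia-kmod-390.30-2.el7.*"
#   add_lock "0:cuda-drivers-390.30-1.*"
```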
