Installing the GPU Driver, CUDA, and cuDNN on CentOS 7.6

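Contents

1. Install the base environment
- 1.1 Install GCC
- 1.2 Install kernel packages
2. Check the graphics card
3. Install the graphics driver
- 3.1 Download the driver
- 3.2 Install the driver
- 3.3 Check the installed version
4. Install CUDA
- 4.1 Download CUDA
- 4.2 Install CUDA
5. Install cuDNN
- 5.1 Download cuDNN
- 5.2 Install cuDNN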

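1. Install the base environment

1.1 Install GCC

# Update installed packages
yum update

# Install gcc
yum -y install gcc gcc-c++

# After the update finishes, reboot; a kernel update only takes effect after a restart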

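1.2 Install kernel packages

# Check the running kernel version
uname -r

# Example output
3.10.0-957.el7.x86_64

# Install the kernel development package and headers.
# The installed kernel-devel version must match the output of uname -r;
# if yum update installed a newer kernel, reboot into it first.
yum install kernel-devel kernel-headers -y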
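
A quick way to confirm that the installed kernel-devel tree matches the running kernel is to look for its directory under /usr/src/kernels, which is where the paths used later in this article point. The following is a minimal sketch; `kernel_src_path` is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel source directory expected for a kernel release.
kernel_src_path() {
    printf '/usr/src/kernels/%s\n' "$1"
}

running="$(uname -r)"
src="$(kernel_src_path "$running")"

if [ -d "$src" ]; then
    echo "OK: kernel source tree found at $src"
else
    echo "Missing: $src (reboot into the updated kernel, or install a kernel-devel that matches $running)"
fi
```

If the directory is missing, the driver installer is likely to fail with the "Unable to find the kernel source tree" error.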

2. Check the graphics card

lspci | grep -i nvidia

# Example output
17:00.0 VGA compatible controller: NVIDIA Corporation Device 2216 (rev a1)
17:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
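
lspci prints "Device 2216" rather than a product name when its local PCI ID database does not know the card. The four-digit hexadecimal ID is what you can use to look the model up. Below is a small sketch for pulling those IDs out of the output; the function name `nvidia_device_ids` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read lspci output on stdin and print the
# hexadecimal ID that follows "NVIDIA Corporation Device".
nvidia_device_ids() {
    sed -n 's/.*NVIDIA Corporation Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

if command -v lspci >/dev/null 2>&1; then
    lspci | nvidia_device_ids
fi
```

Lines where lspci already shows a product name produce no output, since there is no bare device ID to extract.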

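3. Install the graphics driver

3.1 Download the driver

Note: driver and CUDA versions depend on each other, so avoid choosing a driver version newer than you need.

I chose "NVIDIA-Linux-x86_64-470.74.run".

# Pick a driver that matches your card model
https://www.nvidia.cn/geforce/drivers/

The page above usually offers only recent versions; other versions can be downloaded from the directory below.

# Download directory for all driver versions
https://download.nvidia.com/XFree86/Linux-x86_64/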

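3.2 Install the driver

Note: if the installation keeps running into problems, consider switching to a different OS or software version.

Accepting the installer's defaults is fine.

# This succeeds in most cases
sh NVIDIA-Linux-x86_64-470.74.run

# Specify the kernel version and the kernel source path explicitly.
# The directory under /usr/src/kernels must exist on your machine and should match $(uname -r).
sh NVIDIA-Linux-x86_64-470.74.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.66.1.el7.x86_64 -k $(uname -r)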
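
Because the kernel source path and the -k value have to describe the same kernel, one option is to derive both from a single version string. The sketch below only prints the command so it can be reviewed first; `driver_install_cmd` is a hypothetical helper, and it assumes the kernel-devel tree lives under /usr/src/kernels as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: build the installer command line for a
# given .run file ($1) and kernel release ($2).
driver_install_cmd() {
    printf 'sh %s --kernel-source-path=/usr/src/kernels/%s -k %s\n' "$1" "$2" "$2"
}

# Print (do not run) the command for the currently running kernel.
driver_install_cmd NVIDIA-Linux-x86_64-470.74.run "$(uname -r)"
```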

(1) Kernel source error

ERROR: Unable to find the kernel source tree for the currently running kernel.  Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed.  If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

Fix: add --kernel-source-path to point the installer at the kernel source tree.

sh NVIDIA-Linux-x86_64-470.74.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.66.1.el7.x86_64

(2) Unable to load nvidia.ko

Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA GPU(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Fix (works in most cases): pass the kernel version as well as the kernel source path.

sh NVIDIA-Linux-x86_64-470.74.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.66.1.el7.x86_64 -k $(uname -r)
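
The error text also names nouveau as a possible cause. Whether that module is currently loaded can be checked from the lsmod listing. This is a sketch under the assumption that lsmod is available; `module_listed` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin and
# succeed if the module named in $1 appears in the first column.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_listed nouveau; then
        echo "nouveau is loaded and may be holding the GPU"
    else
        echo "nouveau is not loaded"
    fi
fi
```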

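(3) Unable to load nvidia-drm.ko

ERROR: Unable to load the kernel module 'nvidia-drm.ko'

Fix: update the packages and reboot.

# Update installed packages, then reboot
yum update
reboot

If that does not solve the problem, add the --no-drm option.

# Specifying the kernel version and source path is recommended
sh NVIDIA-Linux-x86_64-470.74.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.66.1.el7.x86_64 -k $(uname -r) --no-drm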

3.3 Check the installed version

nvidia-smi
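
For scripts, nvidia-smi can print just the fields of interest instead of the full table. The sketch below queries the GPU name and driver version and, as an illustration, extracts the major driver version; `driver_major` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the major part of a driver version such as 470.74.
driver_major() {
    printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU: name, driver version.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    echo "driver major version: $(driver_major "$version")"
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```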

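4. Install CUDA

Note: the CUDA version and the driver version depend on each other.

4.1 Download CUDA

I chose "cuda_11.5.0_495.29.05_linux.run".

# CUDA Toolkit archive on the NVIDIA website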
https://developer.nvidia.com/cuda-toolkit-archive
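
The runfile name itself records which toolkit version it contains and which driver version is bundled with it, which is handy when comparing against the driver installed in section 3. Here is a minimal sketch of splitting the name; `cuda_runfile_versions` is a hypothetical helper and assumes the cuda_<toolkit>_<driver>_linux.run naming seen above.

```shell
#!/bin/sh
# Hypothetical helper: print "<toolkit version> <bundled driver version>"
# for a runfile named cuda_<toolkit>_<driver>_linux.run.
cuda_runfile_versions() {
    printf '%s\n' "$1" | sed -n 's/^cuda_\([0-9.]*\)_\([0-9.]*\)_linux\.run$/\1 \2/p'
}

cuda_runfile_versions cuda_11.5.0_495.29.05_linux.run
# prints: 11.5.0 495.29.05
```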

4.2 Install CUDA

Once the download finishes, run the installer directly. Note: the installer shows no progress indicator.

sh cuda_11.5.0_495.29.05_linux.run
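
If the driver from section 3 should be kept as it is, the runfile can also be run non-interactively for the toolkit only. As far as I know, --silent (which implies accepting the EULA) and --toolkit are options of the CUDA 11 runfile installer, but verify them with `sh cuda_11.5.0_495.29.05_linux.run --help` before relying on this sketch. The helper name is hypothetical, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: build a toolkit-only, non-interactive install command.
# Assumption: the runfile accepts --silent and --toolkit (check --help first).
cuda_toolkit_only_cmd() {
    printf 'sh %s --silent --toolkit\n' "$1"
}

# Print the command for review instead of running it.
cuda_toolkit_only_cmd cuda_11.5.0_495.29.05_linux.run
```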

(1) Accept the license

(2) Install CUDA

(3) Add environment variables

On success, the installer prints the following summary:

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.5/
Samples:  Installed in /root/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.5/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.5/lib64, or, add /usr/local/cuda-11.5/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.5/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

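Add the environment variables:

# Edit the system-wide profile
vim /etc/profile

# Append the following lines at the end of the file.
# The ${VAR:+...} form avoids a leading colon when LD_LIBRARY_PATH is empty.
export PATH=$PATH:/usr/local/cuda-11.5/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda-11.5/lib64
export CUDA_HOME=/usr/local/cuda-11.5/

# Apply the changes to the current shell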
source /etc/profile
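
Appending to PATH or LD_LIBRARY_PATH every time the profile is sourced makes the variables grow, and a naive `$VAR:/new/path` on an empty variable leaves a leading colon. The small sketch below avoids both; `path_append` is a hypothetical helper, and the CUDA paths are the ones from the installer summary.

```shell
#!/bin/sh
# Hypothetical helper: print list $1 with directory $2 appended,
# unless $2 is already one of its colon-separated entries.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

PATH="$(path_append "$PATH" /usr/local/cuda-11.5/bin)"
LD_LIBRARY_PATH="$(path_append "${LD_LIBRARY_PATH:-}" /usr/local/cuda-11.5/lib64)"
export PATH LD_LIBRARY_PATH
```

Sourcing this more than once leaves both variables unchanged after the first run.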

(4) Check the CUDA version

# Check the version
nvcc -V

# Example output
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_19:13:29_PDT_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0
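
To compare the toolkit version against a cuDNN build or a framework requirement in a script, the release number can be extracted from the nvcc -V output. This is a sketch, with `nvcc_release` as a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc -V` output on stdin and print
# the release number from the "release X.Y," field.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    nvcc -V | nvcc_release
fi
```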

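5. Install cuDNN

5.1 Download cuDNN

Note: the cuDNN build must match your CUDA version.

I chose "cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive.tar.xz".

cuDNN can be downloaded from the address below without registering or logging in: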

https://developer.nvidia.com/rdp/cudnn-archive#a-collapse51b

5.2 Install cuDNN

(1) Extract the archive

# Decompress the .xz file to a .tar file
xz -d cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive.tar.xz

# Unpack the tar file
tar -xf cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive.tar

# The extracted directory is
cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive

# Contents of the cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive directory
include  lib  LICENSE

(2) Copy the files

# Enter the directory
cd cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive

# Copy the files; -P copies symbolic links as links instead of following them
sudo cp -P include/cudnn*.h /usr/local/cuda/include
sudo cp -P lib/libcudnn* /usr/local/cuda/lib64

# Give all users read permission
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
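
One way to confirm the copy worked is to read the version macros from the installed header. In cuDNN 8 these are defined in cudnn_version.h. This is a minimal sketch; `cudnn_version_from_header` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read a cudnn_version.h on stdin and print
# MAJOR.MINOR.PATCHLEVEL taken from its #define lines.
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") printf "%s.%s.%s\n", major, minor, patch }
    '
}

hdr=/usr/local/cuda/include/cudnn_version.h
if [ -r "$hdr" ]; then
    cudnn_version_from_header < "$hdr"
else
    echo "cannot read $hdr; check that the headers were copied"
fi
```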

Compiled by 极客之音. Article link: https://www.bmabk.com/index.php/post/199459.html