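Summary: Ubuntu 18.04, downgrading GCC, choosing the CUDA version, choosing the cuDNN version
Preface: after a whole day of repeatedly installing and uninstalling, I decided to record and analyze every step in detail.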

Choosing the versions

My goal is to install the GPU version of TensorFlow on Ubuntu 18.04 for learning. According to the CUDA & cuDNN support information on the TF website:

Software requirements

The following NVIDIA® software must be installed on your system:

So, aiming for the latest TF release, I decided to install CUDA 9.0 + cuDNN 7.3.

CUDA: https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1704&target_type=runfilelocal

cuDNN: https://developer.nvidia.com/rdp/cudnn-download (requires registering an account first)

Awkwardly, CUDA 9.0 offers no Ubuntu 18.04 option. I didn't want to reinstall Ubuntu, so I decided to try the 17.04 version instead.

Final downloads: cuda_9.0.176_384.81_linux.run & cudnn-9.0-linux-x64-v7.3.1.20.tgz

Installing CUDA

1. Install the dependencies

Reference: https://www.cnblogs.com/Ph-one/p/9000211.html

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Based on the experience in many blogs, GCC also needs to be downgraded first. Here is my gcc version:

root@zhou-pc:~# gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

Downgrade GCC:

sudo apt install gcc-5 g++-5

Follow the prompts to install the older version.

Then switch to it:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50   # a message reports that gcc-5 now provides gcc in auto mode
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50   # similar message for g++

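It is worth noting that the article https://blog.csdn.net/sinat_40276791/article/details/80403784 says: **gcc and g++ were not downgraded when installing CUDA, which shows that CUDA 9.0 can already be installed with gcc 7; the so-called downgrade is needed later, when compiling the CUDA samples, which require gcc and g++ 6.0 or lower.** But my gcc version is fairly new, so to be safe I downgraded first anyway.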
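Whether the downgrade actually took effect can be checked from the version of whichever gcc comes first on PATH. The sketch below uses a hypothetical helper name (gcc_ok_for_cuda9) and assumes CUDA 9.0's nvcc accepts host GCC versions up to 6; it is written in POSIX sh so it also runs under bash:

```shell
# Hypothetical helper: exit status 0 if the GCC version string (argument, or
# the output of `gcc -dumpversion` when no argument is given) has a major
# version of 6 or lower, which this sketch assumes is CUDA 9.0's upper limit.
gcc_ok_for_cuda9() {
  _ver=${1:-$(gcc -dumpversion 2>/dev/null)}
  _major=${_ver%%.*}          # "7.3.0" -> "7", "5" -> "5"
  case $_major in
    ''|*[!0-9]*) return 1 ;;  # empty (no gcc) or non-numeric: not ok
  esac
  [ "$_major" -le 6 ]
}

# Usage:
# gcc_ok_for_cuda9 && echo "gcc is fine for CUDA 9.0" \
#   || echo "select gcc-5 with update-alternatives first"
```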

2. Install CUDA

root@zhou-pc:~# cd /home/zongpu/下载 
root@zhou-pc:/home/zongpu/下载# sudo sh cuda_9.0.176_384.81_linux.run

Press Ctrl+C to skip quickly to the end of the license text.

Next:

Do you accept the previously read EULA?
accept/decline/quit: accept

The next prompt made me nervous, but the blog https://www.cnblogs.com/Ph-one/p/9000211.html also installed CUDA 9 on Ubuntu 18.04, so I decided to give it a try:

You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]:

I answered y(es) to all the other options and waited for the installation to finish.

Set the environment variables:

sudo gedit ~/.bashrc

Add the following to the opened file:

export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Reload it:

source ~/.bashrc
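Editing .bashrc by hand makes it easy to add the same export twice when the setup is repeated. A minimal sketch of a hypothetical append_once helper that appends a line only if that exact line is not already in the file:

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so re-running the setup does not duplicate exports.
append_once() {
  _line=$1
  _file=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex); a missing
  # file makes grep fail, so the line is appended and the file created.
  if ! grep -qxF -- "$_line" "$_file" 2>/dev/null; then
    printf '%s\n' "$_line" >> "$_file"
  fi
}

# Usage (single quotes keep $PATH unexpanded until .bashrc is sourced):
# append_once 'export PATH=/usr/local/cuda-9.0/bin:$PATH' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' ~/.bashrc
```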

Verify that CUDA 9.0 was installed successfully:

cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

Output:

CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1050"
CUDA Driver Version / Runtime Version 9.1 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 2003 MBytes (2099904512 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1493 MHz (1.49 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

The test passes, so the installation succeeded!
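When repeating this test, the summary lines are enough. A small sketch of two hypothetical filters over the deviceQuery output: devicequery_passed checks for Result = PASS, and cuda_runtime_version extracts the runtime version from the final summary line shown above:

```shell
# Hypothetical filter: read deviceQuery output on stdin and succeed only if
# it contains the "Result = PASS" line.
devicequery_passed() {
  grep -q 'Result = PASS'
}

# Hypothetical filter: print the number after "CUDA Runtime Version = "
# from the deviceQuery summary line (e.g. "9.0").
cuda_runtime_version() {
  sed -n 's/.*CUDA Runtime Version = \([0-9.]*\).*/\1/p'
}

# Usage:
# cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
# ./deviceQuery | devicequery_passed && echo "CUDA OK"
# ./deviceQuery | cuda_runtime_version
```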

Installing cuDNN

The archive was downloaded earlier, so install it directly:

tar -zxvf cudnn-9.0-linux-x64-v7.3.1.20.tgz

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

After running the commands above, check the installation:

nvcc -V

Output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

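cuDNN installed successfully! (Strictly speaking, nvcc -V only reports the CUDA compiler version, release 9.0 here; it does not check whether cuDNN was copied correctly.)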
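To read the installed cuDNN version itself, a sketch of a hypothetical cudnn_version helper that parses the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in cudnn.h. It assumes cuDNN 7.x, where these defines live in cudnn.h (later releases moved them to a separate cudnn_version.h); for the archive installed here it should print 7.3.1:

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from the #define lines of
# a cudnn.h header (defaults to the path the header was copied to above).
cudnn_version() {
  _hdr=${1:-/usr/local/cuda/include/cudnn.h}
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pa = $3 }
       END { if (ma != "") print ma "." mi "." pa; else exit 1 }' "$_hdr"
}

# Usage:
# cudnn_version
```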

Installing TensorFlow

1. Check and prepare the required Python environment

Following the installation guide on the TF website: https://tensorflow.google.cn/install/pip

python3 --version
pip3 --version
virtualenv --version

Result:

Command 'virtualenv' not found, but can be installed with:
apt install virtualenv

virtualenv was not installed, so I installed it as the message suggested:

sudo apt install virtualenv
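The three version checks above can be rolled into one step. A minimal sketch with a hypothetical missing_cmds helper that prints each command not found on PATH, using the POSIX command -v builtin:

```shell
# Hypothetical helper: print every command name given as an argument that is
# not found on PATH; exit status 0 only if all of them are present.
missing_cmds() {
  _status=0
  for _cmd in "$@"; do
    if ! command -v "$_cmd" >/dev/null 2>&1; then
      printf '%s\n' "$_cmd"
      _status=1
    fi
  done
  return "$_status"
}

# Usage:
# missing_cmds python3 pip3 virtualenv || echo "install the commands listed above"
```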

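With virtualenv installed, I installed TF the way the official guide recommends: creating a Python virtual environment to isolate the installed packages from the system. This differs from what I did before, which was simply running pip install tensorflow-gpu directly.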

The steps are:

Create a new virtual environment by choosing a Python interpreter and making a ./venv directory to hold it:

virtualenv --system-site-packages -p python3 ./venv

For example:

root@zhou-pc:~# virtualenv --system-site-packages -p python3 ./venv
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /root/venv/bin/python3
Not overwriting existing python script /root/venv/bin/python (you must use /root/venv/bin/python3)
Installing setuptools, pip, wheel...done.

Activate the virtual environment with the shell-specific command (do this every time before using TF):

source ./venv/bin/activate  # sh, bash, ksh, or zsh

While the virtualenv is active, the shell prompt is prefixed with (venv):

root@zhou-pc:~# source ./venv/bin/activate 
(venv) root@zhou-pc:~#

Packages installed inside the virtual environment do not affect the host system setup. Start by upgrading pip:

pip install --upgrade pip

pip list # show packages installed within the virtual environment

To exit the virtualenv later (we are about to install TF in this environment, so don't exit now):

deactivate    # don't exit until you are done using TensorFlow

Note: once the environment has been created, it must be activated every time you use TensorFlow.

2. Install TensorFlow (inside the venv virtual environment)

pip install tensorflow-gpu
pip install --upgrade tensorflow   # note: "tensorflow" is the CPU-only package and may replace the GPU build installed above
python -c "import tensorflow as tf; print(tf.__version__)"

Final output:

(venv) root@zhou-pc:~# python -c "import tensorflow as tf; print(tf.__version__)"
1.11.0

Great! TensorFlow is now installed!

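One problem: inside the virtual environment the CPU version was always installed, and I didn't know why; outside the virtual environment, the GPU version installed without trouble. A likely cause is the pip install --upgrade tensorflow command above: tensorflow is the CPU-only package and installs the same tensorflow module as tensorflow-gpu, so running it after tensorflow-gpu can replace the GPU build.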
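To see which TensorFlow packages are actually present in the active environment, the pip freeze output can be filtered. The tf_packages helper below is a hypothetical sketch; an output of both means the CPU and GPU packages share the same tensorflow module, and uninstalling both and reinstalling only tensorflow-gpu is the usual remedy:

```shell
# Hypothetical filter: read `pip freeze` output on stdin and report which
# TensorFlow packages are installed: "gpu", "cpu", "both" or "none".
tf_packages() {
  awk -F'==' '
    tolower($1) == "tensorflow"     { cpu = 1 }
    tolower($1) == "tensorflow-gpu" { gpu = 1 }
    END {
      if (cpu && gpu) print "both"
      else if (gpu)   print "gpu"
      else if (cpu)   print "cpu"
      else            print "none"
    }'
}

# Usage (inside the activated venv):
# pip freeze | tf_packages
```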

3. Run some tests (both in the venv environment and outside it)

Run:

python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Outside the virtual environment, you can see (partial output):

GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1

Inside the virtual environment:

I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
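The difference between the two logs is the GPU device line containing compute capability. A sketch of a hypothetical tf_log_has_gpu filter that checks for it, assuming TF 1.x writes its device-creation log to stderr when a Session is created:

```shell
# Hypothetical filter: read TensorFlow's log output on stdin and succeed if
# it contains the "compute capability" text printed for a GPU device.
tf_log_has_gpu() {
  grep -q 'compute capability'
}

# Usage (2>&1 because TensorFlow logs to stderr):
# python -c "import tensorflow as tf; tf.Session()" 2>&1 | tf_log_has_gpu \
#   && echo "GPU build" || echo "CPU only"
```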

Installing Keras

Reference: https://keras-cn.readthedocs.io/en/latest/for_beginners/keras_linux/

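There are several similar commands and I didn't look into the differences between them (-U upgrades an existing installation, and --pre also allows pre-release versions):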

sudo pip install -U --pre keras
# or...
pip install keras -U --pre

Installing Anaconda

First download the Anaconda installer; my version is Anaconda3-5.3.0-Linux-x86_64.sh

cd into the directory containing the file and run:

bash Anaconda3-5.3.0-Linux-x86_64.sh

Read the license (Ctrl+C skips to the end quickly), then follow the prompts to choose the options and install:

The options cover the install path, adding the environment variable, and installing VS Code; choose according to the prompts.

Although I chose to add the environment variable during installation, I still couldn't launch Anaconda afterwards, so I added it manually:

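# Add Anaconda's bin directory to PATH (adjust the directory to your Anaconda version and install path)
# $HOME is used instead of ~ because a ~ inside double quotes is not expanded
echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
# Reload .bashrc so the change takes effect immediately
source ~/.bashrc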

Run:

conda list

If information about the installed components is listed, it worked.

Launch Anaconda Navigator (stopping the terminal closes it):

anaconda-navigator

Open Jupyter Notebook:

jupyter notebook    # the older "ipython notebook" command still works but is deprecated

When I first installed Anaconda, I did so with root privileges, and Jupyter would not run:

root@zhou-pc:/home/zongpu/下载# ipython notebook
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 18:30:46.711 NotebookApp] JupyterLab extension loaded from /root/anaconda3/lib/python3.7/site-packages/jupyterlab
[I 18:30:46.711 NotebookApp] JupyterLab application directory is /root/anaconda3/share/jupyter/lab
[C 18:30:46.713 NotebookApp] Running as root is not recommended. Use --allow-root to bypass.

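The last log line gives the reason: Jupyter Notebook refuses to run as root unless --allow-root is passed. I also used the default install paths everywhere and am not sure whether that will cause problems later.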
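Installing Anaconda as a regular user avoids the problem altogether; when running as root is unavoidable, the flag suggested by the log can be added conditionally. A sketch with a hypothetical jupyter_notebook_cmd helper that only prints the command line to use (the optional argument stands in for the user id):

```shell
# Hypothetical helper: print the Jupyter Notebook command line, adding
# --allow-root only when the user id (argument, or `id -u`) is 0 (root).
jupyter_notebook_cmd() {
  _uid=${1:-$(id -u)}
  if [ "$_uid" -eq 0 ]; then
    echo "jupyter notebook --allow-root"
  else
    echo "jupyter notebook"
  fi
}

# Usage:
# eval "$(jupyter_notebook_cmd)"
```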

Uninstalling Anaconda:

rm -rf ~/anaconda3
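Deleting the directory leaves behind the PATH line that was appended to ~/.bashrc. A sketch of a hypothetical remove_anaconda_path helper that drops every line mentioning anaconda3/bin after saving a .bak backup; it assumes no other line in the file needs that string:

```shell
# Hypothetical helper: copy the rc file to <file>.bak, then rewrite the
# original without any line containing "anaconda3/bin".
remove_anaconda_path() {
  _rc=${1:-$HOME/.bashrc}
  cp -- "$_rc" "$_rc.bak" &&
    sed '/anaconda3\/bin/d' "$_rc.bak" > "$_rc"
}

# Usage (then open a new terminal or run: source ~/.bashrc):
# remove_anaconda_path
```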

Addendum

An article on installing TF with Anaconda:

Installing Tensorflow with Anaconda on Ubuntu: https://blog.csdn.net/hgdwdtt/article/details/78633232