A Concise Kubernetes Tutorial

Preface

This document assumes basic familiarity with Docker. It walks through the common and important Kubernetes features with simple, hands-on usage, so that after reading it you can get started with Kubernetes quickly and know what to Google when you hit a problem or want to dig deeper into a topic. The main references are the official documentation at https://kubernetes.io/docs/home/ and the book Kubernetes in Action.

Introduction to Kubernetes

What is Kubernetes

Kubernetes is an open-source system that grew out of Google's internal Borg and Omega systems; it lets you deploy and manage containerized applications on top of it. By abstracting away the underlying infrastructure, it makes running software across a large fleet of servers (VMs or bare metal) feel like running it on a single node, which simplifies development, deployment, and operations.

What Kubernetes provides

  • Service discovery and load balancing
  • Automated deployment, scaling, rollback, and self-healing
  • Storage orchestration (local and cloud), plus configuration and secret management
  • Automatic bin packing

What Kubernetes does not provide

  • It does not build applications or deploy from source code

  • It does not provide application-level services such as Kafka, Spark, MySQL, or Redis

  • It does not ship a logging, monitoring, or alerting solution for your applications

  • It does not provision, maintain, manage, or self-heal the machines themselves

Kubernetes objects

Kubernetes objects are persistent entities in the Kubernetes system (stored in etcd). Kubernetes uses these entities to represent the state of the cluster:

  • Which containerized applications run on which nodes
  • The resources available to those applications
  • The policies governing how those applications restart, upgrade, and tolerate failures

By creating objects you tell Kubernetes the desired state you want to reach, and the system then continuously drives the cluster toward that state, for example the number of replicas.
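
As a minimal sketch of this desired-state model (the Deployment name state-demo is made up for this example): you declare three replicas and let Kubernetes converge to them.

# Declare a desired state of 3 nginx replicas (illustrative names)
kubectl create deployment state-demo --image=nginx
kubectl scale deployment state-demo --replicas=3
# Kubernetes keeps reconciling toward 3 running Pods
kubectl get deployment state-demo
kubectl delete deployment state-demo    # clean up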

Basic objects

  • Pod, the basic unit of execution, scheduling, and deployment
  • Service, an abstraction that exposes a set of Pods on the network
  • Volume, an abstraction for persistent storage and for storage shared between containers

Higher-level objects (controllers)

  • ReplicaSet, which maintains a stable set of replicated Pods
  • Deployment, which provides declarative updates for Pods and ReplicaSets
  • StatefulSet, the workload API for managing stateful applications
  • DaemonSet, which ensures that one replica of a Pod runs on every node

Architecture and workflow

As shown in the figures below, a Kubernetes cluster consists of two parts: the control plane (also called the master) and the worker nodes:

Control plane

  • API server
  • Controller manager (controllers)
  • Scheduler
  • etcd

Worker nodes

  • kubelet
  • kube-proxy
  • Container runtime

After the user pushes an image to an image registry, they send a deployment request to the API server on the control plane (for example via the command line). The controllers, notified by the API server, create the objects describing the deployment (the metadata of the Deployment and its Pods is stored in etcd; more on this later). The API server then notifies the scheduler, which picks suitable worker nodes, and finally the API server tells the kubelet on the chosen nodes to create the Pods and the containers inside them.
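
A quick way to see both halves of this picture on a running cluster (output varies by cluster; on managed clusters such as TKE the control-plane components themselves may be hidden):

# Worker nodes registered with the control plane
kubectl get nodes -o wide
# Node components, add-ons and (on self-hosted clusters) control-plane components run as Pods in kube-system
kubectl get pods -n kube-system -o wide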

[Figure: k8s-cp]

[Figure: k8s-mgr]

How-to Guide

Pod

A Pod is the basic building block of Kubernetes. It is a group of co-located containers, although the most common case is a Pod containing a single container. All Pods in a Kubernetes cluster live in the same shared, flat network space (no NAT between them). For developers using Kubernetes, the lowest-level unit they deal with is no longer a VM or bare-metal machine but the Pod, a logical host.

Kubernetes objects such as Pods are described with YAML files. A typical Pod manifest, shown below, has four parts:

  • apiVersion, the version of the Kubernetes API used to create the object
  • kind, the type of object this yaml file describes
  • metadata, the Pod's name, labels, and so on
  • spec, which in a Pod manifest is the section that describes the containers
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
  - image: nginx
    name: container-demo
# Create the pod directly from the yaml file
[root@VM_2_15_centos ~/macduan/demo/pod]# kubectl create -f pod-nginx.yaml
pod/pod-demo created
# Listing the pods shows a pod named pod-demo running
[root@VM_2_15_centos ~]# kubectl get pod pod-demo -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 1/1 Running 0 19h 192.168.0.34 10.1.2.6 <none>

Notice that the Pod created from 10.1.2.15 ended up scheduled onto node 10.1.2.6. Start a temporary pod on 10.1.2.2 and send a request directly to the nginx service inside pod-demo:

# Start a temporary pod on another node and test with a curl request
[root@VM_2_2_centos ~]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://192.168.0.34
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
......

# Watching from 10.1.2.6 shows the temporary curlutils pod being scheduled onto 10.1.2.15 and going through the Pending, ContainerCreating and Terminating phases
[root@VM_2_6_centos ~]# kubectl get pod -o wide --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 1/1 Running 0 19h 192.168.0.34 10.1.2.6 <none>
curlutils 0/1 Pending 0 0s <none> <none> <none>
curlutils 0/1 Pending 0 0s <none> 10.1.2.15 <none>
curlutils 0/1 ContainerCreating 0 0s <none> 10.1.2.15 <none>
curlutils 0/1 ContainerCreating 0 1s <none> 10.1.2.15 <none>
curlutils 0/1 Completed 0 2s 192.168.2.53 10.1.2.15 <none>
curlutils 0/1 Terminating 0 2s 192.168.2.53 10.1.2.15 <none>
curlutils 0/1 Terminating 0 3s 192.168.2.53 10.1.2.15 <none>

kubectl get pod pod-demo -o yaml shows the Pod's full details. You will also notice an extra status section describing the Pod's current state; this is exactly the "desired state plus convergence toward it" behavior of Kubernetes objects mentioned earlier.

[root@VM_2_15_centos ~/macduan/pod]# kubectl get pod pod-demo -o yaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: container-demo
...
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-n6snk
    secret:
      defaultMode: 420
      secretName: default-token-n6snk
status:
  conditions:
...
  containerStatuses:
  - containerID:
...
    image: nginx:latest
...
    name: container-demo
  hostIP: 10.1.2.6
  phase: Running
  podIP: 192.168.0.34
  qosClass: BestEffort
...

Note that if a container (application instance) in a Pod crashes, it is only restarted in place; the Pod itself is not rescheduled. A Pod is rescheduled and recreated on some node only when its node fails or the Pod is deleted. For a detailed introduction to Pods, see here.
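
A quick way to observe this in-place restart behavior on the demo Pod (sketch):

# NODE and IP stay the same while RESTARTS increases after a container crash
kubectl get pod pod-demo -o wide
# Restart count of the first container
kubectl get pod pod-demo -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'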

Container restartPolicy

A Pod can set the restart policy for all of its containers through the PodSpec (Pod.spec.restartPolicy in the yaml file):

  • Always, the default: the container is restarted whether it exits successfully (exit 0) or with an error
  • OnFailure: the container is restarted only after a non-zero exit
  • Never: the container is never restarted

Most of what we develop are long-running services, so the Always policy is recommended.

[root@VM_2_15_centos ~/macduan/demo/pod]# cat pod-lifecycle.pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    app: demo
spec:
  containers:
  - image: nginx
    name: lc-demo
    command: ['sh', '-c', 'sleep 5']

[root@VM_2_15_centos ~/macduan/demo/pod]# kubectl get pod pod-demo -o yaml
apiVersion: v1
kind: Pod
metadata:
...
  labels:
    app: demo
  name: pod-demo
...
spec:
  containers:
  - ...
    image: nginx
    imagePullPolicy: Always
    name: lc-demo
    ...
  ...
  restartPolicy: Always
...

Create a pod with the yaml above; the container inside it runs for 5 s and then exits normally. kubectl get pod pod-demo -o yaml shows that the default restartPolicy is Always, and kubectl get pod -o wide -l app=demo --watch shows the Pod's state transitions: after the first exit it goes to Completed and is immediately restarted back to Running, and after each subsequent Completed it first enters CrashLoopBackOff.

The CrashLoopBackOff policy restarts immediately the first time, waits 10 s before the second restart, and from the third restart on the wait grows exponentially (20 s, 40 s, ... up to 160 s), eventually settling at one restart attempt every 300 s.

[root@VM_2_6_centos ~]# kubectl get pod -o wide -l app=demo --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 0/1 Pending 0 0s <none> <none> <none>
pod-demo 0/1 Pending 0 0s <none> 10.1.2.15 <none>
pod-demo 0/1 ContainerCreating 0 0s <none> 10.1.2.15 <none>
pod-demo 0/1 ContainerCreating 0 1s <none> 10.1.2.15 <none>
pod-demo 1/1 Running 0 5s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 0 10s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 1 11s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 1 17s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 CrashLoopBackOff 1 30s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 2 32s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 2 37s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 CrashLoopBackOff 2 50s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 3 66s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 3 71s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 CrashLoopBackOff 3 82s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 4 2m6s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 4 2m11s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 CrashLoopBackOff 4 2m26s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 5 3m35s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 5 3m40s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 CrashLoopBackOff 5 3m54s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 6 6m30s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 Completed 6 6m35s 192.168.2.118 10.1.2.15 <none>
pod-demo 0/1 CrashLoopBackOff 6 6m47s 192.168.2.118 10.1.2.15 <none>
pod-demo 1/1 Running 7 11m 192.168.2.118 10.1.2.15 <none>

If the restart policy is set to OnFailure, the container is not restarted after it exits successfully.

[root@VM_2_15_centos ~/macduan/demo/pod]# cat pod-lifecycle.pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    app: demo
spec:
  containers:
  - image: nginx
    name: lc-demo
    command: ['sh', '-c', 'sleep 5']
  restartPolicy: OnFailure

[root@VM_2_6_centos ~]# kubectl get pod -o wide -l app=demo --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 0/1 Pending 0 0s <none> <none> <none>
pod-demo 0/1 Pending 0 0s <none> 10.1.2.15 <none>
pod-demo 0/1 ContainerCreating 0 0s <none> 10.1.2.15 <none>
pod-demo 0/1 ContainerCreating 0 1s <none> 10.1.2.15 <none>
pod-demo 1/1 Running 0 2s 192.168.2.119 10.1.2.15 <none>
pod-demo 0/1 Completed 0 7s 192.168.2.119 10.1.2.15 <none>

If we modify the yaml so that the container exits with an error (exit 1), the Pod goes through state changes very similar to the Always case above, except that after each failed exit the Pod enters the Error state.

[root@VM_2_15_centos ~/macduan/demo/pod]# cat pod-lifecycle.pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    app: demo
spec:
  containers:
  - image: nginx
    name: lc-demo
    command: ['sh', '-c', 'sleep 5 && exit 1']
  restartPolicy: OnFailure

[root@VM_2_6_centos ~]# kubectl get pod -o wide -l app=demo --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 0/1 Pending 0 0s <none> <none> <none>
pod-demo 0/1 Pending 0 0s <none> 10.1.2.2 <none>
pod-demo 0/1 ContainerCreating 0 0s <none> 10.1.2.2 <none>
pod-demo 0/1 ContainerCreating 0 0s <none> 10.1.2.2 <none>
pod-demo 1/1 Running 0 2s 192.168.1.82 10.1.2.2 <none>
pod-demo 0/1 Error 0 7s 192.168.1.82 10.1.2.2 <none>
pod-demo 1/1 Running 1 8s 192.168.1.82 10.1.2.2 <none>
pod-demo 0/1 Error 1 13s 192.168.1.82 10.1.2.2 <none>
pod-demo 0/1 CrashLoopBackOff 1 28s 192.168.1.82 10.1.2.2 <none>
pod-demo 1/1 Running 2 30s 192.168.1.82 10.1.2.2 <none>
pod-demo 0/1 Error 2 35s 192.168.1.82 10.1.2.2 <none>

Container Probes

A probe is a diagnostic performed by the kubelet on a container. Three kinds of probes can be configured on a container:

  • livenessProbe, which checks whether the container is still alive; useful when a service has not exited but is no longer working properly.
  • readinessProbe, which checks whether the container is ready to serve traffic; useful when you do not want a container whose initialization (e.g. loading indexes or configuration) has not completed to receive requests.
  • startupProbe, which checks whether the container has started successfully; useful when startup has special preconditions. If it is set, the other two probes are disabled until it succeeds.

If a probe is not configured, it is treated as always succeeding; if a probe fails, the consequence is governed by the restartPolicy described in the previous section. A probe can be implemented as an ExecAction, a TCPSocketAction, or an HTTPGetAction; the detailed settings, including the initial delay and probe interval, are covered in Configure Liveness, Readiness and Startup Probes. Only a livenessProbe example using HTTPGetAction is given below:
PodSpec.containers.livenessProbe deliberately uses an invalid port, so the Pod's state transitions look very much like the Always restart policy described in the previous section.

[root@VM_2_15_centos ~/macduan/demo/pod]# cat pod-lifecycle.pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    app: demo
spec:
  containers:
  - image: nginx
    name: lc-demo
    livenessProbe:
      httpGet:
        path: /
        port: 8090
      initialDelaySeconds: 3
      periodSeconds: 3
  restartPolicy: Always

[root@VM_2_6_centos ~]# kubectl get pod -o wide -l app=demo --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 0/1 Pending 0 0s <none> <none> <none>
pod-demo 0/1 Pending 0 0s <none> 10.1.2.2 <none>
pod-demo 0/1 ContainerCreating 0 0s <none> 10.1.2.2 <none>
pod-demo 0/1 ContainerCreating 0 1s <none> 10.1.2.2 <none>
pod-demo 1/1 Running 0 3s 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 1 13s 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 2 25s 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 3 37s 192.168.1.83 10.1.2.2 <none>
pod-demo 0/1 CrashLoopBackOff 3 49s 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 4 73s 192.168.1.83 10.1.2.2 <none>
pod-demo 0/1 CrashLoopBackOff 4 85s 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 5 2m17s 192.168.1.83 10.1.2.2 <none>
pod-demo 0/1 CrashLoopBackOff 5 2m27s 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 6 3m49s 192.168.1.83 10.1.2.2 <none>
pod-demo 0/1 CrashLoopBackOff 6 4m 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 7 6m50s 192.168.1.83 10.1.2.2 <none>
pod-demo 0/1 CrashLoopBackOff 7 7m 192.168.1.83 10.1.2.2 <none>
pod-demo 1/1 Running 8 12m 192.168.1.83 10.1.2.2 <none>
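
For reference, since only HTTPGetAction is demonstrated above, a readinessProbe using TCPSocketAction would look like the container-spec fragment below (a sketch, not part of the demo; port 80 matches nginx):

  containers:
  - image: nginx
    name: lc-demo
    readinessProbe:
      tcpSocket:
        port: 80          # probe succeeds if a TCP connection can be opened
      initialDelaySeconds: 3
      periodSeconds: 5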

ReplicaSet

A ReplicaSet is mainly responsible for replica control (for example, with the replica count set to 3 it ensures exactly 3 replicas exist) and for rescheduling Pods when something goes wrong; the rescheduling of a deleted Pod mentioned in the previous section is actually the ReplicaSet's doing.

A ReplicaSet manifest likewise has a spec section describing the desired state, most importantly the number of replicas; the Pod it manages is described in the template section.

apiVersion: apps/v1beta2
kind: ReplicaSet
metadata:
  name: rs-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: container-demo
        image: nginx

Here we use kubectl apply rather than the kubectl create used earlier, because kubectl apply is declarative management while kubectl create is imperative management. The declarative approach gives you trackable incremental changes; the two approaches must not be mixed on the same object, or the behavior is undefined.
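
Two handy ways to see what the declarative workflow tracks (a sketch using the rs-demo manifest below; kubectl diff requires a reasonably recent kubectl):

# Preview what kubectl apply would change against the live object
kubectl diff -f rs-demo.yaml
# Show the configuration recorded by the last kubectl apply
kubectl apply view-last-applied -f rs-demo.yaml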

# Create the ReplicaSet
[root@VM_2_15_centos ~/macduan/demo/rs]# kubectl apply -f rs-demo.yaml
replicaset.apps/rs-demo created
[root@VM_2_15_centos ~/macduan/demo/rs]# kubectl get rs rs-demo -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
rs-demo 2 0 0 10s container-demo nginx app=nginx
# Two Pods were created, both named with the ReplicaSet name rs-demo as the prefix, scheduled onto the other two nodes
[root@VM_2_15_centos ~/macduan/demo/rs]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
rs-demo-lk277 1/1 Running 0 23s 192.168.0.50 10.1.2.6 <none>
rs-demo-p5q5p 1/1 Running 0 23s 192.168.1.43 10.1.2.2 <none>

# The Pods' container processes can be seen on the two different nodes
[root@VM_2_2_centos ~]# docker container ls -f name=container
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cbc23e5b9f3c nginx "nginx -g 'daemon of…" About an hour ago Up About an hour k8s_container-demo_rs-demo-lk277_default_b1e17f82-d506-11e9-91dd-8a673d1f8ffc_0

[root@VM_2_6_centos ~]# docker container ls -f name=container
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
17e153df5c9f nginx "nginx -g 'daemon of…" About an hour ago Up About an hour k8s_container-demo_pod-demo_default_6c0b3042-d526-11e9-91dd-8a673d1f8ffc_0

The ReplicaSet (rs-demo) was configured with 2 replicas, which is why two Pods appear above, each named rs-demo plus a random suffix. If one of the Pods is deleted, the ReplicaSet starts a new one: the output below shows that after deleting rs-demo-lk277, a new Pod (rs-demo-cj5p5) is brought up and the replica count returns to 2.

[root@VM_2_6_centos ~]# kubectl delete pod rs-demo-lk277
pod "rs-demo-lk277" deleted

[root@VM_2_15_centos ~]# kubectl get pod -o wide -l app=nginx --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
rs-demo-lk277 1/1 Running 0 5h53m 192.168.1.49 10.1.2.2 <none>
rs-demo-p5q5p 1/1 Running 0 5h53m 192.168.2.69 10.1.2.15 <none>
rs-demo-lk277 1/1 Terminating 0 5h54m 192.168.1.49 10.1.2.2 <none>
rs-demo-cj5p5 0/1 Pending 0 0s <none> <none> <none>
rs-demo-cj5p5 0/1 Pending 0 0s <none> 10.1.2.2 <none>
rs-demo-cj5p5 0/1 ContainerCreating 0 0s <none> 10.1.2.2 <none>
rs-demo-lk277 0/1 Terminating 0 5h54m 192.168.1.49 10.1.2.2 <none>
rs-demo-cj5p5 0/1 ContainerCreating 0 1s <none> 10.1.2.2 <none>
rs-demo-lk277 0/1 Terminating 0 5h54m 192.168.1.49 10.1.2.2 <none>
rs-demo-lk277 0/1 Terminating 0 5h54m 192.168.1.49 10.1.2.2 <none>
rs-demo-cj5p5 1/1 Running 0 5s 192.168.1.50 10.1.2.2 <none>

After changing the replicas field in the rs-demo yaml to 3, running kubectl apply brings up one more Pod, and the status of rs-demo changes accordingly.

[root@VM_2_15_centos ~/macduan/demo/rs]# kubectl apply -f rs-demo.yaml 
replicaset.apps/rs-demo configured

[root@VM_2_6_centos ~]# kubectl get pod -o wide -l app=nginx --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
rs-demo-cj5p5 1/1 Running 0 4m24s 192.168.1.50 10.1.2.2 <none>
rs-demo-p5q5p 1/1 Running 0 5h59m 192.168.2.69 10.1.2.15 <none>
rs-demo-wxvkn 0/1 Pending 0 0s <none> <none> <none>
rs-demo-wxvkn 0/1 Pending 0 0s <none> 10.1.2.6 <none>
rs-demo-wxvkn 0/1 ContainerCreating 0 0s <none> 10.1.2.6 <none>
rs-demo-wxvkn 0/1 ContainerCreating 0 1s <none> 10.1.2.6 <none>
rs-demo-wxvkn 1/1 Running 0 5s 192.168.0.55 10.1.2.6 <none>

[root@VM_2_6_centos ~]# kubectl get rs rs-demo -o wide --watch
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
rs-demo 2 2 2 5h59m container-demo nginx app=nginx
rs-demo 3 2 2 5h59m container-demo nginx app=nginx
rs-demo 3 2 2 5h59m container-demo nginx app=nginx
rs-demo 3 3 2 5h59m container-demo nginx app=nginx
rs-demo 3 3 3 5h59m container-demo nginx app=nginx

# You can delete it directly with kubectl delete rs rs-demo, but deleting objects with kubectl delete -f <filename> is recommended; once the ReplicaSet is deleted, all of its Pods are deleted automatically
[root@VM_2_15_centos ~/macduan/demo/rs]# kubectl delete -f rs-demo.yaml
replicaset.apps "rs-demo" deleted

DaemonSet

Sometimes you need exactly one Pod on every node (for example a monitoring agent); that is what a DaemonSet is for. The following yaml creates a DaemonSet.

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: ds-demo
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: pod-demo
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.17.3
        name: cont-demo

The yaml above does not specify a replica count, yet after creation you will find that the Pod described in spec.template has been created on every node.

[root@VM_2_15_centos ~/macduan/demo/daemonset]# kubectl apply -f demo-ds.yaml 
daemonset.apps/ds-demo created
[root@VM_2_15_centos ~/macduan/demo/daemonset]# kubectl get daemonset ds-demo -o wide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
ds-demo 3 3 3 3 3 <none> 13s cont-demo nginx:1.17.3 app=nginx
[root@VM_2_15_centos ~/macduan/demo/daemonset]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
ds-demo-gsr4c 1/1 Running 0 23s 192.168.2.105 10.1.2.15 <none>
ds-demo-lqptf 1/1 Running 0 23s 192.168.0.93 10.1.2.6 <none>
ds-demo-qmdlt 1/1 Running 0 23s 192.168.1.67 10.1.2.2 <none>
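
If the daemon should run only on a subset of nodes, a common pattern is adding a nodeSelector to the Pod template. A minimal sketch of the template's spec (the label monitor=enabled is hypothetical and would first have to be applied with kubectl label node <node> monitor=enabled):

    spec:
      nodeSelector:
        monitor: enabled      # hypothetical node label; DaemonSet Pods land only on matching nodes
      containers:
      - image: nginx:1.17.3
        name: cont-demo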

Deployment

A Deployment is a higher-level resource that deploys and upgrades applications declaratively; underneath it sit one or more ReplicaSets. Creating a Deployment with the yaml below automatically creates one ReplicaSet and 2 Pods. Each Pod is named after the ReplicaSet plus a random suffix, and the string 65f7dfc8fb in the ReplicaSet's name is actually a hash of the Pod template part of the yaml.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: dp-demo
spec:
  replicas: 2
  template:
    metadata:
      name: pod-demo
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.16.1
        name: cont-demo
[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl apply -f demo-deploy.yaml 
deployment.apps/dp-demo created
[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl get deployment -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
dp-demo 2 2 2 2 14s cont-demo nginx:1.16.1 app=nginx

[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl get rs -l app=nginx -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
dp-demo-65f7dfc8fb 2 2 2 44s cont-demo nginx:1.16.1 app=nginx,pod-template-hash=65f7dfc8fb

[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
dp-demo-65f7dfc8fb-6pn7p 1/1 Running 0 55s 192.168.2.72 10.1.2.15 <none>
dp-demo-65f7dfc8fb-jlnmh 1/1 Running 0 55s 192.168.1.55 10.1.2.2 <none>

[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
dp-demo-65f7dfc8fb-6pn7p 1/1 Running 0 3m37s 192.168.2.72 10.1.2.15 <none>
dp-demo-65f7dfc8fb-jlnmh 1/1 Running 0 3m37s 192.168.1.55 10.1.2.2 <none>
dp-demo-65f7dfc8fb-lkxq2 1/1 Running 0 3s 192.168.0.58 10.1.2.6 <none>

At first glance a Deployment does not look very different from a ReplicaSet, but when we need to upgrade the application (for a Deployment, only a change to the Pod template counts as an upgrade), the value of the Deployment becomes much clearer.

Change the nginx version in demo-deploy.yaml from 1.16.1 to 1.17.1, then update the Deployment. After the update there is a new ReplicaSet whose IMAGES column shows nginx:1.17.1, while the old nginx:1.16.1 ReplicaSet has had its Pod count scaled down to 0.

[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl apply -f demo-deploy.yaml --record
deployment.apps/dp-demo configured
[root@VM_2_6_centos ~]# kubectl get deployment -l app=nginx -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
dp-demo 2 2 2 2 109s cont-demo nginx:1.17.1 app=nginx
[root@VM_2_2_centos ~]# kubectl get rs -l app=nginx -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
dp-demo-65f7dfc8fb 0 0 0 96s cont-demo nginx:1.16.1 app=nginx,pod-template-hash=65f7dfc8fb
dp-demo-85b846dd4f 2 2 2 19s cont-demo nginx:1.17.1 app=nginx,pod-template-hash=85b846dd4f

As mentioned earlier, a Deployment is a higher-level abstraction that manages multiple revisions. Besides rolling updates (the default; the Recreate strategy can be chosen instead), it can roll back or switch between revisions.

# Two revisions have been created
[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl rollout history deployment dp-demo
deployment.extensions/dp-demo
REVISION CHANGE-CAUSE
1 kubectl apply --filename=demo-deploy.yaml --record=true
2 kubectl apply --filename=demo-deploy.yaml --record=true
# Roll back to the previous revision
[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl rollout undo deployment dp-demo
deployment.extensions/dp-demo
[root@VM_2_6_centos ~]# kubectl get deployment -l app=nginx -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
dp-demo 2 2 2 2 3m21s cont-demo nginx:1.16.1 app=nginx
# Switch to a specific revision
[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl rollout undo deployment dp-demo --to-revision=2
deployment.extensions/dp-demo
[root@VM_2_6_centos ~]# kubectl get deployment -l app=nginx -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
dp-demo 2 2 2 2 4m8s cont-demo nginx:1.17.1 app=nginx
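
During an update or rollback, the rollout progress can also be watched, paused, and resumed; a short sketch with the same Deployment:

# Block until the rollout finishes (or fails)
kubectl rollout status deployment dp-demo
# Pause a rollout in progress and resume it later
kubectl rollout pause deployment dp-demo
kubectl rollout resume deployment dp-demo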

Service

Exposing a service to clients inside the cluster

The following yaml creates a Service of type ClusterIP, which is only reachable from inside the cluster.

apiVersion: v1
kind: Service
metadata:
  name: srv-demo
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx

After creation there is a service resource of TYPE ClusterIP with the assigned IP 192.168.255.102 and the SELECTOR app=nginx. The service controller keeps scanning for Pods whose labels match this service's selector (the Pods created in the previous section) and associates them with it. Requests sent to the assigned virtual IP are forwarded to the nginx service in one of the Pods behind it.

[root@VM_2_15_centos ~/macduan/demo/service]# kubectl apply -f srv-demo.yaml 
service/srv-demo created
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get svc srv-demo -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
srv-demo ClusterIP 192.168.255.102 <none> 80/TCP 104s app=nginx
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://192.168.255.102
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
...
# The association mentioned above is actually implemented through an Endpoints object
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get endpoints srv-demo
NAME ENDPOINTS AGE
srv-demo 192.168.0.76:80,192.168.1.61:80,192.168.2.81:80 3h28m

The association between the Service and its Pods essentially means the Service knows the IPs and ports of the Pods backing it, and this is implemented through Endpoints.

[root@VM_2_15_centos ~/macduan/demo/service]# kubectl describe svc srv-demo
Name: srv-demo
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"srv-demo","namespace":"default"},"spec":{"ports":[{"port":80,"tar...
Selector: app=nginx
Type: ClusterIP
IP: 192.168.255.102
Port: <unset> 80/TCP
TargetPort: 80/TCP
Endpoints: 192.168.0.76:80,192.168.1.61:80,192.168.2.81:80
Session Affinity: None
Events: <none>
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get endpoints srv-demo
NAME ENDPOINTS AGE
srv-demo 192.168.0.76:80,192.168.1.61:80,192.168.2.81:80 3h28m
Service discovery

Looking at the process above, one problem stands out: a client inside the cluster has to know 192.168.255.102 to reach this service. How can it discover that address? There are two ways:

  • Environment variables: if the client Pod is created after the Service, the Pod is given environment variables containing the Service's IP and PORT

    [root@VM_2_15_centos ~/macduan/demo/service]# kubectl exec dp-demo-56d95d4c95-xzzc7 env | grep SRV_DEMO
    SRV_DEMO_PORT_80_TCP=tcp://192.168.255.102:80
    SRV_DEMO_PORT_80_TCP_ADDR=192.168.255.102
    SRV_DEMO_SERVICE_HOST=192.168.255.102
    SRV_DEMO_SERVICE_PORT=80
    SRV_DEMO_PORT=tcp://192.168.255.102:80
    SRV_DEMO_PORT_80_TCP_PROTO=tcp
    SRV_DEMO_PORT_80_TCP_PORT=80
  • DNS: the environment-variable approach has its limits, so Kubernetes runs an internal DNS service that lets a client Pod reach the service by its FQDN (provided it knows the service name). The FQDN has the form service_name.namespace.svc.cluster.local; the namespace and the svc.cluster.local suffix can usually be omitted, as the example below shows. However, if the Service uses a non-standard port, the client Pod still has to know that port in advance

    [root@VM_2_15_centos ~/macduan/demo/service]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://srv-demo.default.svc.cluster.local
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    ...
    # curl http://srv-demo.default.svc.cluster.local
    # curl http://srv-demo.default
    # curl http://srv-demo
    # All three of the above forms work

Exposing a service to clients outside the cluster

There are three ways to expose a service to clients outside the cluster:

  • NodePort: opens the same port on every node; a request sent by a client to that port on any node is redirected by the Service to the application port of one of the Pods behind it.
  • LoadBalancer: an extension of NodePort; a node port is allocated automatically and a cloud load balancer is created (Tencent Cloud TKE creates a CLB by default). It is essentially still NodePort, except that the port is chosen automatically and the CLB is provisioned for you.
  • Ingress: exposes multiple services through a single public IP; see here for details, and the sketch right after this list.
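
The original demo does not include an Ingress, but for orientation here is a minimal sketch using the current networking.k8s.io/v1 API (older clusters used extensions/v1beta1); it requires an ingress controller to be installed, and the host name is made up:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-demo
spec:
  rules:
  - host: demo.example.com        # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: srv-demo        # the ClusterIP service created earlier
            port:
              number: 80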

The following manifest creates a NodePort service.

[root@VM_2_15_centos ~/macduan/demo/service]# cat svc-np-demo.yaml
apiVersion: v1
kind: Service
metadata:
  name: svc-np-demo
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 31135
  selector:
    app: nginx

After creation a Cluster-IP is assigned, and accessing the service via that IP or its FQDN works just like a ClusterIP service. The difference is the extra NodePort 31135: although the Pods are only scheduled onto 10.1.2.2 and 10.1.2.15, port 31135 is also open on 10.1.2.6, and requests sent to it are redirected by the Service to one of the Pods on 10.1.2.2 or 10.1.2.15.

[root@VM_2_15_centos ~/macduan/demo/service]# kubectl apply -f svc-np-demo.yaml 
service/svc-np-demo created
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get svc svc-np-demo -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
svc-np-demo NodePort 192.168.255.96 <none> 80:31135/TCP 16s app=nginx
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
dp-demo-56d95d4c95-tss5j 1/1 Running 0 4h18m 192.168.1.61 10.1.2.2 <none>
dp-demo-56d95d4c95-xzzc7 1/1 Running 0 4h18m 192.168.2.81 10.1.2.15 <none>
[root@VM_2_6_centos ~]# curl localhost:31135
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

The following manifest creates a LoadBalancer-type service.

[root@VM_2_15_centos ~/macduan/demo/service]# cat svc-lb-demo.yaml
apiVersion: v1
kind: Service
metadata:
  name: svc-lb-demo
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx

After creation a CLB is created automatically and the external IP 192.144.195.158 is assigned, so clients outside the cluster can now reach the service.

[root@VM_2_15_centos ~/macduan/demo/service]# kubectl apply -f svc-lb-demo.yaml 
service/svc-lb-demo created
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get svc svc-lb-demo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc-lb-demo LoadBalancer 192.168.255.250 <pending> 80:31134/TCP 8s
[root@VM_2_15_centos ~/macduan/demo/service]# kubectl get svc svc-lb-demo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc-lb-demo LoadBalancer 192.168.255.250 192.144.195.158 80:31134/TCP 2m45s
[root@VM_9_103_centos~]
$ curl http://192.144.195.158
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

ConfigMap and Secret

Most applications need configuration, and Kubernetes provides two resource objects for this: ConfigMap and Secret. A ConfigMap can be created from literal strings, from files, or from a directory. When created from a file, the key is the file name and the value is the file content; creating from a directory is simply the batch version of creating from files.

Create a file, then create a ConfigMap from it:

[root@VM_2_15_centos ~/macduan/demo/configmap]# cat index.html
<h>"Hello World!"</h>
[root@VM_2_15_centos ~/macduan/demo/configmap]# kubectl create configmap cfm-demo --from-file=index.html
[root@VM_2_15_centos ~/macduan/demo/configmap]# kubectl get configmap cfm-demo -o yaml
apiVersion: v1
data:
  index.html: |
    <h>"Hello World!"</h>
kind: ConfigMap
metadata:
  ...
  name: cfm-demo
  namespace: default
  ...

We can then modify the Pod template of a Deployment to consume the ConfigMap through volumes and volumeMounts.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: dp-cfm-demo
spec:
  replicas: 2
  template:
    metadata:
      name: pod-demo
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.17.3
        name: cont-demo
        volumeMounts:
        - name: config
          mountPath: /usr/share/nginx/html
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: cfm-demo

In the yaml above, the ConfigMap cfm-demo is mounted over (and therefore hides) the container's /usr/share/nginx/html directory. After creating a Deployment from this yaml, requests sent to its Pods return the content of the file stored in cfm-demo.

[root@VM_2_15_centos ~/macduan/demo/deployment]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
dp-cfm-demo-8c4975c9b-54bvl 1/1 Running 0 14m 192.168.1.63 10.1.2.2 <none>
dp-cfm-demo-8c4975c9b-jwrg7 1/1 Running 0 14m 192.168.2.90 10.1.2.15 <none>
[root@VM_2_6_centos ~]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://192.168.1.63<h>"Hello World!"</h>
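
Besides mounting it as a volume, a ConfigMap can also be injected as environment variables; a minimal container-spec fragment (illustrative, reusing cfm-demo):

      containers:
      - image: nginx:1.17.3
        name: cont-demo
        envFrom:
        - configMapRef:
            name: cfm-demo    # every key in the ConfigMap becomes an environment variable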

A Secret is essentially the same as a ConfigMap, except that it is meant for sensitive data. Creating a secret from the same file shows that its content is stored base64-encoded (note that base64 is an encoding, not encryption).

[root@VM_2_15_centos ~/macduan/demo/configmap]# kubectl create secret generic secret-demo --from-file=index.html 
secret/secret-demo created
[root@VM_2_15_centos ~/macduan/demo/configmap]# kubectl get secret secret-demo -o yaml
apiVersion: v1
data:
  index.html: PGg+IkhlbGxvIFdvcmxkISI8L2g+Cg==
kind: Secret
metadata:
...
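
A Secret is consumed the same way as a ConfigMap, either as a volume or as environment variables; a minimal fragment using env (illustrative, reusing secret-demo; the variable name is made up):

        env:
        - name: INDEX_HTML              # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: secret-demo
              key: index.html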

PV and PVC

Kubernetes has Volumes, but Volumes require the Pod author to know about the actual network storage available in the cluster, which contradicts Kubernetes' philosophy of hiding the real infrastructure from developers. That is why PV and PVC were introduced:

  • PV (PersistentVolume): a piece of storage created by an administrator (or provisioned dynamically); it defines the capacity, access modes, and type of storage medium.
  • PVC (PersistentVolumeClaim): declares the minimum capacity and access modes required, and is what a Pod actually references.
    After a PV exists, creating a PVC automatically binds it to a PV that satisfies the claim.

The PV and PVC yaml files are as follows:

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# cat pv-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 5Mi
  accessModes:
  - ReadWriteOnce
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /root/macduan/demo/pv-pvc/tmp

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# cat pvc-demo.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  resources:
    requests:
      storage: 5Mi
  accessModes:
  - ReadWriteOnce
  storageClassName: ""

After the PV is created its status is Available; once the PVC is created, the PV satisfying the claim is bound to it, so the status of pv-demo changes to Bound.

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl apply -f pv-demo.yaml 
persistentvolume/pv-demo created
[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl get pv pv-demo
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv-demo 5Mi RWO,ROX Retain Available 6s
[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl apply -f pvc-demo.yaml
persistentvolumeclaim/pvc-demo created

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl get pvc pvc-demo -o wide
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-demo Bound pv-demo 5Mi RWO,ROX 10s
[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl get pv pv-demo -o wide
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv-demo 5Mi RWO,ROX Retain Bound default/pvc-demo 50s

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# cat pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
  - image: nginx
    name: container-demo
    volumeMounts:
    - name: index
      mountPath: /usr/share/nginx/html
  volumes:
  - name: index
    persistentVolumeClaim:
      claimName: pvc-demo

The PV above uses hostPath mode. In a multi-node cluster a PV should not use hostPath, because you do not know which node a Pod will be scheduled onto; it is used here only to keep the demo simple, and a small workaround keeps the demo working. First create a Pod from the yaml above; it turns out to be scheduled onto node 10.1.2.2, so on that node create the directory specified as hostPath in the PV yaml, /root/macduan/demo/pv-pvc/tmp, and put an index.html file in it. As in the ConfigMap section, the PV bound by the PVC (pv-demo) is mounted over nginx's /usr/share/nginx/html, so its index.html is replaced by /root/macduan/demo/pv-pvc/tmp/index.html.

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl apply -f pod-demo.yaml 
pod/pod-demo created

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl get pod pod-demo
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 0 6s

[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl get pod pod-demo -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
pod-demo 1/1 Running 0 10s 192.168.1.64 10.1.2.2 <none>

[root@VM_2_2_centos ~]# mkdir -p /root/macduan/demo/pv-pvc/tmp
[root@VM_2_2_centos ~/macduan/demo/pv-pvc/tmp]# echo '<h>Hello World!</h>' > index.html
[root@VM_2_15_centos ~/macduan/demo/pv-pvc]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://192.168.1.64
<h>Hello World!</h>

StatefulSet

The Deployments and Services covered so far only suit stateless applications. For a stateful application, clients need to reach a specific Pod every time, and each Pod needs its own fixed storage, even after the Pod is deleted, recreated, and scheduled onto another node. This part follows the official StatefulSet Basics tutorial.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Mi

The yaml above contains the manifests of both the Service and the StatefulSet. Note that the Service's spec.clusterIP is None: the governing Service of a StatefulSet must be headless. The spec.volumeClaimTemplates section of the StatefulSet is a PVC template; when the StatefulSet's Pods are created, a PVC is automatically created for each Pod.

# Create the StatefulSet and its Service
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl apply -f ss-demo.yaml
service/nginx created
statefulset.apps/web created

[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl get svc nginx -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
nginx ClusterIP None <none> 80/TCP 6m44s app=nginx
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl get statefulset -o wide
NAME DESIRED CURRENT AGE CONTAINERS IMAGES
web 2 2 6m57s nginx nginx
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
web-0 1/1 Running 0 8m2s 192.168.2.93 10.1.2.15 <none>
web-1 1/1 Running 0 7m28s 192.168.1.65 10.1.2.2 <none>
# Each Pod has a stable hostname based on its ordinal index. Use kubectl exec to execute the hostname command in each Pod.
[root@VM_2_15_centos ~/macduan/demo/statefulset]# for i in 0 1; do kubectl exec web-$i -- sh -c 'hostname'; done
web-0
web-1

Notice that the Pod names are no longer random: each Pod gets a fixed ordinal, and each Pod has a stable hostname.

# Discover the FQDNs of the service's two Pods via its SRV records
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl run -it srvlookup --image=tutum/dnsutils --rm --restart=Never -- dig SRV nginx.default.svc.cluster.local
...
;; ADDITIONAL SECTION:
web-1.nginx.default.svc.cluster.local. 5 IN A 192.168.1.65
web-0.nginx.default.svc.cluster.local. 5 IN A 192.168.0.90
...

# Modify each Pod's index.html (which lives on its PV)
[root@VM_2_15_centos ~/macduan/demo/statefulset]# for i in 0 1; do kubectl exec web-$i -- sh -c 'echo $(hostname) > /usr/share/nginx/html/index.html'; done

[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://web-0.nginx
web-0
pod "curlutils" deleted
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://web-1.nginx
web-1

[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl delete pod web-0
pod "web-0" deleted
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
web-0 1/1 Running 0 54s 192.168.0.90 10.1.2.6 <none>
web-1 1/1 Running 0 3h42m 192.168.1.65 10.1.2.2 <none>
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://web-0.nginx.default
web-0

As the operations above show, after writing a different /usr/share/nginx/html/index.html (mounted from the PV) into each Pod, requests to different Pods return different results. Even after web-0 is deleted and recreated on 10.1.2.6, requests to it still return the content written earlier.

# Change the replica count to 3
[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl apply -f ss-demo.yaml
service/nginx unchanged
statefulset.apps/web configured

[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl get pod -l app=nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
web-0 1/1 Running 0 49m 192.168.0.90 10.1.2.6 <none>
web-1 1/1 Running 0 4h30m 192.168.1.65 10.1.2.2 <none>
web-2 1/1 Running 0 52s 192.168.2.104 10.1.2.15 <none>

[root@VM_2_15_centos ~/macduan/demo/statefulset]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://web-2.nginx.default
<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.17.3</center>
</body>
</html>

After the replica count is changed to 3, a new Pod web-2 is created, with ordinals assigned in increasing order starting from 0. Because the PVs here are dynamically provisioned and therefore empty, requests to web-2 get the 403 response shown above.
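
Editing replicas in the yaml and re-applying is the declarative way; the same scaling can also be done imperatively (sketch):

kubectl scale statefulset web --replicas=3
kubectl get pod -l app=nginx -w    # watch web-2 come up in ordinal order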

Service Migration Guide

Retrieval service

Most retrieval systems consist of basic recall services plus one or two aggregation layers. This section only covers how to adapt the basic recall service, and "retrieval service" below refers to that basic recall layer. In short, the retrieval nodes are the nodes holding the index, called qn (query node) from here on; they receive requests, return results based on their index data, and the aggregation layer merges those results. Because the index is usually large it must be sharded, and for reliability it also needs replicas, so the topology of the basic recall service is an N * M matrix: the M columns are the shards and the N rows are the replicas, and each query goes to one replica of every shard. On Kubernetes each qn is packaged in a Pod, and the service can be built either statelessly or statefully:

Stateless approach

One Deployment + Service per column. Every Pod in a given column serves the same index shard, so within a column the service is stateless. This approach is straightforward; essentially you just follow the Deployment and Service parts of the how-to guide chapter, as sketched below.
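
A minimal sketch for one shard column (names such as qn-shard-0 are made up; in a real service the nginx image would be replaced by the qn image carrying that shard's index):

apiVersion: v1
kind: Service
metadata:
  name: qn-shard-0          # one Service per shard column (hypothetical name)
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: qn-shard-0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qn-shard-0
spec:
  replicas: 2               # N replicas of this shard
  selector:
    matchLabels:
      app: qn-shard-0
  template:
    metadata:
      labels:
        app: qn-shard-0
    spec:
      containers:
      - name: qn
        image: nginx        # placeholder for the qn image with shard 0's index
        ports:
        - containerPort: 80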

Stateful approach

One StatefulSet + Service per row. Within a row, each qn holds a different index shard, so the index is stateful and a StatefulSet is needed.

Assume the retrieval service topology is a 2 * 3 matrix, so two StatefulSet + Service pairs are needed. The yaml from the how-to guide chapter is reused: nginx simulates a qn, its index.html stands in for the index files, and the proxy (aggregation) layer is simulated with simple commands.

apiVersion: v1
kind: Service
metadata:
  name: retrieve-01
  labels:
    app: retrieve-01
spec:
  ports:
  - port: 80
    name: qn-01
  clusterIP: None
  selector:
    app: qn-01
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ss-01
spec:
  serviceName: "retrieve-01"
  replicas: 3
  selector:
    matchLabels:
      app: qn-01
  template:
    metadata:
      labels:
        app: qn-01
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Mi

apiVersion: v1
kind: Service
metadata:
  name: retrieve-02
  labels:
    app: retrieve-02
spec:
  ports:
  - port: 80
    name: qn-02
  clusterIP: None
  selector:
    app: qn-02
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ss-02
spec:
  serviceName: "retrieve-02"
  replicas: 3
  selector:
    matchLabels:
      app: qn-02
  template:
    metadata:
      labels:
        app: qn-02
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Mi

Create two stateful services with the yaml above, each representing one row of the retrieval service.

# Row one
[root@VM_2_6_centos ~]# kubectl run -it srvlookup --image=tutum/dnsutils --rm --restart=Never -- dig SRV retrieve-01.default.svc.cluster.local
...
;; ADDITIONAL SECTION:
ss-01-0.retrieve-01.default.svc.cluster.local. 5 IN A 192.168.0.115
ss-01-2.retrieve-01.default.svc.cluster.local. 5 IN A 192.168.1.101
ss-01-1.retrieve-01.default.svc.cluster.local. 5 IN A 192.168.2.140
...

# Row two
[root@VM_2_6_centos ~]# kubectl run -it srvlookup --image=tutum/dnsutils --rm --restart=Never -- dig SRV retrieve-02.default.svc.cluster.local
...
;; ADDITIONAL SECTION:
ss-02-1.retrieve-02.default.svc.cluster.local. 5 IN A 192.168.1.104
ss-02-2.retrieve-02.default.svc.cluster.local. 5 IN A 192.168.2.150
ss-02-0.retrieve-02.default.svc.cluster.local. 5 IN A 192.168.0.120
...
# The topology of the retrieval service created above:
ss-01-0 ss-01-1 ss-01-2 ---> retrieve-01
ss-02-0 ss-02-1 ss-02-2 ---> retrieve-02

# Use index.html to simulate the index files
[root@VM_2_6_centos ~]# for i in 0 1 2;do kubectl exec ss-01-$i -- sh -c "echo index-$i > /usr/share/nginx/html/index.html";done
[root@VM_2_6_centos ~]# for i in 0 1 2;do kubectl exec ss-02-$i -- sh -c "echo index-$i > /usr/share/nginx/html/index.html";done

# curl requests simulate retrieval queries; within a row's service you can target a specific qn (normally all of them would be queried)
[root@VM_2_6_centos ~]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://ss-01-0.retrieve-01
index-0
[root@VM_2_6_centos ~]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://ss-01-2.retrieve-01
index-2
[root@VM_2_6_centos ~]# kubectl run -it curlutils --image=tutum/curl --generator=run-pod/v1 --rm --restart=Never -- curl http://ss-02-0.retrieve-02
index-0