DevOps Culture & SRE Hands-on Sharing Platform


Kubernetes v1.15.3 Certificate Expiry Renewal


Statement: this article is based on 木子's hands-on work
Production environment: Kubernetes v1.15.3
Time spent investigating: 3h
Time spent writing: 1h
Time spent proofreading: 30m
Problem keywords: x509: certificate has expired or is not yet valid


Background

A colleague reported today that Jenkins builds were failing, so 木子 checked the corresponding log in Jenkins and found the following error:

Get https://192.168.1.20:8443/api?timeout=32s: x509: certificate has expired or is not yet valid

It is clear from this that the certificate Jenkins uses to call the K8S API had expired. A check of the certificate validity on the server showed it expired on April 13, 2020, which happened to be that very day.

[root@k8smaster01 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
Not Before: Apr 13 12:32:11 2019 GMT
Not After : Apr 13 12:32:11 2020 GMT
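Instead of eyeballing the Not After date, openssl's -checkend flag gives a yes/no answer about upcoming expiry, which is handy for a cron-based early warning. A minimal sketch, using a throwaway self-signed certificate in /tmp as a stand-in for apiserver.crt (the path and the 30-day threshold are illustrative):

```shell
# Generate a throwaway self-signed cert valid for 365 days
# (stand-in for /etc/kubernetes/pki/apiserver.crt).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null

# -checkend N exits 0 if the cert will still be valid N seconds
# from now; 2592000 seconds = 30 days.
if openssl x509 -in /tmp/demo.crt -noout -checkend 2592000 >/dev/null; then
  echo "certificate valid for at least 30 more days"
else
  echo "certificate expires within 30 days"
fi
```

Pointing this at the real certificate files and alerting on a non-zero exit would have flagged this expiry well before Jenkins broke.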

Fixing the problem

Since the certificates simply reached the end of their normal lifetime, they just need to be renewed. Kubernetes 1.15 ships with solid certificate management: the CA certificates kubeadm generates are valid for 10 years, while the other certificates (etcd, apiserver, and so on) are valid for 1 year. Because 木子's etcd cluster was built in external mode rather than stacked inside K8S, the renewal cannot follow the standard one-size-fits-all procedure. Also, 木子 issued the etcd certificates with a validity of 87600h (10 years), so the etcd certificates do not need replacing; only the K8S certificates do. If your etcd runs as containers managed by K8S, you can renew everything at once with kubeadm alpha certs renew all. But with an external etcd cluster like 木子's, renew all does not work: renewing the etcd-related certificate errors out, which aborts the renewal of the remaining certificates, as shown below:

[root@k8smaster01 kubernetes]# kubeadm alpha certs renew all --config /root/config.yaml
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
apiserver-etcd-client is not a valid certificate for this cluster
# This leaves the remaining certificates unrenewed; at the moment the official tooling offers no way to skip a specific certificate.
# According to the official issue, this bug is fixed in Kubernetes v1.17, but since 木子's cluster is v1.15.x that route is not available here. If you run v1.17 and your certificates happen to be expiring, you can try it (although, since v1.17 has not been out for a year yet, they normally should not be expiring @-@).
# Official issue: https://github.com/kubernetes/kubernetes/issues/86864
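Because renew all aborts partway through, each certificate has to be renewed individually, and that sequence is easy to script. A hedged sketch: the certificate names are the ones kubeadm v1.15 accepts, apiserver-etcd-client is deliberately left out for the external-etcd case, /root/config.yaml is the author's config path, and the echo makes this a dry run (remove it on a real control-plane node to actually renew):

```shell
# Certificates to renew one by one when etcd is external
# (apiserver-etcd-client is intentionally omitted).
CERTS="admin.conf apiserver-kubelet-client apiserver controller-manager.conf front-proxy-client scheduler.conf"

for c in $CERTS; do
  # Dry run: prints the command instead of executing it.
  echo kubeadm alpha certs renew "$c" --config /root/config.yaml
done
```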

Since we cannot renew all certificates in one go, we renew them one by one.

41
# Enter the /etc/kubernetes directory
[root@k8smaster01 ~]# cd /etc/kubernetes/

# Renew the certificate embedded in the kubeconfig file used by the admin and by kubeadm itself.
[root@k8smaster01 kubernetes]# kubeadm alpha certs renew admin.conf
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed

# Renew the certificate the apiserver uses to connect to kubelet.
[root@k8smaster01 kubernetes]# kubeadm alpha certs renew apiserver-kubelet-client --config /root/config.yaml
certificate for the API server to connect to kubelet renewed

# Renew the certificate used for serving the Kubernetes API.
[root@k8smaster01 kubernetes]# kubeadm alpha certs renew apiserver --config /root/config.yaml
certificate for serving the Kubernetes API renewed

# Renew the certificate embedded in the kubeconfig file used by the controller manager.
[root@k8smaster01 kubernetes]# kubeadm alpha certs renew controller-manager.conf --config /root/config.yaml
certificate embedded in the kubeconfig file for the controller manager to use renewed

# Renew the certificate for the front proxy client.
[root@k8smaster01 kubernetes]# kubeadm alpha certs renew front-proxy-client --config /root/config.yaml
certificate for the front proxy client renewed

# Renew the certificate embedded in the kubeconfig file used by the scheduler.
[root@k8smaster01 kubernetes]# kubeadm alpha certs renew scheduler.conf --config /root/config.yaml
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

# Copy the regenerated admin.conf to $HOME/.kube/config
[root@k8smaster01 kubernetes]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite "/root/.kube/config"? y

# Set ownership
[root@k8smaster01 kubernetes]# chown $(id -u):$(id -g) $HOME/.kube/config

# Copy the renewed certificates to the other control-plane nodes
[root@k8smaster01 kubernetes]# scp /etc/kubernetes/pki/* k8smaster02:/etc/kubernetes/pki/
[root@k8smaster01 kubernetes]# scp /etc/kubernetes/pki/* k8smaster03:/etc/kubernetes/pki/

# Copy the kubeconfig to the other control-plane nodes
[root@k8smaster01 kubernetes]# scp $HOME/.kube/config k8smaster02:/root/.kube/config
[root@k8smaster01 kubernetes]# scp $HOME/.kube/config k8smaster03:/root/.kube/config
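After the scp, it is worth confirming that every node ended up with identical certificate files; comparing sha256sum output makes that quick. A minimal local demonstration, with two temp directories standing in for the pki directories of two control-plane nodes (paths and file contents are illustrative):

```shell
# Two temp dirs stand in for /etc/kubernetes/pki on two nodes.
mkdir -p /tmp/pki-node1 /tmp/pki-node2
echo "demo certificate material" > /tmp/pki-node1/apiserver.crt
cp /tmp/pki-node1/apiserver.crt /tmp/pki-node2/apiserver.crt

# Matching hashes mean the copy succeeded; on real nodes, run
# "sha256sum /etc/kubernetes/pki/*" on each node and compare.
sha256sum /tmp/pki-node1/apiserver.crt /tmp/pki-node2/apiserver.crt
```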

Verify the certificate expiry dates

# Check the expiry of all certificates: everything now expires in 2021.
[root@k8smaster01 kubernetes]# kubeadm alpha certs check-expiration --config /root/config.yaml
CERTIFICATE EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
admin.conf Apr 13, 2021 08:49 UTC 364d no
apiserver Apr 13, 2021 08:50 UTC 364d no
apiserver-kubelet-client Apr 13, 2021 08:50 UTC 364d no
controller-manager.conf Apr 13, 2021 08:51 UTC 364d no
front-proxy-client Apr 13, 2021 08:51 UTC 364d no
scheduler.conf Apr 13, 2021 08:51 UTC 364d no

# The validity has moved from 2019–2020 to 2019–2021, so the one-year renewal succeeded.
# apiserver-kubelet-client certificate
[root@k8smaster01 kubernetes]# openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -text |grep ' Not '
Not Before: Apr 13 12:32:11 2019 GMT
Not After : Apr 13 08:50:27 2021 GMT
# front-proxy-client certificate
[root@k8smaster01 kubernetes]# openssl x509 -in /etc/kubernetes/pki/front-proxy-client.crt -noout -text |grep ' Not '
Not Before: Apr 13 12:32:11 2019 GMT
Not After : Apr 13 08:51:09 2021 GMT
# As we can see here, the CA certificates are valid for 10 years
[root@k8smaster01 kubernetes]# openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -text |grep ' Not '
Not Before: Apr 13 12:32:11 2019 GMT
Not After : Apr 10 12:32:11 2029 GMT
[root@k8smaster01 kubernetes]# openssl x509 -in /etc/kubernetes/pki/front-proxy-ca.crt -noout -text |grep ' Not '
Not Before: Apr 13 12:32:11 2019 GMT
Not After : Apr 10 12:32:11 2029 GMT
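The enddate output above can also be turned into a days-remaining number, which is convenient for alert thresholds. A sketch assuming GNU date, using a throwaway 90-day certificate as a stand-in (substitute the real certificate path on a control-plane node):

```shell
# Throwaway cert valid for 90 days as a stand-in for a real one.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/exp.key \
  -out /tmp/exp.crt -days 90 -subj "/CN=expiry-demo" 2>/dev/null

# Parse the enddate and compute whole days left (GNU date syntax).
end=$(openssl x509 -in /tmp/exp.crt -noout -enddate | cut -d= -f2)
days_left=$(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
echo "days until expiry: $days_left"
```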

Follow-up

After renewing the certificates, the officially recommended step is to reboot the control-plane nodes, after which everything returns to normal; that is certainly the simplest method. But there is also a hands-on shortcut. At this point the API cannot be reached, so kubectl is unavailable, but we can solve the problem with docker commands instead. Since we renewed the certificates of the apiserver, controller-manager, and scheduler, restarting the corresponding containers is enough:

# Log in to each control-plane node and run the following
docker restart `docker ps | grep apiserver | awk '{print $1}'`
docker restart `docker ps | grep controller-manager | awk '{print $1}'`
docker restart `docker ps | grep scheduler | awk '{print $1}'`
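A slightly more targeted variant uses docker ps name filters, so containers that merely mention a component somewhere in their image name are not caught by the grep. The echo keeps this a dry run (remove it on a real node); the container-name patterns assume kubeadm-style static-pod naming:

```shell
# kubeadm static-pod containers carry the component name in the container name,
# so --filter name= is more precise than grepping the whole docker ps output.
for component in kube-apiserver kube-controller-manager kube-scheduler; do
  # Dry run: print the restart command for each component.
  echo "docker restart \$(docker ps -q -f name=${component})"
done
```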

In addition, because the kubelet client certificate was also renewed, it is advisable to restart the kubelet service on every control-plane node.

[root@k8smaster01 kubernetes]# systemctl restart kubelet
# While the certificate was expired, kubelet status showed the following errors:
[root@k8smaster01 kubernetes]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 一 2020-04-13 16:21:54 CST; 1h 53min ago
Docs: https://kubernetes.io/docs/
Main PID: 5160 (kubelet)
Tasks: 29
Memory: 115.8M
CGroup: /system.slice/kubelet.service
└─5160 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/ku...

4月 13 18:06:16 k8smaster01 kubelet[5160]: E0413 18:06:16.010129 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:07:16 k8smaster01 kubelet[5160]: E0413 18:07:16.011214 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:08:16 k8smaster01 kubelet[5160]: E0413 18:08:16.030449 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:09:16 k8smaster01 kubelet[5160]: E0413 18:09:16.010087 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:10:16 k8smaster01 kubelet[5160]: E0413 18:10:16.461004 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:11:16 k8smaster01 kubelet[5160]: E0413 18:11:16.016195 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:12:16 k8smaster01 kubelet[5160]: E0413 18:12:16.010188 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:13:16 k8smaster01 kubelet[5160]: E0413 18:13:16.011561 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:14:16 k8smaster01 kubelet[5160]: E0413 18:14:16.010086 5160 server.go:249] Unable to authenticate the reques... valid
4月 13 18:15:16 k8smaster01 kubelet[5160]: E0413 18:15:16.011657 5160 server.go:249] Unable to authenticate the reques... valid
Hint: Some lines were ellipsized, use -l to show in full.

# After restarting kubelet, the authentication errors are gone
[root@k8smaster01 kubernetes]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 一 2020-04-13 18:16:18 CST; 2s ago
Docs: https://kubernetes.io/docs/
Main PID: 15995 (kubelet)
Tasks: 18
Memory: 26.0M
CGroup: /system.slice/kubelet.service
└─15995 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/k...

4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.471931 15995 remote_image.go:50] parsed scheme: ""
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.471948 15995 remote_image.go:50] scheme "" not registered, f...scheme
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472319 15995 asm_amd64.s:1337] ccResolverWrapper: sending ne...nil>}]
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472383 15995 clientconn.go:796] ClientConn switching balance...first"
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472511 15995 balancer_conn_wrappers.go:131] pickfirstBalance...ECTING
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472746 15995 asm_amd64.s:1337] ccResolverWrapper: sending ne...nil>}]
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472779 15995 clientconn.go:796] ClientConn switching balance...first"
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472832 15995 balancer_conn_wrappers.go:131] pickfirstBalance...ECTING
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472974 15995 balancer_conn_wrappers.go:131] pickfirstBalance... READY
4月 13 18:16:18 k8smaster01 kubelet[15995]: I0413 18:16:18.472988 15995 balancer_conn_wrappers.go:131] pickfirstBalance... READY
Hint: Some lines were ellipsized, use -l to show in full.

Verify the API is healthy

# Make sure kubectl can fetch data normally and that all services in the kube-system namespace are healthy.
[root@k8smaster01 kubernetes]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-68b57596f6-gwssn 1/1 Running 22 137m
coredns-68b57596f6-pt7hv 1/1 Running 0 80m
kube-apiserver-k8smaster01 1/1 Running 6 107m
kube-apiserver-k8smaster02 1/1 Running 6 107m
kube-apiserver-k8smaster03 1/1 Running 6 107m
kube-controller-manager-k8smaster01 1/1 Running 6 107m
kube-controller-manager-k8smaster02 1/1 Running 9 82m
kube-controller-manager-k8smaster03 1/1 Running 10 106m
kube-proxy-8zqcm 1/1 Running 0 48d
kube-proxy-b284l 1/1 Running 4 235d
kube-proxy-mqj76 1/1 Running 4 235d
kube-proxy-pxzhm 1/1 Running 4 235d
kube-proxy-t8nlb 1/1 Running 4 235d
kube-proxy-v4qdd 1/1 Running 6 235d
kube-router-24fqp 1/1 Running 0 48d
kube-router-cbmmn 1/1 Running 0 48d
kube-router-cw5m9 1/1 Running 5 212d
kube-router-mqwg7 1/1 Running 1 48d
kube-router-ql9b7 1/1 Running 17 364d
kube-router-tmfgr 1/1 Running 1 48d
kube-scheduler-k8smaster01 1/1 Running 8 106m
kube-scheduler-k8smaster02 1/1 Running 8 106m
kube-scheduler-k8smaster03 1/1 Running 5 106m
metrics-server-6ffc898ffb-8kzqk 1/1 Running 5 212d
metrics-server-6ffc898ffb-h79td 1/1 Running 0 48d
tiller-deploy-d4d7b9495-pccdb 1/1 Running 0 80m
traefik-ingress-lb-9vld9 1/1 Running 10 350d
traefik-ingress-lb-r9d85 1/1 Running 7 350d
traefik-ingress-lb-ztv7r 1/1 Running 7 350d

References

https://kubernetes.io/zh/docs/reference/setup-tools/kubeadm/kubeadm-alpha/
Note that Kubernetes v1.18 already supports automatic certificate rotation; for details see:
https://kubernetes.io/zh/docs/tasks/tls/certificate-rotation/

Committed to sharing original technical content. Your support and encouragement keep the writing going!