Kubernetes cluster deployment (continuously updated)

Components and versions

k8s: 1.20.5
~~glusterfs : 6.0~~
calico: 3.8
docker-ce: 19.03
containerd: 1.4.9

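System preparation

CentOS 7, upgraded to the latest kernel (5.4+)

Three nodes: k8s-node1 through k8s-node3

Hostname resolution for all nodes configured in /etc/hosts

Node	Components
k8s-node1	GlusterFS / k8s-master
k8s-node2	GlusterFS / k8s-worker
k8s-node3	heketi / k8s-worker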

Docker deployment

tbd

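# Changing the cgroup driver in daemon.json alone did not take effect; the systemd unit has to be changed as well
vim /etc/systemd/system/multi-user.target.wants/docker.service
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd
# Reload systemd and restart Docker so the new ExecStart is picked up
systemctl daemon-reload
systemctl restart docker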

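Deploying the GlusterFS cluster

The test environment uses a two-node GlusterFS cluster; heketi exposes a RESTful API to Kubernetes.

For production, use at least three GlusterFS nodes to avoid split-brain.

GlusterFS with heketi consumes raw block devices directly (for example /dev/sdb); no partitioning or formatting is needed.

Compared with Ceph it is much simpler to deploy. Ceph layers block storage on top of an object store, whereas GlusterFS starts from block devices and builds its volumes on top of them.

Installing GlusterFS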

yum -y install centos-release-gluster
yum -y install glusterfs glusterfs-fuse glusterfs-server
systemctl enable glusterd --now
systemctl status glusterd

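Installing heketi

yum install -y centos-release-gluster
yum install -y heketi heketi-client

Configuration

/etc/heketi/heketi.json

The lines starting with "#" below are annotations and must be removed from the real file. Choose your own passwords, and back up the original template first.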

{
# default port is 8080
  "_port_comment": "Heketi Server Port Number",
  "port": "8080",
# defaults to false (no authentication)
  "_use_auth": "Enable JWT authorization. Please enable for deployment",
  "use_auth": true,
# set the passwords
  "_jwt": "Private keys for access",
  "jwt": {
    "_admin": "Admin has access to all APIs",
    "admin": {
      "key": "admin@123"
    },
    "_user": "User only has access to /volumes endpoint",
    "user": {
      "key": "user@123"
    }
  },
 
  "_glusterfs_comment": "GlusterFS Configuration",
  "glusterfs": {
    "_executor_comment": [
      "Execute plugin. Possible choices: mock, ssh",
      "mock: This setting is used for testing and development.",
      "      It will not send commands to any node.",
      "ssh:  This setting will notify Heketi to ssh to the nodes.",
      "      It will need the values in sshexec to be configured.",
      "kubernetes: Communicate with GlusterFS containers over",
      "            Kubernetes exec api."
    ],
    # use the ssh executor
    "executor": "ssh",
 
    "_sshexec_comment": "SSH username and private key file information",
    "sshexec": {
      "keyfile": "/etc/heketi/heketi_key",
      "user": "root",
      "port": "22",
      "fstab": "/etc/fstab"
    },
 
    "_kubeexec_comment": "Kubernetes configuration",
    "kubeexec": {
      "host" :"https://kubernetes.host:8443",
      "cert" : "/path/to/crt.file",
      "insecure": false,
      "user": "kubernetes username",
      "password": "password for kubernetes user",
      "namespace": "OpenShift project or Kubernetes namespace",
      "fstab": "Optional: Specify fstab file on node.  Default is /etc/fstab"
    },
 
    "_db_comment": "Database file name",
    "db": "/var/lib/heketi/heketi.db",
 
    "_loglevel_comment": [
      "Set log level. Choices are:",
      " none, critical, error, warning, info, debug",
      "Default is warning"
    ],
# the template ships with loglevel debug
# log output goes to /var/log/messages
    "loglevel" : "warning"
  }
}
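
The annotated listing above is not valid JSON until the "#" lines are gone. A minimal sketch of stripping them, assuming the annotations always sit on their own lines; strip_hash_comments is a hypothetical helper name, and the sample input is inline text rather than the real /etc/heketi/heketi.json:

```shell
# Hypothetical helper: drop full-line '#' annotations so the rest can be parsed as JSON.
strip_hash_comments() {
  # Delete every line whose first non-blank character is '#'.
  # With no file argument, sed reads standard input.
  sed '/^[[:space:]]*#/d' "$@"
}

# Example on inline sample text:
printf '%s\n' '{' '#default port' '  "port": "8080"' '}' | strip_hash_comments
```

Redirect the output to a new file and review it before replacing the live configuration.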

Generating the SSH key

ssh-keygen -t rsa -q -f /etc/heketi/heketi_key -N ""
chown heketi:heketi /etc/heketi/heketi_key
ssh-copy-id -i /etc/heketi/heketi_key.pub root@k8s-node1
ssh-copy-id -i /etc/heketi/heketi_key.pub root@k8s-node2

Starting heketi

# Before starting, make sure /usr/lib/systemd/system/heketi.service contains --config=/etc/heketi/heketi.json
systemctl enable heketi --now
systemctl status heketi

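Topology configuration

/etc/heketi/topology.json

# topology.json defines the GlusterFS cluster that heketi manages.
# The hierarchy is clusters --> nodes --> node/devices --> hostnames/zone.
# node/hostnames "manage": the management address. Use the host IP; a hostname only works if the heketi server can reach the GlusterFS node by that name.
# node/hostnames "storage": the address used for storage traffic, given as the host IP; it may differ from "manage".
# node/zone: the failure domain of the node. heketi places replicas across failure domains to improve availability, for example one zone value per rack.
# devices: the disks on each GlusterFS node (several disks are comma-separated entries in the array); they must be raw devices without a file system.
# Here each of the two GlusterFS hosts gets one extra disk, /dev/sdb.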
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "192.168.1.61"
              ],
              "storage": [
                "192.168.1.61"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "192.168.1.62"
              ],
              "storage": [
                "192.168.1.62"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        }
      ]
    }
  ]
}
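
Writing the topology by hand gets repetitive once a third node is added, as recommended for production. The following is only a sketch of generating the same structure from a list of addresses. make_topology is a hypothetical helper, every node is put in zone 1 as in the file above, and 192.168.1.63 is an assumed address for a third node that this setup does not have:

```shell
# Hypothetical helper: print a heketi topology with one cluster.
# Usage: make_topology DEVICE IP [IP...]
make_topology() {
  device=$1
  shift
  printf '{\n  "clusters": [\n    {\n      "nodes": [\n'
  sep=''
  for ip in "$@"; do
    # Emit the separator between node objects, not after the last one.
    printf '%b' "$sep"
    printf '        {\n'
    printf '          "node": {\n'
    printf '            "hostnames": { "manage": ["%s"], "storage": ["%s"] },\n' "$ip" "$ip"
    printf '            "zone": 1\n'
    printf '          },\n'
    printf '          "devices": ["%s"]\n' "$device"
    printf '        }'
    sep=',\n'
  done
  printf '\n      ]\n    }\n  ]\n}\n'
}

# Example: the two nodes from this document plus an assumed third one.
make_topology /dev/sdb 192.168.1.61 192.168.1.62 192.168.1.63
```

Use separate zone values if the nodes really sit in different failure domains.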

Loading the topology and common commands

heketi-cli --server http://localhost:8080 --user admin --secret admin@123 topology load --json=/etc/heketi/topology.json
# show the topology
heketi-cli --user admin --secret admin@123 topology info --server http://localhost:8080
# the commands below omit --server; heketi-cli then reads the address from HEKETI_CLI_SERVER
export HEKETI_CLI_SERVER=http://localhost:8080
# list clusters
heketi-cli --user admin --secret admin@123 cluster list
# show one cluster (requires the cluster ID)
heketi-cli --user admin --secret admin@123 cluster info ca501b26cc0cfdf391dfe1d7fc7ad242
# list nodes
heketi-cli --user admin --secret admin@123 node list
# show one node (requires the node ID)
heketi-cli --user admin --secret admin@123 node info 4f58f2c0691b92014c6d3b83c390c0a2
# show one device (requires the device ID)
heketi-cli --user admin --secret admin@123 device info deeb6efc0e8821ba4a443e683dfdf041

Deploying MinIO

TBD

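Deploying Kubernetes

Disable SELinux

Install dependencies and tools

yum install -y conntrack ntpdate ntp ipvsadm ipset jq iptables curl sysstat libseccomp net-tools

Firewall and ports

Open the ports listed in the official documentation, or disable the firewall outright

Prerequisite settings

vim /etc/hosts
# add the IP and hostname of every cluster node
swapoff -a
vim /etc/fstab
# comment out the swap entry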
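
The fstab edit can also be scripted instead of done in vim. A minimal sketch, assuming swap entries carry "swap" as a whitespace-delimited field; comment_out_swap is a hypothetical helper that only prints the result, so nothing is changed until you decide to write it back:

```shell
# Hypothetical helper: print an fstab with every active swap entry commented out.
comment_out_swap() {
  # Prefix '#' to lines that are not already comments and contain a "swap" field.
  sed -E '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/#/' "$@"
}

# Dry run against sample text; on a real host pass /etc/fstab and review the output.
printf '%s\n' \
  '/dev/mapper/centos-root /    xfs  defaults 0 0' \
  '/dev/mapper/centos-swap swap swap defaults 0 0' | comment_out_swap
```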

Kernel modules and parameters

Kernel parameters

/etc/sysctl.d/k8s.conf

# let iptables on the node see bridged traffic
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
# avoid swap; the kernel swaps only to avoid running out of memory
vm.swappiness=0
# always allow overcommit, without checking whether physical memory is sufficient
vm.overcommit_memory=1
# do not panic on OOM; let the OOM killer handle it
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720

# apply; the net.bridge.* keys only exist once br_netfilter is loaded
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
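
sysctl.conf(5) only documents comments on lines of their own, so a "#" after a value is passed along as part of the value, and whether the kernel accepts that depends on the key. A minimal sketch for spotting such lines before applying a file; find_inline_comments is a hypothetical helper, and the sample input is inline text:

```shell
# Hypothetical helper: list "key=value # comment" lines in sysctl-style files.
find_inline_comments() {
  # Match lines that do not start with '#' or ';' but carry a '#' after the '='.
  # grep exits with status 1 when nothing is found.
  grep -nE '^[[:space:]]*[^#;[:space:]][^#]*=[^#]*#' "$@"
}

# Example on sample text; only the second line is reported.
printf '%s\n' 'net.ipv4.ip_forward=1' 'vm.swappiness=0   # no swap' |
  find_inline_comments
```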

Modules required for IPVS

/etc/sysconfig/modules/ipvs.modules

#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
# nf_conntrack_ipv4 was merged into nf_conntrack in kernel 4.19; fall back to it on newer kernels
modprobe -- nf_conntrack_ipv4 || modprobe -- nf_conntrack

# load and verify
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack
# "modprobe: FATAL: Module nf_conntrack_ipv4 not found." is expected on kernel 4.19 and later
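
The "not found" message comes from the module rename in kernel 4.19. One way to avoid it is to pick the module name from the kernel release. A minimal sketch; conntrack_module is a hypothetical helper that only prints the name and does not load anything:

```shell
# Hypothetical helper: print the conntrack module name for a kernel release string.
# Defaults to the running kernel when no argument is given.
conntrack_module() {
  release=${1:-$(uname -r)}
  major=${release%%.*}          # text before the first dot
  rest=${release#*.}            # text after the first dot
  minor=${rest%%[!0-9]*}        # leading digits of the remainder
  if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 19 ]; }; then
    echo nf_conntrack
  else
    echo nf_conntrack_ipv4
  fi
}

echo "conntrack module for this kernel: $(conntrack_module)"
```

In ipvs.modules this could replace the hard-coded name, for example modprobe -- "$(conntrack_module)".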

Time zone

# set the system time zone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai

# keep the hardware clock in UTC
timedatectl set-local-rtc 0

# restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond

Installing the container runtime

Docker 18+ ships with containerd; if unsure, it can also be installed manually

yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce-19.03.15-3.el7
systemctl enable docker --now
systemctl enable containerd --now

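Configuring containerd

# generate the default configuration
containerd config default > /etc/containerd/config.toml
# switch the cgroup driver to systemd and point the image registries at mirrors
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g"              /etc/containerd/config.toml
sed -i "s#SystemdCgroup = false#SystemdCgroup = true#g" /etc/containerd/config.toml
sed -i "s#https://registry-1.docker.io#https://registry.cn-hangzhou.aliyuncs.com#g"      /etc/containerd/config.toml
# the sandbox (pause) image must be changed too, otherwise the pull fails; list the required image versions with: kubeadm config images list --config kubeadm.yaml
# the default image name differs between containerd releases, so rewrite the whole sandbox_image line
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"#' /etc/containerd/config.toml
# confirm both settings; if SystemdCgroup does not appear, add it under the runc options section of the CRI plugin
grep -E 'sandbox_image|SystemdCgroup' /etc/containerd/config.toml
# restart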
systemctl daemon-reload
systemctl restart containerd
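
Because the sed edits silently do nothing when the pattern is absent, it can help to rehearse them on a copy first. A minimal sketch under that assumption; patch_containerd_config is a hypothetical helper, and the two-line sample stands in for a real config.toml:

```shell
# Hypothetical helper: set the systemd cgroup driver and the sandbox image in a config file.
# Usage: patch_containerd_config FILE PAUSE_IMAGE
patch_containerd_config() {
  cfg=$1
  pause_image=$2
  tmp=$(mktemp) || return 1
  # Rewrite the whole sandbox_image line so the previous default does not matter.
  sed -e 's#SystemdCgroup = false#SystemdCgroup = true#' \
      -e "s#sandbox_image = \".*\"#sandbox_image = \"${pause_image}\"#" \
      "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Rehearsal on a throwaway sample file, not on /etc/containerd/config.toml.
sample=$(mktemp)
printf '%s\n' '    sandbox_image = "k8s.gcr.io/pause:3.2"' \
              '            SystemdCgroup = false' > "$sample"
patch_containerd_config "$sample" registry.aliyuncs.com/google_containers/pause:3.2
cat "$sample"
rm -f "$sample"
```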

Installing the Kubernetes components

cat << EOF > /etc/yum.repos.d/kubernetes.repo 
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum makecache fast
yum install -y kubeadm-1.20.5 kubectl-1.20.5 kubelet-1.20.5
systemctl enable kubelet --now
# kubelet cannot start successfully yet at this point; that is expected

Using containerd as the Kubernetes runtime

Install crictl
crictl is a command-line client for CRI-compatible runtimes such as containerd

wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.19.0/crictl-v1.19.0-linux-amd64.tar.gz
tar -xzvf crictl-v1.19.0-linux-amd64.tar.gz 
mv crictl /usr/local/bin/

Edit the crictl configuration file

vim /etc/crictl.yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: true

Check the containerd configuration file

# if the following line is present, comment it out
# disabled_plugins = ["cri"]

Adjust the kubelet startup arguments

vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock"

Basic crictl commands

With containerd as the runtime the docker command is largely unused; list images and running containers with the commands below

crictl images
crictl ps

Initializing Kubernetes with kubeadm

Generate the default configuration and adjust it as needed

kubeadm config print init-defaults > kubeadm.yaml

# commonly changed fields
advertiseAddress: 192.168.1.111 # API server IP; with two NICs usually the internal one, reachable by all Kubernetes components
criSocket: /run/containerd/containerd.sock # use containerd
imageRepository: registry.aliyuncs.com/google_containers 
kubernetesVersion: v1.20.5 
# add below dnsDomain: cluster.local, at the same indentation level
  podSubnet: 172.16.0.0/16 # pod network; kubeadm sets no default itself (10.244.0.0/16 is only the value conventionally used with flannel); needed again when deploying Calico
# append at the end of the file
---
# use ipvs
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs 
---
# use systemd as the cgroup driver
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd

Initialize the cluster

kubeadm init --config=kubeadm.yaml

Post-initialization steps

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Get the join command

kubeadm token create --print-join-command

Run the command printed above on each worker node

Checking cluster status

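# check the kubelet status and whether the new startup arguments took effect
systemctl status kubelet
ps -ef | grep kubelet 
# check the nodes; the CONTAINER-RUNTIME column shows the runtime in use
kubectl get nodes -o wide
# check component status
kubectl get cs
# scheduler and controller-manager usually show as unhealthy; this does not affect operation and can be fixed as follows
vim /etc/kubernetes/manifests/kube-scheduler.yaml
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
# comment out the --port=0 command-line flag in both manifests and wait a moment for the status to recover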
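
The same manifest edit can be scripted. A minimal sketch, assuming the flag appears as its own list item "- --port=0" in the container command; comment_out_port_zero is a hypothetical helper, and the temporary file is created outside the manifests directory so that kubelet never sees a half-written copy:

```shell
# Hypothetical helper: comment out the "- --port=0" list item in a static pod manifest.
comment_out_port_zero() {
  manifest=$1
  tmp=$(mktemp) || return 1
  # Keep the original indentation and turn the item into a YAML comment.
  sed 's/^\([[:space:]]*\)- --port=0[[:space:]]*$/\1# - --port=0/' "$manifest" > "$tmp" &&
    cat "$tmp" > "$manifest"
  rm -f "$tmp"
}

# Rehearsal on a sample fragment rather than the live manifest.
sample=$(mktemp)
printf '%s\n' '    - kube-scheduler' '    - --leader-elect=true' '    - --port=0' > "$sample"
comment_out_port_zero "$sample"
cat "$sample"
rm -f "$sample"
```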

Deploying Calico

wget https://docs.projectcalico.org/v3.8/manifests/calico.yaml
# fields to change
- name: calico-node
          image: calico/node:v3.8.9
          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
              # add the variable below with a value matching your system, usually the internal NIC
            - name: IP_AUTODETECTION_METHOD              
              value: "interface=eth0"
              # find CALICO_IPV4POOL_CIDR and set it to the podSubnet defined above
            - name: CALICO_IPV4POOL_CIDR
              value: "172.16.0.0/16"
# deploy
kubectl apply -f calico.yaml

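Wait for the nodes to become Ready and the Calico pods to start

kubectl get nodes
kubectl get pods -n kube-system -w
# coredns only reaches Running after Calico is deployed and up
# if Calico stays down on one node for a long time, try deleting and re-applying it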
# kubectl delete -f calico.yaml
# kubectl apply -f calico.yaml

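Testing DNS

# busybox is pinned to 1.28 because nslookup in newer busybox images is known to give misleading results
kubectl run -it --rm dns-test --image=busybox:1.28 -- sh
# inside the container, check that the name kubernetes resolves
nslookup kubernetes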

kubectl command completion

yum install bash-completion -y
echo 'source /usr/share/bash-completion/bash_completion' >> ~/.bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc

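Dynamic GlusterFS provisioning with a StorageClass

Creating the StorageClass

Create the StorageClass from a YAML file

# provisioner: the storage provisioner; it depends on the storage backend
# reclaimPolicy: defaults to "Delete", which removes the PV together with the backend volume and bricks (LVM) when the PVC is deleted; with "Retain" the data is kept and has to be cleaned up manually
# resturl: URL of the heketi API service
# restauthenabled: optional, defaults to "false"; must be "true" when heketi authentication is enabled
# restuser: optional; the user name when authentication is enabled
# secretNamespace: optional; when authentication is enabled, it can be set to the namespace that uses the persistent storage
# secretName: optional; when authentication is enabled, the heketi password has to be stored in this Secret
# clusterid: optional; a cluster ID or a list of IDs in the form "id1,id2"
# volumetype: optional; volume type and parameters. If unset, the provisioner decides. "volumetype: replicate:3" is a replicated volume with 3 replicas, "volumetype: disperse:4:2" is a dispersed volume with 4 data and 2 redundancy bricks, "volumetype: none" is a distributed volume
# cat gluster-heketi-storageclass.yaml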
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-heketi-storageclass
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete
parameters:
  resturl: "http://192.168.1.61:8080" # heketi IP and port
  restauthenabled: "true"
  restuser: "admin"
  secretNamespace: "default"
  secretName: "heketi-secret"
  #restuserkey: "xiaotech"
  volumetype: "replicate:2" # replicated volume; 2 replicas because this setup has two nodes, use at least 3 nodes in production
 
# create the Secret; the "key" value must be base64-encoded
 echo -n "admin@123"|base64
 
# name and namespace must match the StorageClass definition
# the Secret must have the type "kubernetes.io/glusterfs"
# cat heketi-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: heketi-secret
  namespace: default
data:
  # base64 encoded password. E.g.: echo -n "mypassword" | base64
  key: YWRtaW5AMTIz
type: kubernetes.io/glusterfs
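
A common mistake with the key field is encoding a trailing newline along with the password. A minimal sketch of encoding and checking the value, using the admin password from heketi.json in this document; encode_secret and decode_secret are hypothetical helper names:

```shell
# Hypothetical helpers: base64-encode a password without a trailing newline, and decode it again.
encode_secret() {
  # printf '%s' writes the argument as is; plain echo would append a newline.
  printf '%s' "$1" | base64
}

decode_secret() {
  printf '%s' "$1" | base64 -d
}

encode_secret 'admin@123'      # value for the "key" field of the Secret
decode_secret 'YWRtaW5AMTIz'   # round-trip check of the value used above
echo
```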

Create

kubectl apply -f heketi-secret.yaml
# note: a StorageClass cannot be changed after creation; to modify it, delete and recreate it
kubectl apply -f gluster-heketi-storageclass.yaml
# verify
kubectl describe storageclass gluster-heketi-storageclass

Usage

# define the PVC
# cat gluster-heketi-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gluster-heketi-pvc
  namespace: default
  #annotations:
  #  volume.beta.kubernetes.io/storage-class: "glusterfs"
spec:
  # must match the StorageClass name
  storageClassName: gluster-heketi-storageclass
  # ReadWriteOnce (RWO): read-write, mountable by a single node
  # ReadOnlyMany (ROX): read-only, mountable by many nodes
  # ReadWriteMany (RWX): read-write, mountable by many nodes
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
 
# create the PVC
# kubectl create -f gluster-heketi-pvc.yaml
persistentvolumeclaim/gluster-heketi-pvc created

# inspect
kubectl describe pvc gluster-heketi-pvc

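Once the PVC exists, Kubernetes creates the matching PV automatically; look it up with the PV name shown above (it can also be obtained with kubectl get)

kubectl get pv
kubectl describe pv pvc-345664d6-052f-4b4b-bf6f-24df1e582fb4

On the GlusterFS side, an LVM logical volume is created and mounted on each of the two nodes; heketi shows this

heketi-cli --user admin --secret admin@123 topology info --server http://localhost:8080

You can also log in to both nodes and check lvdisplay and /etc/fstab directly

Example pod using the PVC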

# cat gluster-heketi-pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: gluster-heketi-pod
spec:
  containers:
  - name: gluster-heketi-container
    image: busybox
    command:
    - sleep
    - "3600"
    volumeMounts:
    - name: gluster-heketi-volume
      mountPath: "/pv-data"
      readOnly: false
  volumes:
  - name: gluster-heketi-volume
    persistentVolumeClaim:
      claimName: gluster-heketi-pvc
 
# create the pod
# kubectl apply -f gluster-heketi-pod.yaml

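Notes

Stopping the heketi service does not affect PVCs already in use, but new PVCs cannot be provisioned while it is down

If a PVC is deleted with kubectl while heketi is stopped, the Gluster volume, the LVM logical volume and the mount entries are not cleaned up automatically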

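Calico extras

Fixed IP

Check that the Calico CNI configuration on every Kubernetes node supports it

cat /etc/cni/net.d/10-calico.conflist

"ipam": {
     "type": "calico-ipam"
 },

If Calico uses the calico-ipam plugin, fixed IPs are available
If there is no ipam entry, or the file does not exist, they are not

Example

Add "cni.projectcalico.org/ipAddrs": "[\"192.168.0.1\"]" to pod.metadata.annotations; the address has to lie inside a Calico IP pool
For example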

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        "cni.projectcalico.org/ipAddrs": "[\"10.244.36.98\"]"
......

Note: although the value is a list, only one IP per pod is currently supported, so replicas cannot be greater than 1
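
A fixed address is only assigned if it falls inside a Calico IP pool, which in this setup is presumably the 172.16.0.0/16 pod subnet configured earlier. A minimal sketch for checking an address against a CIDR before putting it in the annotation; ip_to_int and ip_in_cidr are hypothetical helpers for IPv4 only:

```shell
# Hypothetical helper: convert a dotted IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if IP lies inside CIDR. Usage: ip_in_cidr IP NETWORK/BITS
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# The address from the Deployment example is checked against the pod subnet of this document.
if ip_in_cidr 10.244.36.98 172.16.0.0/16; then
  echo "inside pool"
else
  echo "outside pool"   # this branch runs: 10.244.36.98 is not in 172.16.0.0/16
fi
```

With the pod subnet used here, the example annotation would need an address such as one from 172.16.0.0/16 instead.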

Floating IP

Edit the Calico ConfigMap

kubectl edit cm -n kube-system calico-config

Add the feature_control setting

cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          },
          # enable floating IPs (annotation only; remove this line, JSON does not allow comments)
          "feature_control": {
              "floating_ips": true
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }