[AEWS] Week 7 - EKS Mode/Nodes


summary-aws 2025. 3. 20. 21:49

1. What is the K8S Scheduler?

2. Installing Terraform for the lab environment

3. What is Fargate?

- Amazon EKS Blueprints for Terraform

4. What is Auto Mode?
- sample-aws-eks-auto-mode

K8S Scheduler

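The scheduler is the component that places newly created pods onto suitable nodes in a Kubernetes cluster.

Role:
It places each pod on a suitable node so that cluster resources (CPU, memory, and so on) are used efficiently.

How it works:
It goes through several steps to choose the node for a pod.
It takes into account the pod's requirements, the state of each node, and any constraints.

The scheduler's work is divided into two main phases
① Filtering
Selects the candidate nodes on which the pod can run.
Nodes are filtered by the pod's resource requests, volume types, labels/affinity, and so on.

Examples of filtering conditions:
Does the node have enough resources? (CPU, memory, etc.)
Does the node satisfy the pod's affinity/anti-affinity rules?
Are the node's taints tolerated by the pod's tolerations?
Does the node support the required volume type?

② Scoring
Each candidate node that passed filtering is given a priority score.
The pod is placed on the node with the highest score.

Examples of scoring criteria:
Balanced resource usage
Current load on the node (the default scheduler measures this from the requests of pods already placed, not from live usage)
How well node affinity and anti-affinity preferences are satisfied
Image locality (whether the image is already present on the node)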
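
The two phases above can be sketched in a few lines of Python. This is a toy model, not the kube-scheduler's actual code: the `Node` and `Pod` classes, their field names, and the scoring rule (prefer the node with the most capacity left after placement) are all assumptions made for illustration. The real scheduler runs many filter and score plugins and combines their weighted results.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free_m: int    # CPU not yet requested, in millicores
    mem_free_mi: int   # memory not yet requested, in MiB
    cpu_total_m: int
    mem_total_mi: int
    labels: dict


@dataclass
class Pod:
    cpu_request_m: int
    mem_request_mi: int
    node_selector: dict


def filter_nodes(pod, nodes):
    """Phase 1: keep only the nodes that can run the pod at all."""
    feasible = []
    for node in nodes:
        fits = (node.cpu_free_m >= pod.cpu_request_m
                and node.mem_free_mi >= pod.mem_request_mi)
        selected = all(node.labels.get(k) == v
                       for k, v in pod.node_selector.items())
        if fits and selected:
            feasible.append(node)
    return feasible


def score(pod, node):
    """Phase 2 (toy rule): share of capacity left after placement, 0-100."""
    cpu_left = (node.cpu_free_m - pod.cpu_request_m) / node.cpu_total_m
    mem_left = (node.mem_free_mi - pod.mem_request_mi) / node.mem_total_mi
    return round(100 * (cpu_left + mem_left) / 2)


def schedule(pod, nodes):
    """Return the name of the best node, or None if the pod stays Pending."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name


# Hypothetical cluster: node names, sizes, and the "disk" label are made up.
nodes = [
    Node("a", 200, 4096, 2000, 8192, {"disk": "ssd"}),   # too little CPU
    Node("b", 1000, 2048, 2000, 8192, {"disk": "ssd"}),
    Node("c", 1500, 6144, 2000, 8192, {"disk": "ssd"}),
    Node("d", 2000, 8192, 2000, 8192, {"disk": "hdd"}),  # wrong label
]
pod = Pod(cpu_request_m=500, mem_request_mi=512, node_selector={"disk": "ssd"})
print(schedule(pod, nodes))  # prints c
```

Nodes `a` and `d` never reach scoring, which is the point of splitting the work: scoring only has to rank nodes that are already known to be feasible.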

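Concept	Description	Example
Affinity	Makes a pod prefer (or require) particular nodes, or placement alongside particular pods	Place pods of the same application together
Anti-Affinity	Makes a pod avoid particular pods or nodes	Spread pods across different nodes for HA
Taints	A mark set on a node so that it repels pods that do not tolerate it	Reserve GPU nodes for pods that use GPUs
Tolerations	Allow a pod to be scheduled onto a node that carries a matching taint	A pod that needs a GPU tolerates the taint set on GPU nodes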
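
How a toleration is matched against a taint can be sketched as follows. This is a simplified model of the matching rules written for this post, not code from Kubernetes; the function names and the `gpu=true` taint are hypothetical. It assumes the usual rules: an empty `effect` on a toleration matches any effect, `operator: Exists` ignores the value, and only `NoSchedule` and `NoExecute` taints block scheduling (`PreferNoSchedule` only lowers a node's score).

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # A toleration without an effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal (the default operator): key and value must both match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def passes_taint_filter(pod_tolerations, node_taints):
    """A node is feasible if every blocking taint is tolerated by the pod."""
    blocking = [t for t in node_taints
                if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in blocking)


# Hypothetical GPU node taint, mirroring the example in the table above.
gpu_taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
gpu_toleration = {"key": "gpu", "operator": "Equal",
                  "value": "true", "effect": "NoSchedule"}

print(passes_taint_filter([], [gpu_taint]))                # prints False
print(passes_taint_filter([gpu_toleration], [gpu_taint]))  # prints True
```

A toleration only permits placement on the tainted node. It does not attract the pod there, so reserving GPU nodes for GPU pods usually combines a taint with node affinity or a nodeSelector.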

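Resources and factors the scheduler considers

CPU and memory requests
Whether the required disk volume type is supported
nodeSelector and node labels
Resources already requested on the node, and the node's condition
Pod affinity and anti-affinity rules
Taints and tolerations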

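Installing Terraform for the lab environment
- Terraform is an Infrastructure as Code (IaC) tool

Windows environment (the commands below use apt, so they assume an Ubuntu/Debian shell such as WSL)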

wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

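# Check the Terraform version
terraform version

# Every subcommand supports -help
terraform console -help
terraform init -help

terraform init - initialize the working directory (download providers and modules)

terraform plan - preview the resources that will be created or changed

terraform apply - create the resources

terraform state list - list the resources tracked in the state

terraform destroy - delete the resources

terraform fmt - rewrite configuration files to the canonical format (indentation and style)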

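Fargate

- EKS (control plane) + Fargate (data plane): fully serverless (= AWS-managed)

Service architecture (diagram)

Features

Node management	Not required (managed by AWS)
Auto scaling	Supported (pod-based)
Container isolation	Independent environment per pod
Volume type restrictions	EFS and emptyDir are supported
Billing	Based on the vCPU and memory used
Maximum resources	CPU: 16 vCPU, Mem: 120 GiB
DaemonSet support	Not supported

Fargate pricing is based on the vCPU and memory resources provisioned for each running pod

You are billed for the resources used (CPU, memory) and for how long they run

Resource	Price (Seoul region, per hour)

vCPU	$0.04956
Memory	$0.00544/GB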
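
As a rough worked example of the pricing above, the sketch below multiplies the two hourly rates by the pod size and run time. The function name and parameters are made up for this post. It ignores per-second rounding, minimum charges, and any other fees, so treat the result as an estimate rather than a bill.

```python
# Hourly rates taken from the table above (Seoul region, USD).
VCPU_PER_HOUR = 0.04956   # per vCPU
GB_PER_HOUR = 0.00544     # per GB of memory


def fargate_cost(vcpu, memory_gb, hours, pods=1):
    """Estimated USD cost of running `pods` identical pods for `hours`."""
    return pods * hours * (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR)


# One pod provisioned at 0.5 vCPU / 1 GB, running for a full day.
print(round(fargate_cost(0.5, 1, 24), 4))  # prints 0.7253
```

The inputs should be the capacity Fargate actually provisions for the pod (the `CapacityProvisioned` annotation), which can be larger than the pod's raw requests.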

Layer	Technology/component	Managed by
Container runtime	Container (Docker/containerd)	AWS
Virtualization	Firecracker MicroVM	AWS
Hypervisor	AWS Nitro System	AWS
Physical server	AWS dedicated hardware (EC2 Nitro-based)	AWS

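Amazon EKS Blueprints for Terraform

Amazon EKS Blueprints for Terraform is an open-source framework designed to simplify and automate the setup and operation of Amazon Elastic Kubernetes Service (EKS). It provides Terraform modules and example configurations so that users can deploy and manage Kubernetes clusters following standardized best practices.

Standardized Kubernetes environment
Provides Terraform modules preconfigured with best practices and proven patterns, so environments can be built consistently
Modular architecture
Essential components such as networking, security, logging, monitoring, CI/CD, and GitOps are provided as independent modules that can be combined freely as needed
Scalability and customization
Beyond the preconfigured add-ons, users can easily add or modify open-source tools (e.g. ArgoCD, Prometheus, Grafana, Fluent Bit) as needed
Stronger security and governance
AWS IAM roles and policies are defined explicitly, so cluster access control and RBAC (Role-Based Access Control) can be managed effectively
GitOps and CI/CD integration
Integration with GitOps tools such as ArgoCD and with CI/CD pipeline tools enables code-driven continuous deployment and management

Deploying EKS and Fargate profiles with Amazon EKS Blueprints for Terraform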

# Get the code
git clone https://github.com/aws-ia/terraform-aws-eks-blueprints
tree terraform-aws-eks-blueprints/patterns
cd terraform-aws-eks-blueprints/patterns/fargate-serverless

# Initialize
terraform init
tree .terraform
cat .terraform/modules/modules.json | jq
tree .terraform/providers/registry.terraform.io/hashicorp -L 2

# plan
terraform plan

# Deploy: EKS, add-ons, Fargate profiles (takes about 13 minutes)
terraform apply -auto-approve

# Check after the deployment completes
terraform state list

# EKS credentials
$(terraform output -raw configure_kubectl) # aws eks --region ap-northeast-2 update-kubeconfig --name fargate-serverless
cat ~/.kube/config

# Rename the kubectl context (kubectl ctx / kubectl ns are krew plugins; the region in the ARN must match the region the cluster was deployed to)
kubectl ctx
kubectl config rename-context "arn:aws:eks:ap-northeast-2:$(aws sts get-caller-identity --query 'Account' --output text):cluster/fargate-serverless" "fargate-lab"

# Check Kubernetes node and pod information
kubectl ns default
kubectl cluster-info
kubectl get node
kubectl get pod -A

# Check detailed information
terraform show

Lab baseline information

# Check the Kubernetes API service: the ENDPOINTS IPs belong to the two EKS-owned ENIs
kubectl get svc,ep
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   17m

NAME                   ENDPOINTS                        AGE
endpoints/kubernetes   10.0.18.202:443,10.0.35.56:443   17m

# Check nodes: 4 nodes (micro VMs), version v1.30.8-eks-2d5f260
kubectl get csr
kubectl get node -owide
NAME                                                STATUS   ROLES    AGE     VERSION               INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
fargate-ip-10-0-24-240.us-west-2.compute.internal   Ready    <none>   8m47s   v1.30.8-eks-2d5f260   10.0.24.240   <none>        Amazon Linux 2   5.10.234-225.910.amzn2.x86_64   containerd://1.7.25
fargate-ip-10-0-34-95.us-west-2.compute.internal    Ready    <none>   8m52s   v1.30.8-eks-2d5f260   10.0.34.95    <none>        Amazon Linux 2   5.10.234-225.910.amzn2.x86_64   containerd://1.7.25
fargate-ip-10-0-37-91.us-west-2.compute.internal    Ready    <none>   9m9s    v1.30.8-eks-2d5f260   10.0.37.91    <none>        Amazon Linux 2   5.10.234-225.910.amzn2.x86_64   containerd://1.7.25
fargate-ip-10-0-4-32.us-west-2.compute.internal     Ready    <none>   9m13s   v1.30.8-eks-2d5f260   10.0.4.32     <none>        Amazon Linux 2   5.10.234-225.910.amzn2.x86_64   containerd://1.7.25

kubectl describe node | grep eks.amazonaws.com/compute-type
Labels:             eks.amazonaws.com/compute-type=fargate
Taints:             eks.amazonaws.com/compute-type=fargate:NoSchedule
...

# Check pods: each pod's IP is the same as its node's IP!
kubectl get pdb -n kube-system
NAME                           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
aws-load-balancer-controller   N/A             1                 1                     10m
coredns                        N/A             1                 1                     16m
kubectl get pod -A -owide
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE   IP             NODE                                                      NOMINATED NODE   READINESS GATES
app-2048      app-2048-6c45d649c8-2c2x4                       0/1     Pending   0          14m   <none>        <none>                                              <none>           <none>
app-2048      app-2048-6c45d649c8-84p4s                       0/1     Pending   0          14m   <none>        <none>                                              <none>           <none>
app-2048      app-2048-6c45d649c8-cvgw9                       0/1     Pending   0          14m   <none>        <none>                                              <none>           <none>
kube-system   aws-load-balancer-controller-7cc6cd8ddd-29lc5   1/1     Running   0          10m   10.0.24.240   fargate-ip-10-0-24-240.us-west-2.compute.internal   <none>           <none>
kube-system   aws-load-balancer-controller-7cc6cd8ddd-r57wj   1/1     Running   0          10m   10.0.34.95    fargate-ip-10-0-34-95.us-west-2.compute.internal    <none>           <none>
kube-system   coredns-69fd949db7-pjp9h                        1/1     Running   0          10m   10.0.37.91    fargate-ip-10-0-37-91.us-west-2.compute.internal    <none>           <none>
kube-system   coredns-69fd949db7-szr79                        1/1     Running   0          10m   10.0.4.32     fargate-ip-10-0-4-32.us-west-2.compute.internal     <none>           <none>

# aws-load-balancer-webhook-service , eks-extension-metrics-api?
kubectl get svc,ep -n kube-system
NAME                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/aws-load-balancer-webhook-service   ClusterIP   172.20.72.191   <none>        443/TCP                  34m
service/eks-extension-metrics-api           ClusterIP   172.20.173.28   <none>        443/TCP                  42m

# eks-extension-metrics-api?
kubectl get apiservices.apiregistration.k8s.io | grep eks
v1.metrics.eks.amazonaws.com           kube-system/eks-extension-metrics-api   True        53m

kubectl get --raw "/apis/metrics.eks.amazonaws.com" | jq
kubectl get --raw "/apis/metrics.eks.amazonaws.com/v1" | jq

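# Check ConfigMaps
kubectl get cm -n kube-system
...

# Note that IAM access entries take precedence over aws-auth.
# Compared with standard managed nodes, the system:node-proxier group is additionally mapped.
# There are two Fargate profiles, and there is one role mapping per profile.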
kubectl get cm -n kube-system aws-auth -o yaml
...
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      - system:node-proxier
      rolearn: arn:aws:iam::824766816795:role/kube-system-20250320115108272100000010
      username: system:node:{{SessionName}}
    - groups:
      - system:bootstrappers
      - system:nodes
      - system:node-proxier
      rolearn: arn:aws:iam::824766816795:role/app_wildcard-2025032011510827100000000f
      username: system:node:{{SessionName}}
kind: ConfigMap
    ...

#
kubectl rbac-tool lookup system:node-proxier
  SUBJECT             | SUBJECT TYPE | SCOPE       | NAMESPACE | ROLE                | BINDING
----------------------+--------------+-------------+-----------+---------------------+-------------------------
  system:node-proxier | Group        | ClusterRole |           | system:node-proxier | eks:kube-proxy-fargate

kubectl rolesum -k Group system:node-proxier
...
Policies:
• [CRB] */eks:kube-proxy-fargate ⟶  [CR] */system:node-proxier
  Resource                         Name  Exclude  Verbs  G L W C U P D DC
  endpoints                        [*]     [-]     [-]   ✖ ✔ ✔ ✖ ✖ ✖ ✖ ✖
  endpointslices.discovery.k8s.io  [*]     [-]     [-]   ✖ ✔ ✔ ✖ ✖ ✖ ✖ ✖
  events.[,events.k8s.io]          [*]     [-]     [-]   ✖ ✖ ✖ ✔ ✔ ✔ ✖ ✖
  nodes                            [*]     [-]     [-]   ✔ ✔ ✔ ✖ ✖ ✖ ✖ ✖
  services                         [*]     [-]     [-]   ✖ ✔ ✔ ✖ ✖ ✖ ✖ ✖

#
kubectl get cm -n kube-system amazon-vpc-cni -o yaml
apiVersion: v1
data:
  branch-eni-cooldown: "60"
  minimum-ip-target: "3"
  warm-ip-target: "1"
  warm-prefix-target: "0"
  ...

# CoreDNS configuration
kubectl get cm -n kube-system coredns -o yaml

# Certificates are recorded here: client-ca-file, requestheader-client-ca-file
kubectl get cm -n kube-system extension-apiserver-authentication -o yaml

#
kubectl get cm -n kube-system kube-proxy -o yaml
kubectl get cm -n kube-system kube-proxy-config -o yaml
apiVersion: v1
data:
  config: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig
      qps: 5
    clusterCIDR: ""
    configSyncPeriod: 15m0s
    conntrack:
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0:10249
    mode: "iptables"
    nodePortAddresses: null
    oomScoreAdj: -998
    portRange: ""

Check CoreDNS pod details: schedulerName: fargate-scheduler

# Check CoreDNS pod details
kubectl get pod -n kube-system -l k8s-app=kube-dns -o yaml
...
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values:
              - linux
            - key: kubernetes.io/arch
              operator: In
              values:
              - amd64
              - arm64
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - kube-dns
            topologyKey: kubernetes.io/hostname
          weight: 100
      ...
      resources:
        limits:
          cpu: 250m
          memory: 256M
        requests:
          cpu: 250m
          memory: 256M
      ...
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          add:
          - NET_BIND_SERVICE
          drop:
          - ALL
        readOnlyRootFilesystem: true
    ...
    dnsPolicy: Default
    enableServiceLinks: true
    nodeName: fargate-ip-10-10-34-186.ap-northeast-2.compute.internal
    preemptionPolicy: PreemptLowerPriority
    priority: 2000001000
    priorityClassName: system-node-critical
    restartPolicy: Always
    schedulerName: fargate-scheduler
    securityContext: {}
    serviceAccount: coredns
    serviceAccountName: coredns
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
    - key: CriticalAddonsOnly
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
    ...
    qosClass: Guaranteed

Check EC2 (no instances exist)

Installing kube-ops-view on Fargate

# Deploy with Helm
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system

# Port forwarding
kubectl port-forward deployment/kube-ops-view -n kube-system 8080:8080 &

# Access URLs: 1x, 1.5x, and 3x scale respectively
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080"
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080/#scale=1.5"
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080/#scale=3"

# Check nodes: the nodes are micro VMs
kubectl get csr
kubectl get node -owide
kubectl describe node | grep eks.amazonaws.com/compute-type

# Check kube-ops-view Deployment/pod details
kubectl get pod -n kube-system
kubectl get pod -n kube-system -o jsonpath='{.items[0].metadata.annotations.CapacityProvisioned}'
kubectl get pod -n kube-system -l app.kubernetes.io/instance=kube-ops-view -o jsonpath='{.items[0].metadata.annotations.CapacityProvisioned}'
0.25vCPU 0.5GB

# Deployment details
kubectl get deploy -n kube-system kube-ops-view -o yaml
...
  template:
    ...
    spec:
      automountServiceAccountToken: true
      containers:
      - env:
        - name: TZ
          value: Asia/Seoul
        image: hjacobs/kube-ops-view:20.4.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 1
        name: kube-ops-view
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 1
        resources: {}
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        startupProbe:
          failureThreshold: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-ops-view
      serviceAccountName: kube-ops-view
      terminationGracePeriodSeconds: 30
...

# Pod details: compare schedulerName with the Deployment's default-scheduler; the difference shows that admission control was applied
kubectl get pod -n kube-system -l app.kubernetes.io/instance=kube-ops-view -o yaml
...

#
kubectl describe pod -n kube-system -l app.kubernetes.io/instance=kube-ops-view | grep Events: -A10
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  LoggingEnabled  3m22s  fargate-scheduler  Successfully enabled logging for pod
  Normal  Scheduled       2m40s  fargate-scheduler  Successfully assigned kube-system/kube-ops-view-796947d6dc-75qwb to fargate-ip-10-0-35-75.us-west-2.compute.internal
  Normal  Pulling         2m39s  kubelet            Pulling image "hjacobs/kube-ops-view:20.4.0"
  Normal  Pulled          2m31s  kubelet            Successfully pulled image "hjacobs/kube-ops-view:20.4.0" in 8.103s (8.103s including waiting). Image size: 81086356 bytes.
  Normal  Created         2m31s  kubelet            Created container kube-ops-view
  Normal  Started         2m31s  kubelet            Started container kube-ops-view

Deploying netshoot (Deployment/pod) on Fargate

vCPU value	Memory value

.25 vCPU	0.5 GB, 1 GB, 2 GB
.5 vCPU	1 GB, 2 GB, 3 GB, 4 GB
1 vCPU	2 GB, 3 GB, 4 GB, 5 GB, 6 GB, 7 GB, 8 GB
2 vCPU	Between 4 GB and 16 GB in 1-GB increments
4 vCPU	Between 8 GB and 30 GB in 1-GB increments
8 vCPU	Between 16 GB and 60 GB in 4-GB increments
16 vCPU	Between 32 GB and 120 GB in 8-GB increments
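
The table above explains the `CapacityProvisioned` values seen in this lab. The sketch below is a simplified model of how a pod's requests map to a row of the table, not AWS's implementation: it assumes Fargate adds 256 MB to the memory request for the Kubernetes components (kubelet, kube-proxy, containerd) and then rounds up to the smallest valid combination. It also treats the table's GB as 1024 MiB and takes the pod's total requests as input, ignoring init containers. The function and constant names are made up for this post.

```python
# Valid vCPU / memory (GB) combinations, transcribed from the table above.
COMBOS = [
    (0.25, [0.5, 1, 2]),
    (0.5, [1, 2, 3, 4]),
    (1, list(range(2, 9))),         # 2..8 GB
    (2, list(range(4, 17))),        # 4..16 GB in 1 GB steps
    (4, list(range(8, 31))),        # 8..30 GB in 1 GB steps
    (8, list(range(16, 61, 4))),    # 16..60 GB in 4 GB steps
    (16, list(range(32, 121, 8))),  # 32..120 GB in 8 GB steps
]
OVERHEAD_MI = 256  # assumed memory reserved for Kubernetes components


def provisioned(cpu_request_m, mem_request_mi):
    """Return the smallest (vCPU, GB) combination that fits the requests."""
    need_vcpu = cpu_request_m / 1000
    need_gb = (mem_request_mi + OVERHEAD_MI) / 1024
    for vcpu, memory_options in COMBOS:
        if vcpu < need_vcpu:
            continue
        for gb in memory_options:
            if gb >= need_gb:
                return vcpu, gb
    raise ValueError("requests exceed the largest Fargate configuration")


print(provisioned(500, 500))  # netshoot requests -> prints (0.5, 1)
print(provisioned(0, 0))      # no requests       -> prints (0.25, 0.5)
```

Both results line up with the annotations in this post: `0.5vCPU 1GB` for netshoot (500m / 500Mi) and `0.25vCPU 0.5GB` for kube-ops-view, whose container sets no requests. Limits play no part in the calculation.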

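# Create the namespace
kubectl create ns study-aews

# Create the netshoot test Deployment: Fargate provisions 0.5 vCPU / 1 GB based on the requests, so the limit values below have no practical effect. Roughly time how long the deployment takes!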
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot
  namespace: study-aews
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["tail"]
        args: ["-f", "/dev/null"]
        resources:
          requests:
            cpu: 500m
            memory: 500Mi
          limits:
            cpu: 2
            memory: 2Gi
      terminationGracePeriodSeconds: 0
EOF
kubectl get events -w --sort-by '.lastTimestamp'

# Check: how was the memory allocation determined?
kubectl get pod -n study-aews -o wide
kubectl get pod -n study-aews -o jsonpath='{.items[0].metadata.annotations.CapacityProvisioned}'
0.5vCPU 1GB

# Deployment details
kubectl get deploy -n study-aews netshoot -o yaml
...
  template:
    ...
    spec:
      ...
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 0
...

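# Pod details: schedulerName is fargate-scheduler here although the Deployment says default-scheduler, which shows that admission control was applied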
kubectl get pod -n study-aews -l app=netshoot -o yaml
...
  metadata:
    annotations:
      CapacityProvisioned: 0.5vCPU 1GB
      Logging: LoggingEnabled
    ...
    preemptionPolicy: PreemptLowerPriority
    priority: 2000001000
    priorityClassName: system-node-critical
    restartPolicy: Always
    schedulerName: fargate-scheduler
    ...
    qosClass: Burstable

#
kubectl describe pod -n study-aews -l app=netshoot | grep Events: -A10

#
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io
kubectl describe mutatingwebhookconfigurations 0500-amazon-eks-fargate-mutation.amazonaws.com
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io

# Open a zsh shell inside the pod and check
kubectl exec -it deploy/netshoot -n study-aews -- zsh
-----------------------------------------------------
ip -c a
cat /etc/resolv.conf
curl ipinfo.io/ip # Which IP is printed? Through which path does the pod reach the internet?
ping -c 1 <another pod IP, e.g. the coredns pod IP>
lsblk
df -hT /
cat /etc/fstab
exit

-----------------------------------------------------

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: root-shell
  namespace: study-aews
spec:
  containers:
  - command:
    - /bin/cat
    image: alpine:3
    name: root-shell
    securityContext:
      privileged: true
    tty: true
    stdin: true
    volumeMounts:
    - mountPath: /host
      name: hostroot
  hostNetwork: true
  hostPID: true
  hostIPC: true
  tolerations:
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /
    name: hostroot
EOF

#
kubectl get pod -n study-aews root-shell
kubectl describe pod -n study-aews root-shell | grep Events: -A 10

# Output message
# Pod not supported on Fargate: fields not supported:
# HostNetwork, HostPID, HostIPC, volumes not supported:
# hostroot is of an unsupported volume Type, invalid SecurityContext fields: Privileged

# Delete
kubectl delete pod -n study-aews root-shell

# (Reference) When run somewhere with sufficient privileges instead of Fargate, you can enter the host namespaces as shown below!
kubectl -n kube-system exec -it root-shell -- chroot /host /bin/bash
root@myk8s-control-plane:/# id
uid=0(root) gid=0(root) groups=0(root),1(daemon),2(bin),3(sys),4(adm),6(disk),10(uucp),11,20(dialout),26(tape),27(sudo)

AWS ALB(Ingress)

# Deploy the game Deployment, Service, and Ingress
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: study-aews
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: public.ecr.aws/l6m2t8p7/docker-2048:latest
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: study-aews
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: study-aews
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: service-2048
              port:
                number: 80
EOF

# Monitoring
watch -d kubectl get pod,ingress,svc,ep,endpointslices -n study-aews

# Verify creation
kubectl get-all -n study-aews
kubectl get ingress,svc,ep,pod -n study-aews
kubectl get targetgroupbindings -n study-aews

# Check the Ingress
kubectl describe ingress -n study-aews ingress-2048
kubectl get ingress -n study-aews ingress-2048 -o jsonpath="{.status.loadBalancer.ingress[*].hostname}{'\n'}"

# Open the game: browse to the ALB address
kubectl get ingress -n study-aews ingress-2048 -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' | awk '{ print "Game URL = http://"$1 }'

# Check pod IPs
kubectl get pod -n study-aews -owide

# Scale up the pods
kubectl scale deployment -n study-aews deployment-2048 --replicas 4

# Delete the game lab resources
kubectl delete ingress ingress-2048 -n study-aews
kubectl delete svc service-2048 -n study-aews && kubectl delete deploy deployment-2048 -n study-aews

fargate job

#
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: busybox1
  namespace: study-aews
spec:
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sh", "-c", "sleep 10"]
      restartPolicy: Never
  ttlSecondsAfterFinished: 60 # <-- TTL controller
---
apiVersion: batch/v1
kind: Job
metadata:
  name: busybox2
  namespace: study-aews
spec:
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sh", "-c", "sleep 10"]
      restartPolicy: Never
EOF

#
kubectl get job,pod -n study-aews
kubectl get job -n study-aews -w
kubectl get pod -n study-aews -w
kubectl get job,pod -n study-aews

#
kubectl delete job busybox2 -n study-aews

#
kubectl create ns study-aews
kubectl get job,pod -n study-aews

fargate logging

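Amazon EKS on Fargate provides a built-in log router based on Fluent Bit. In other words, you do not explicitly run a Fluent Bit container as a sidecar; Amazon runs it for you. All you have to do is configure the log router.

Configuration is done through a dedicated ConfigMap that must meet the following criteria.
- Name: aws-logging
- Created in a dedicated namespace called aws-observability
- Cannot exceed 5300 characters.
Once you create the ConfigMap, Amazon EKS on Fargate automatically detects it and configures the log router. Fargate uses AWS for Fluent Bit, an upstream-compatible distribution of Fluent Bit managed by AWS. For details, see AWS for Fluent Bit on GitHub.
With the log router, you can use a range of AWS services for log analysis and storage. You can stream logs from Fargate directly to Amazon CloudWatch and Amazon OpenSearch Service. You can also stream logs to destinations such as Amazon S3, Amazon Kinesis Data Streams, and partner tools through Amazon Data Firehose.
Prerequisite: an existing Fargate profile that specifies an existing Kubernetes namespace into which you deploy Fargate pods.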
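
The three ConfigMap criteria listed above are easy to check before applying a manifest. The helper below is a hypothetical pre-flight check written for this post, not an AWS tool. How the 5300-character limit is counted is an assumption here: the sketch sums the lengths of the `data` values.

```python
MAX_CHARS = 5300


def check_logging_configmap(name, namespace, data):
    """Return a list of problems; an empty list means the criteria are met."""
    problems = []
    if name != "aws-logging":
        problems.append(f"name must be aws-logging, got {name}")
    if namespace != "aws-observability":
        problems.append(f"namespace must be aws-observability, got {namespace}")
    # Assumption: the limit applies to the combined length of the data values.
    size = sum(len(value) for value in data.values())
    if size > MAX_CHARS:
        problems.append(f"data is {size} characters, limit is {MAX_CHARS}")
    return problems


# Hypothetical, minimal output section for illustration only.
data = {"output.conf": "[OUTPUT]\n    Name cloudwatch_logs\n    Match *\n"}
print(check_logging_configmap("aws-logging", "aws-observability", data))  # prints []
```

Running a check like this in CI catches a misnamed ConfigMap early, which matters because a ConfigMap with the wrong name or namespace is not picked up by the log router.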

GitHub - aws/aws-for-fluent-bit: The source of the amazon/aws-for-fluent-bit container image


cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: study-aews
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:latest
        name: nginx
        ports:
        - containerPort: 80
          name: http
        resources:
          requests:
            cpu: 500m
            memory: 500Mi
          limits:
            cpu: 2
            memory: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: study-aews
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  type: ClusterIP
EOF

# Check
kubectl get pod -n study-aews -l app=nginx
kubectl describe pod -n study-aews -l app=nginx

# Repeated requests
kubectl exec -it deploy/netshoot -n study-aews -- curl sample-app | grep title
while true; do kubectl exec -it deploy/netshoot -n study-aews -- curl sample-app | grep title; sleep 1; echo ; date; done;

# Check logs
kubectl stern -n study-aews -l app=nginx

# main.tf
...
  # Enable Fargate logging this may generate a large ammount of logs, disable it if not explicitly required
  enable_fargate_fluentbit = true
  fargate_fluentbit = {
    flb_log_cw = true
  }
...

# Check the dedicated namespace named aws-observability
kubectl get ns --show-labels

# ConfigMap containing the Fluent Bit configuration: defines where container logs are shipped
## Amazon EKS Fargate logging does not support dynamic configuration of the ConfigMap.
## Any change to the ConfigMap applies only to new pods. Existing pods are not affected.
kubectl get cm -n aws-observability
kubectl get cm -n aws-observability aws-logging -o yaml
data:
  filters.conf: |
    [FILTER]
      Name parser
      Match *
      Key_name log
      Parser crio
    [FILTER]
      Name kubernetes
      Match kube.*
      Merge_Log On
      Keep_Log Off
      Buffer_Size 0
      Kube_Meta_Cache_TTL 300s
  flb_log_cw: "true"  # Ships Fluent Bit process logs to CloudWatch.
  output.conf: |+
    [OUTPUT]
          Name cloudwatch
          Match kube.*
          region ap-northeast-2
          log_group_name /fargate-serverless/fargate-fluentbit-logs2025031600585521800000000c
          log_stream_prefix fargate-logs-
          auto_create_group true
    [OUTPUT]
          Name cloudwatch_logs
          Match *
          region ap-northeast-2
          log_group_name /fargate-serverless/fargate-fluentbit-logs2025031600585521800000000c
          log_stream_prefix fargate-logs-fluent-bit-
          auto_create_group true

  parsers.conf: |
    [PARSER]
      Name crio
      Format Regex
      Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>P|F) (?<log>.*)$
      Time_Key    time
      Time_Format %Y-%m-%dT%H:%M:%S.%L%z
      Time_Keep On

# Destroy with Terraform: if VPC deletion fails, delete the VPC manually in the AWS console -> if network interfaces (ENIs) remain, force-delete them
terraform destroy -auto-approve

# Confirm the VPC was deleted
aws ec2 describe-vpcs --filter 'Name=isDefault,Values=false' --output yaml

# Delete kubeconfig
rm -rf ~/.kube/config

Auto-mode

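Item	Description
Definition	A mode in which AWS automatically manages the cluster's nodes; as the CRDs in the lab below show, it is built on Karpenter-style NodePool/NodeClaim resources rather than on Managed Node Groups with Cluster Autoscaler
Features	- Automatic node provisioning - Automatic node upgrades and patching - Automatic scale-out and scale-in - Automatic Kubernetes version updates for nodes
Node management	AWS manages nodes automatically from creation to termination
Configuration	- Can be enabled with eksctl or the AWS Management Console - Detailed settings can be expressed in YAML
Scaling	- The number of nodes is adjusted automatically as Kubernetes workloads (pods) grow or shrink, without installing a separate Cluster Autoscaler
Advantages	- Easier management - Cost optimization (run only as many nodes as needed) - Stronger security through automatic updates
Drawbacks/caveats	- Heavy reliance on AWS means fine-grained node control is limited - Self-managed nodes suit environments that need detailed control
Recommended use cases	- When maximizing operational efficiency - When sharp changes in workload are expected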

Lab (sample-aws-eks-auto-mode)

https://github.com/aws-samples/sample-aws-eks-auto-mode


# Get the code: note that the deployment code contains no add-on definitions
git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git
cd sample-aws-eks-auto-mode/terraform

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.24"

  cluster_name    = var.name
  cluster_version = var.eks_cluster_version

  # Give the Terraform identity admin access to the cluster
  # which will allow it to deploy resources into the cluster
  enable_cluster_creator_admin_permissions = true
  cluster_endpoint_public_access           = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }
  tags = local.tags
}

# Initialize and apply Terraform : 9:19~
terraform init
terraform plan
terraform apply -auto-approve

# Configure kubectl
cat setup.tf
ls -l ../nodepools
$(terraform output -raw configure_kubectl)

# Switch the kubectl context
kubectl ctx
aws eks update-kubeconfig --name automode-cluster --region us-west-2
kubectl ns default

# Try to find the ENIs that own the IPs below
kubectl get svc,ep
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   15m

NAME                   ENDPOINTS                      AGE
endpoints/kubernetes   10.0.2.0:443,10.0.35.204:443   15m

#
terraform state list

terraform show
terraform state show 'module.eks.aws_eks_cluster.this[0]'
...
    compute_config {
        enabled       = true
        node_pools    = [
            "general-purpose",
        ]
        node_role_arn = "arn:aws:iam::911283464785:role/automode-cluster-eks-auto-20250316042752605600000003"
    }
...

#
kubectl get crd
NAME                                         CREATED AT
cninodes.eks.amazonaws.com                   2025-03-21T12:55:32Z
cninodes.vpcresources.k8s.aws                2025-03-21T12:52:04Z
ingressclassparams.eks.amazonaws.com         2025-03-21T12:55:33Z
nodeclaims.karpenter.sh                      2025-03-21T12:55:54Z
nodeclasses.eks.amazonaws.com                2025-03-21T12:55:54Z
nodediagnostics.eks.amazonaws.com            2025-03-21T12:55:54Z
nodepools.karpenter.sh                       2025-03-21T12:55:54Z
policyendpoints.networking.k8s.aws           2025-03-21T12:52:04Z
securitygrouppolicies.vpcresources.k8s.aws   2025-03-21T12:52:03Z
targetgroupbindings.eks.amazonaws.com        2025-03-21T12:55:34Z

kubectl api-resources | grep -i node
nodes                               no           v1                                false        Node
cninodes                            cni,cnis     eks.amazonaws.com/v1alpha1        false        CNINode
nodeclasses                                      eks.amazonaws.com/v1              false        NodeClass
nodediagnostics                                  eks.amazonaws.com/v1alpha1        false        NodeDiagnostic
nodeclaims                                       karpenter.sh/v1                   false        NodeClaim
nodepools                                        karpenter.sh/v1                   false        NodePool
runtimeclasses                                   node.k8s.io/v1                    false        RuntimeClass
csinodes                                         storage.k8s.io/v1                 false        CSINode
cninodes                            cnd          vpcresources.k8s.aws/v1alpha1     false        CNINode

# Nodes cannot be accessed directly, so a diagnostics CRD (NodeDiagnostic) is provided instead
kubectl explain nodediagnostics
GROUP:      eks.amazonaws.com
KIND:       NodeDiagnostic
VERSION:    v1alpha1

DESCRIPTION:
    The name of the NodeDiagnostic resource is meant to match the name of the
    node which should perform the diagnostic tasks

#
kubectl get nodeclasses.eks.amazonaws.com
NAME      ROLE                                                   READY   AGE
default   automode-cluster-eks-auto-20250314121820950800000003   True    29m

kubectl get nodeclasses.eks.amazonaws.com -o yaml
...
  spec:
    ephemeralStorage:
      iops: 3000
      size: 80Gi
      throughput: 125
    networkPolicy: DefaultAllow
    networkPolicyEventLogs: Disabled
    role: automode-cluster-eks-auto-20250314121820950800000003
    securityGroupSelectorTerms:
    - id: sg-05d210218e5817fa1
    snatPolicy: Random # ???
    subnetSelectorTerms:
    - id: subnet-0539269140458ced5
    - id: subnet-055dc112cdd434066
    - id: subnet-0865f60e4a6d8ad5c
  status:
    ...
    instanceProfile: eks-ap-northeast-2-automode-cluster-4905473370491687283
    securityGroups:
    - id: sg-05d210218e5817fa1
      name: eks-cluster-sg-automode-cluster-2065126657
    subnets:
    - id: subnet-0539269140458ced5
      zone: ap-northeast-2a
      zoneID: apne2-az1
    - id: subnet-055dc112cdd434066
      zone: ap-northeast-2b
      zoneID: apne2-az2
    - id: subnet-0865f60e4a6d8ad5c
      zone: ap-northeast-2c
      zoneID: apne2-az3

#
kubectl get nodepools
NAME              NODECLASS   NODES   READY   AGE
general-purpose   default     0       True    13m

kubectl get nodepools -o yaml
...
  spec:
    disruption:
      budgets:
      - nodes: 10%
      consolidateAfter: 30s
      consolidationPolicy: WhenEmptyOrUnderutilized
    template:
      metadata: {}
      spec:
        expireAfter: 336h # 14 days
        nodeClassRef:
          group: eks.amazonaws.com
          kind: NodeClass
          name: default
        requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
        - key: eks.amazonaws.com/instance-category
          operator: In
          values:
          - c
          - m
          - r
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values:
          - "4"
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        terminationGracePeriod: 24h0m0s
...
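
The NodePool output above annotates `expireAfter: 336h` as 14 days. As a minimal sketch, assuming the value is a plain integer followed by `h` (the helper name `hours_to_days` is hypothetical, and a value like `24h0m0s` would not parse), the conversion can be checked like this:

```shell
# Convert an hour-based duration such as "336h" into whole days.
# Assumption: the input is an integer followed by a single trailing "h".
hours_to_days() {
  local hours="${1%h}"      # strip the trailing "h"
  echo $(( hours / 24 ))    # integer division; partial days are dropped
}

hours_to_days 336h          # prints 14

# Hypothetical usage against a live cluster:
# hours_to_days "$(kubectl get nodepool general-purpose -o jsonpath='{.spec.template.spec.expireAfter}')"
```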

#
kubectl get mutatingwebhookconfiguration
NAME                            WEBHOOKS   AGE
eks-load-balancing-webhook      2          14m
pod-identity-webhook            1          17m
vpc-resource-mutating-webhook   1          17m

kubectl get validatingwebhookconfiguration
NAME                              WEBHOOKS   AGE
vpc-resource-validating-webhook   2          17m

Installing kube-ops-view

# Monitoring
eks-node-viewer --node-sort=eks-node-viewer/node-cpu-usage=dsc --extra-labels eks-node-viewer/node-age
watch -d kubectl get node,pod -A

# Deploy with Helm
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl get events -w --sort-by '.lastTimestamp' # watch and analyze the event log output

# Verify
kubectl get nodeclaims
NAME                    TYPE        CAPACITY    ZONE         NODE   READY     AGE
general-purpose-9jqb9   c6a.large   on-demand   us-west-2b          Unknown   17s

# Check OS, kernel, and container runtime (CRI)
kubectl get node -owide
NAME                  STATUS   ROLES    AGE   VERSION               INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                           KERNEL-VERSION   CONTAINER-RUNTIME
i-0aa251960a87d2c37   Ready    <none>   18s   v1.31.4-eks-0f56d01   10.0.23.147   <none>        Bottlerocket (EKS Auto) 2025.3.14 (aws-k8s-1.31)   6.1.129          containerd://1.7.25+bottlerocket

# Check CNI nodes
kubectl get cninodes.eks.amazonaws.com
NAME                  AGE
i-0aa251960a87d2c37   4s

# [New terminal] Port forwarding
kubectl port-forward deployment/kube-ops-view -n kube-system 8080:8080 &

# Access URLs: 1x, 1.5x, and 3x scale respectively
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080"
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080/#scale=1.5"
echo -e "KUBE-OPS-VIEW URL = http://localhost:8080/#scale=3"

open "http://127.0.0.1:8080/#scale=1.5" # macOS

Verifying Karpenter behavior

# Step 1: Review existing compute resources (optional)
kubectl get nodepools
NAME              NODECLASS   NODES   READY   AGE
general-purpose   default     1       True    17m

# Step 2: Deploy a sample application to the cluster
# eks.amazonaws.com/compute-type: auto selector requires the workload be deployed on an Amazon EKS Auto Mode node.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      nodeSelector:
        eks.amazonaws.com/compute-type: auto
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
          securityContext:
            allowPrivilegeEscalation: false
EOF

# Step 3: Watch Kubernetes Events
kubectl get events -w --sort-by '.lastTimestamp'
2m12s       Normal   Synced                    node/i-0aa251960a87d2c37          Node synced successfully
2m8s        Normal   DisruptionBlocked         node/i-0aa251960a87d2c37          Node is nominated for a pending pod
2m8s        Normal   RegisteredNode            node/i-0aa251960a87d2c37          Node i-0aa251960a87d2c37 event: Registered Node i-0aa251960a87d2c37 in Controller
88s         Normal   Unconsolidatable          nodeclaim/general-purpose-9jqb9   Can't replace with a cheaper node
88s         Normal   Unconsolidatable          node/i-0aa251960a87d2c37          Can't replace with a cheaper node
19s         Normal   Scheduled                 pod/inflate-b6b45f8d4-6w75w       Successfully assigned default/inflate-b6b45f8d4-6w75w to i-0aa251960a87d2c37
19s         Normal   SuccessfulCreate          replicaset/inflate-b6b45f8d4      Created pod: inflate-b6b45f8d4-6w75w
19s         Normal   ScalingReplicaSet         deployment/inflate                Scaled up replica set inflate-b6b45f8d4 to 1
18s         Normal   Pulling                   pod/inflate-b6b45f8d4-6w75w       Pulling image "public.ecr.aws/eks-distro/kubernetes/pause:3.7"
17s         Normal   Pulled                    pod/inflate-b6b45f8d4-6w75w       Successfully pulled image "public.ecr.aws/eks-distro/kubernetes/pause:3.7" in 1.211s (1.211s including waiting). Image size: 2002080 bytes.
17s         Normal   Created                   pod/inflate-b6b45f8d4-6w75w       Created container inflate
17s         Normal   Started                   pod/inflate-b6b45f8d4-6w75w       Started container inflate
kubectl get nodes
NAME                  STATUS   ROLES    AGE     VERSION
i-0aa251960a87d2c37   Ready    <none>   3m15s   v1.31.4-eks-0f56d01

# Create a custom NodePool. Note: the requirement keys differ from open-source Karpenter (they use the eks.amazonaws.com/ prefix)!
ls ../nodepools
cat ../nodepools/graviton-nodepool.yaml
kubectl apply -f ../nodepools/graviton-nodepool.yaml
---
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: graviton-nodeclass
spec:
  role: automode-cluster-eks-auto-20250321124641255100000002
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "automode-demo"
  securityGroupSelectorTerms:
    - tags:
        kubernetes.io/cluster/automode-cluster: owned
  tags:
    karpenter.sh/discovery: "automode-demo"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: graviton-nodeclass
      requirements:
        - key: "eks.amazonaws.com/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "eks.amazonaws.com/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64"]
      taints:
        - key: "arm64"
          value: "true"
          effect: "NoSchedule"  # Prevents non-ARM64 pods from scheduling
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

#
kubectl get NodeClass
NAME                 ROLE                                                   READY   AGE
default              automode-cluster-eks-auto-20250321124641255100000002   True    21m
graviton-nodeclass   automode-cluster-eks-auto-20250321124641255100000002   True    8s

kubectl get NodePool
NAME                NODECLASS            NODES   READY   AGE
general-purpose     default              0       True    64m
graviton-nodepool   graviton-nodeclass   0       True    3m32s

#
ls ../examples/graviton
cat ../examples/graviton/game-2048.yaml
...
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "200m"
              memory: "256Mi"
      automountServiceAccountToken: false
      tolerations:
      - key: "arm64"
        value: "true"
        effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/arch: arm64
...

kubectl apply -f ../examples/graviton/game-2048.yaml

# c6g.xlarge: 4 vCPU, 8 GiB RAM; a Spot instance was selected!
kubectl get nodeclaims
NAME                      TYPE         CAPACITY   ZONE              NODE                  READY   AGE
graviton-nodepool-ngp42   c6g.xlarge   spot       ap-northeast-2b   i-0b7ca5072ebf3c969   True    9m48s
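
The graviton NodePool manifest shown earlier has no `karpenter.sh/capacity-type` requirement, which is presumably why a Spot instance could be chosen here. A small sketch for spotting such NodeClaims from the table output follows; it assumes the column order shown above (NAME, TYPE, CAPACITY, ...), and the helper name `spot_nodeclaims` is hypothetical.

```shell
# Print "NAME TYPE" for every NodeClaim whose CAPACITY column is "spot".
# Reads the table printed by `kubectl get nodeclaims` on stdin.
spot_nodeclaims() {
  awk 'NR > 1 && $3 == "spot" { print $1, $2 }'   # NR > 1 skips the header row
}

# Hypothetical usage against a live cluster:
# kubectl get nodeclaims | spot_nodeclaims
```

If only on-demand capacity is wanted, adding the same `karpenter.sh/capacity-type` requirement that the default `general-purpose` NodePool carries would presumably restrict the pool accordingly.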

kubectl get nodeclaims -o yaml
...
  spec:
    expireAfter: 336h
    ...
kubectl get cninodes.eks.amazonaws.com
kubectl get cninodes.eks.amazonaws.com -o yaml
eks-node-viewer --resources cpu,memory
kubectl get node -owide
kubectl describe node
...
Taints:             arm64=true:NoSchedule
...
Conditions:
  Type                    Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                    ------  -----------------                 ------------------                ------                       -------
  MemoryPressure          False   Fri, 14 Mar 2025 22:53:54 +0900   Fri, 14 Mar 2025 22:37:35 +0900   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure            False   Fri, 14 Mar 2025 22:53:54 +0900   Fri, 14 Mar 2025 22:37:35 +0900   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure             False   Fri, 14 Mar 2025 22:53:54 +0900   Fri, 14 Mar 2025 22:37:35 +0900   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                   True    Fri, 14 Mar 2025 22:53:54 +0900   Fri, 14 Mar 2025 22:37:35 +0900   KubeletReady                 kubelet is posting ready status
  KernelReady             True    Fri, 14 Mar 2025 22:57:39 +0900   Fri, 14 Mar 2025 22:37:39 +0900   KernelIsReady                Monitoring for the Kernel system is active
  ContainerRuntimeReady   True    Fri, 14 Mar 2025 22:57:39 +0900   Fri, 14 Mar 2025 22:37:39 +0900   ContainerRuntimeIsReady      Monitoring for the ContainerRuntime system is active
  StorageReady            True    Fri, 14 Mar 2025 22:57:39 +0900   Fri, 14 Mar 2025 22:37:39 +0900   DiskIsReady                  Monitoring for the Disk system is active
  NetworkingReady         True    Fri, 14 Mar 2025 22:57:39 +0900   Fri, 14 Mar 2025 22:37:39 +0900   NetworkingIsReady            Monitoring for the Networking system is active
...
System Info:
  Machine ID:                 ec272ed9293b6501bd9f665eed7e1627
  System UUID:                ec272ed9-293b-6501-bd9f-665eed7e1627
  Boot ID:                    97c24ba6-d319-4686-abf8-bb62c4f22888
  Kernel Version:             6.1.129
  OS Image:                   Bottlerocket (EKS Auto) 2025.3.9 (aws-k8s-1.31)
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.7.25+bottlerocket
  Kubelet Version:            v1.31.4-eks-0f56d01
  Kube-Proxy Version:         v1.31.4-eks-0f56d01

#
kubectl get deploy,pod -n game-2048 -owide

Configuring Ingress

#
cat ../examples/graviton/2048-ingress.yaml
...
apiVersion: eks.amazonaws.com/v1
kind: IngressClassParams
metadata:
  namespace: game-2048
  name: params
spec:
  scheme: internet-facing

---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  namespace: game-2048
  labels:
    app.kubernetes.io/name: LoadBalancerController
  name: alb
spec:
  controller: eks.amazonaws.com/alb
  parameters:
    apiGroup: eks.amazonaws.com
    kind: IngressClassParams
    name: params

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-2048
                port:
                  number: 80

kubectl apply -f ../examples/graviton/2048-ingress.yaml

#
kubectl get ingressclass,ingressclassparams,ingress,svc,ep -n game-2048
NAME                                 CONTROLLER              PARAMETERS                                    AGE
ingressclass.networking.k8s.io/alb   eks.amazonaws.com/alb   IngressClassParams.eks.amazonaws.com/params   105s

NAME                                          GROUP-NAME   SCHEME            IP-ADDRESS-TYPE   AGE
ingressclassparams.eks.amazonaws.com/params                internet-facing                     105s

NAME                                     CLASS   HOSTS   ADDRESS                                                                       PORTS   AGE
ingress.networking.k8s.io/ingress-2048   alb     *       k8s-game2048-ingress2-db993ba6ac-782663732.ap-northeast-2.elb.amazonaws.com   80      105s

NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/service-2048   NodePort   172.20.194.11   <none>        80:30280/TCP   105s

NAME                     ENDPOINTS        AGE
endpoints/service-2048   10.20.30.64:80   105s

# Get security group IDs
ALB_SG=$(aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[?contains(DNSName, `game2048`)].SecurityGroups[0]' \
  --output text)

EKS_SG=$(aws eks describe-cluster \
  --name automode-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
  --output text)

echo $ALB_SG $EKS_SG # first review the rules of these security groups in the AWS Management Console

# Allow ALB to communicate with EKS cluster. Before deleting the lab environment, remove only the rule added to $EKS_SG here.
aws ec2 authorize-security-group-ingress \
  --group-id $EKS_SG \
  --source-group $ALB_SG \
  --protocol tcp \
  --port 80
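
The comment above asks that the rule added to `$EKS_SG` be removed before the lab environment is torn down. A minimal sketch of that cleanup mirrors the authorize call with `revoke-security-group-ingress`. The function name and the `AWS_BIN` override are hypothetical and exist only so the command can be previewed with `echo` before running it for real.

```shell
# Revoke the ingress rule added above (ALB security group -> cluster
# security group, TCP port 80).
# AWS_BIN is a hypothetical override: set it to "echo" to print the
# command instead of calling the AWS CLI.
revoke_alb_ingress() {
  local eks_sg="$1" alb_sg="$2"
  "${AWS_BIN:-aws}" ec2 revoke-security-group-ingress \
    --group-id "$eks_sg" \
    --source-group "$alb_sg" \
    --protocol tcp \
    --port 80
}

# Preview first, then run without AWS_BIN=echo to actually revoke:
# AWS_BIN=echo revoke_alb_ingress "$EKS_SG" "$ALB_SG"
# revoke_alb_ingress "$EKS_SG" "$ALB_SG"
```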

# Open the address below over HTTP
kubectl get ingress ingress-2048 \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' \
  -n game-2048
k8s-game2048-ingress2-db993ba6ac-782663732.ap-northeast-2.elb.amazonaws.com

# Check the node instance ID
kubectl get node
NODEID=<your node ID>
NODEID=i-0b12733f9b75cd835
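
Instead of copying the instance ID by hand, the first node name can be read from the `kubectl get node` output. This is only a sketch that assumes a single node of interest is listed first; `first_node_name` is a hypothetical helper.

```shell
# Print the first node name from `kubectl get node --no-headers`
# output read on stdin.
first_node_name() {
  awk 'NR == 1 { print $1; exit }'
}

# Hypothetical usage against a live cluster:
# NODEID=$(kubectl get node --no-headers | first_node_name)
# echo "$NODEID"
```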

# Launch a debug container. The command below targets the node whose instance ID is stored in $NODEID (for example i-01234567890123456),
# allocates a tty and stdin for interactive use, and applies kubectl debug's sysadmin profile.
kubectl debug node/$NODEID -it --profile=sysadmin --image=public.ecr.aws/amazonlinux/amazonlinux:2023
-------------------------------------------------
bash-5.2#

# From this shell you can now install util-linux-core, which provides the nsenter command.
# Use nsenter to enter the mount namespace of PID 1 (init) on the host, then run journalctl to stream the kubelet logs:
yum install -y util-linux-core htop
nsenter -t 1 -m journalctl -f -u kubelet
htop # check the CPU and memory size of this node (instance)

# Inspect host information
nsenter -t 1 -m ip addr
nsenter -t 1 -m ps -ef
nsenter -t 1 -m ls -l /proc
nsenter -t 1 -m df -hT

nsenter -t 1 -m ctr
nsenter -t 1 -m ctr ns ls
nsenter -t 1 -m ctr -n k8s.io containers ls
CONTAINER                                                           IMAGE                                          RUNTIME
09e8f837f54d66305f3994afeee44d800971d7a921c06720382948dbdd9c6fab    localhost/kubernetes/pause:0.1.0               io.containerd.runc.v2
2e1af1a9ff996505c0de0ee6b55bd8b3fefdaf6579fd6af46e978ad6e2096bae    public.ecr.aws/amazonlinux/amazonlinux:2023    io.containerd.runc.v2
ebd4b8805ed0338a0bc6a54625c51f3a4c202641f0d492409d974871db49c476    localhost/kubernetes/pause:0.1.0               io.containerd.runc.v2
f4924417248c24b7eaac2498a11c912b575fc03cef1b8e55cae66772bdb36af5    docker.io/hjacobs/kube-ops-view:20.4.0         io.containerd.runc.v2

...

# (Note) For security reasons, the Amazon Linux container image does not install many binaries by default.
# Use the yum whatprovides command to identify the package that must be installed to provide a given binary.
yum whatprovides ps
-------------------------------------------------

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: root-shell
  namespace: kube-system
spec:
  containers:
  - command:
    - /bin/cat
    image: alpine:3
    name: root-shell
    securityContext:
      privileged: true
    tty: true
    stdin: true
    volumeMounts:
    - mountPath: /host
      name: hostroot
  hostNetwork: true
  hostPID: true
  hostIPC: true
  tolerations:
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /
    name: hostroot
EOF

# Check the pod: the pod IP is the same as the node IP (hostNetwork: true)
kubectl get pod -n kube-system root-shell
NAME         READY   STATUS    RESTARTS   AGE
root-shell   1/1     Running   0          19s
kubectl get node,pod -A -owide
NAME                       STATUS   ROLES    AGE   VERSION               INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                          KERNEL-VERSION   CONTAINER-RUNTIME
node/i-0b12733f9b75cd835   Ready    <none>   62m   v1.31.4-eks-0f56d01   10.20.1.198   <none>        Bottlerocket (EKS Auto) 2025.3.9 (aws-k8s-1.31)   6.1.129          containerd://1.7.25+bottlerocket

NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE                  NOMINATED NODE   READINESS GATES
kube-system   pod/kube-ops-view-657dbc6cd8-2lx4d   1/1     Running   0          62m     10.20.12.0    i-0b12733f9b75cd835   <none>           <none>
kube-system   pod/root-shell                       1/1     Running   0          5m29s   10.20.1.198   i-0b12733f9b75cd835   <none>           <none>

# hostPath: confirm the host root is mounted read-write (rw) at /host in the pod
kubectl describe pod -n kube-system root-shell
...
    Mounts:
      /host from hostroot (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hj5c6 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  hostroot:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
...

# Attempt to take over the host
kubectl -n kube-system exec -it root-shell -- chroot /host /bin/sh
