pcs cluster does not move resources when a move is issued

For some reason I can no longer move resources with pcs.

pacemaker-1.1.16-12.el7_4.8.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
pcs-0.9.158-6.el7.centos.1.x86_64
Linux server_a.test.local 3.10.0-693.el7.x86_64

Four resources are configured as part of a resource group. Below is the operation log from attempting to move the ClusterIP resource from server_d to server_a with `pcs resource move ClusterIP server_a.test.local`:

Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Forwarding cib_delete operation for section constraints to all (origin=local/crm_resource/3)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: --- 0.24.0 2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: +++ 0.25.0 (null)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       -- /cib/configuration/constraints/rsc_location[@id='cli-prefer-ClusterIP']
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       +  /cib:  @epoch=25
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: abort_transition_graph:       Transition aborted by deletion of rsc_location[@id='cli-prefer-ClusterIP']: Configuration change | cib=0.25.0 source=te_update_diff:456 path=/cib/configuration/constraints/rsc_location[@id='cli-prefer-ClusterIP'] complete=true
Apr 06 12:16:26 [17292] server_d.test.local       crmd:   notice: do_state_transition:  State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Completed cib_delete operation for section constraints: OK (rc=0, origin=server_d.test.local/crm_resource/3, version=0.25.0)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_a.test.local is online
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_d.test.local is online
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: group_print:   Resource Group: my_app
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      Apache     (systemd:httpd):        Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      stunnel    (systemd:stunnel-my_app): Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      my_app-daemon        (systemd:my_app): Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   ClusterIP       (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   Apache  (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   stunnel (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   my_app-daemon     (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:   notice: process_pe_message:   Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-18.bz2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Forwarding cib_modify operation for section constraints to all (origin=local/crm_resource/4)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: --- 0.25.0 2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: +++ 0.26.0 (null)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       +  /cib:  @epoch=26
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       ++ /cib/configuration/constraints:  <rsc_location id="cli-prefer-ClusterIP" rsc="ClusterIP" role="Started" node="server_a.test.local" score="INFINITY"/>
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Completed cib_modify operation for section constraints: OK (rc=0, origin=server_d.test.local/crm_resource/4, version=0.26.0)
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: abort_transition_graph:       Transition aborted by rsc_location.cli-prefer-ClusterIP 'create': Configuration change | cib=0.26.0 source=te_update_diff:456 path=/cib/configuration/constraints complete=true
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: handle_response:      pe_calc calculation pe_calc-dc-1523016986-67 is obsolete
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_a.test.local is online
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_d.test.local is online
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: group_print:   Resource Group: my_app
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      Apache     (systemd:httpd):        Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      stunnel    (systemd:stunnel-my_app): Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      my_app-daemon        (systemd:my_app): Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   ClusterIP       (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   Apache  (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   stunnel (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   my_app-daemon     (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:   notice: process_pe_message:   Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-19.bz2
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_state_transition:  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_te_invoke: Processing graph 9 (ref=pe_calc-dc-1523016987-68) derived from /var/lib/pacemaker/pengine/pe-input-19.bz2
Apr 06 12:16:27 [17292] server_d.test.local       crmd:   notice: run_graph:    Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-19.bz2): Complete
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_log:       Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Apr 06 12:16:27 [17292] server_d.test.local       crmd:   notice: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-34.raw
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Wrote version 0.25.0 of the CIB to disk (digest: 7511cba55b6c2f2f481a51d5585b8d36)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.tPIv7m (digest: /var/lib/pacemaker/cib/cib.OwHiKz)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-35.raw
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Wrote version 0.26.0 of the CIB to disk (digest: 7f962ed676a49e84410eee2ee04bae8c)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.MnRP4u (digest: /var/lib/pacemaker/cib/cib.B5sWNH)
Apr 06 12:16:31 [17287] server_d.test.local        cib:     info: cib_process_ping:     Reporting our current digest to server_d.test.local: 8182592cb4922cbf007158ab0a277190 for 0.26.0 (0x5575234afde0 0)

One thing worth noting: when I run `pcs cluster stop server_b.test.local`, all of the resources in the group do get moved to the other node.

What is going on here? As I said, this used to work, and nothing has changed since then.

Thanks in advance!

Edit:

pcs config

[root@server_a ~]# pcs config
Cluster Name: my_app_cluster
Corosync Nodes:
 server_a.test.local server_d.test.local
Pacemaker Nodes:
 server_a.test.local server_d.test.local

Resources:
 Group: my_app
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.116.63.49
   Operations: monitor interval=10s timeout=20s (ClusterIP-monitor-interval-10s)
               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
  Resource: Apache (class=systemd type=httpd)
   Operations: monitor interval=60 timeout=100 (Apache-monitor-interval-60)
               start interval=0s timeout=100 (Apache-start-interval-0s)
               stop interval=0s timeout=100 (Apache-stop-interval-0s)
  Resource: stunnel (class=systemd type=stunnel-my_app)
   Operations: monitor interval=60 timeout=100 (stunnel-monitor-interval-60)
               start interval=0s timeout=100 (stunnel-start-interval-0s)
               stop interval=0s timeout=100 (stunnel-stop-interval-0s)
  Resource: my_app-daemon (class=systemd type=my_app)
   Operations: monitor interval=60 timeout=100 (my_app-daemon-monitor-interval-60)
               start interval=0s timeout=100 (my_app-daemon-start-interval-0s)
               stop interval=0s timeout=100 (my_app-daemon-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: Apache
    Enabled on: server_d.test.local (score:INFINITY) (role: Started) (id:cli-prefer-Apache)
  Resource: ClusterIP
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-ClusterIP)
  Resource: my_app-daemon
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-my_app-daemon)
  Resource: stunnel
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-stunnel)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: my_app_cluster
 dc-version: 1.1.16-12.el7_4.8-94ff4df
 have-watchdog: false
 stonith-enabled: false

Quorum:
  Options:
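For reference, the `cli-prefer-*` entries in the output above are constraints left behind by earlier `pcs resource move` invocations. They can be listed together with their ids (a diagnostic sketch; it must be run on one of the cluster nodes):

```shell
# Show all constraints together with their ids; the cli-prefer-* entries
# were generated by previous "pcs resource move" invocations.
pcs constraint --full

# Or restrict the listing to location constraints only.
pcs constraint location show --full
```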

Edit 2:

Running `crm_simulate -sL` produces the following output:

[root@server_a ~]# crm_simulate -sL

    Current cluster status:
    Online: [ server_a.test.local server_d.test.local ]

     Resource Group: my_app
         ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_a.test.local
         Apache     (systemd:httpd):        Started server_a.test.local
         stunnel    (systemd:stunnel-my_app): Started server_a.test.local
         my_app-daemon        (systemd:my_app): Started server_a.test.local

    Allocation scores:
    group_color: my_app allocation score on server_a.test.local: 0
    group_color: my_app allocation score on server_d.test.local: 0
    group_color: ClusterIP allocation score on server_a.test.local: 0
    group_color: ClusterIP allocation score on server_d.test.local: INFINITY
    group_color: Apache allocation score on server_a.test.local: 0
    group_color: Apache allocation score on server_d.test.local: INFINITY
    group_color: stunnel allocation score on server_a.test.local: INFINITY
    group_color: stunnel allocation score on server_d.test.local: 0
    group_color: my_app-daemon allocation score on server_a.test.local: INFINITY
    group_color: my_app-daemon allocation score on server_d.test.local: 0
    native_color: ClusterIP allocation score on server_a.test.local: INFINITY
    native_color: ClusterIP allocation score on server_d.test.local: INFINITY
    native_color: Apache allocation score on server_a.test.local: INFINITY
    native_color: Apache allocation score on server_d.test.local: -INFINITY
    native_color: stunnel allocation score on server_a.test.local: INFINITY
    native_color: stunnel allocation score on server_d.test.local: -INFINITY
    native_color: my_app-daemon allocation score on server_a.test.local: INFINITY
    native_color: my_app-daemon allocation score on server_d.test.local: -INFINITY

    Transition Summary:

Next, I removed all of the resources and added them back (configured the same way as before, which I had recorded). Running `crm_simulate -sL` now gives a different result:

[root@server_a ~]# crm_simulate -sL

Current cluster status:
Online: [ server_a.test.local server_d.test.local ]

 Resource Group: my_app
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_a.test.local
     Apache     (systemd:httpd):        Started server_a.test.local
     stunnel    (systemd:stunnel-my_app.service): Started server_a.test.local
     my_app-daemon        (systemd:my_app.service): Started server_a.test.local

Allocation scores:
group_color: my_app allocation score on server_a.test.local: 0
group_color: my_app allocation score on server_d.test.local: 0
group_color: ClusterIP allocation score on server_a.test.local: 0
group_color: ClusterIP allocation score on server_d.test.local: 0
group_color: Apache allocation score on server_a.test.local: 0
group_color: Apache allocation score on server_d.test.local: 0
group_color: stunnel allocation score on server_a.test.local: 0
group_color: stunnel allocation score on server_d.test.local: 0
group_color: my_app-daemon allocation score on server_a.test.local: 0
group_color: my_app-daemon allocation score on server_d.test.local: 0
native_color: ClusterIP allocation score on server_a.test.local: 0
native_color: ClusterIP allocation score on server_d.test.local: 0
native_color: Apache allocation score on server_a.test.local: 0
native_color: Apache allocation score on server_d.test.local: -INFINITY
native_color: stunnel allocation score on server_a.test.local: 0
native_color: stunnel allocation score on server_d.test.local: -INFINITY
native_color: my_app-daemon allocation score on server_a.test.local: 0
native_color: my_app-daemon allocation score on server_d.test.local: -INFINITY

I can now move the resources, but after doing so, running `crm_simulate -sL` again shows output that differs from before!

[root@server_a ~]# crm_simulate -sL

Current cluster status:
Online: [ server_a.test.local server_d.test.local ]

 Resource Group: my_app
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
     Apache     (systemd:httpd):        Started server_d.test.local
     stunnel    (systemd:stunnel-my_app.service): Started server_d.test.local
     my_app-daemon        (systemd:my_app.service): Started server_d.test.local

Allocation scores:
group_color: my_app allocation score on server_a.test.local: 0
group_color: my_app allocation score on server_d.test.local: 0
group_color: ClusterIP allocation score on server_a.test.local: 0
group_color: ClusterIP allocation score on server_d.test.local: INFINITY
group_color: Apache allocation score on server_a.test.local: 0
group_color: Apache allocation score on server_d.test.local: 0
group_color: stunnel allocation score on server_a.test.local: 0
group_color: stunnel allocation score on server_d.test.local: 0
group_color: my_app-daemon allocation score on server_a.test.local: 0
group_color: my_app-daemon allocation score on server_d.test.local: 0
native_color: ClusterIP allocation score on server_a.test.local: 0
native_color: ClusterIP allocation score on server_d.test.local: INFINITY
native_color: Apache allocation score on server_a.test.local: -INFINITY
native_color: Apache allocation score on server_d.test.local: 0
native_color: stunnel allocation score on server_a.test.local: -INFINITY
native_color: stunnel allocation score on server_d.test.local: 0
native_color: my_app-daemon allocation score on server_a.test.local: -INFINITY
native_color: my_app-daemon allocation score on server_d.test.local: 0

Transition Summary:

I am a bit confused. Is this the expected behavior?

Answer 1

I am not sure my last answer was correct, but after looking more closely I found the following in `man pcs`:

    move <resource id> [destination node] [--master] [lifetime=<lifetime>] [--wait[=n]]
            Move the resource off the node it is currently running on by
            creating a -INFINITY location constraint to ban the node. If
            destination node is specified the resource will be moved to that
            node by creating an INFINITY location constraint to prefer the
            destination node. If --master is used the scope of the command is
            limited to the master role and you must use the master id (instead
            of the resource id). If lifetime is specified then the constraint
            will expire after that time, otherwise it defaults to infinity and
            the constraint can be cleared manually with 'pcs resource clear'
            or 'pcs constraint delete'. If --wait is specified, pcs will wait
            up to 'n' seconds for the resource to move and then return 0 on
            success or 1 on error. If 'n' is not specified it defaults to 60
            minutes. If you want the resource to preferably avoid running on
            some nodes but be able to failover to them use 'pcs constraint
            location avoids'.

Using `pcs resource clear` removes the constraint, after which the resource can be moved.
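For example (a sketch using the resource and node names from the question, run on one of the cluster nodes):

```shell
# Clear the stale cli-prefer-* constraint that pins ClusterIP,
# then issue the move again.
pcs resource clear ClusterIP
pcs resource move ClusterIP server_a.test.local

# Confirm that only the intended location constraint remains.
pcs constraint location show --full
```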

Answer 2

The preference constraints with score:INFINITY on all of the grouped resources may be the problem. In Pacemaker, INFINITY is effectively equal to 1,000,000, the highest value that can be assigned to a score.

The following rules apply when using INFINITY (see the ClusterLabs documentation):

6.1.1. Infinity Math 
  Pacemaker implements INFINITY (or equivalently, +INFINITY) 
  internally as a score of 1,000,000. Addition and subtraction 
  with it follow these three basic rules: 

  Any value + INFINITY =  INFINITY 
  Any value - INFINITY = -INFINITY 
  INFINITY  - INFINITY = -INFINITY
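The three rules above can be sketched with plain shell arithmetic (a toy illustration only, not Pacemaker's actual implementation, which lives in its C scheduler):

```shell
#!/bin/sh
# Pacemaker represents INFINITY internally as 1000000 and clamps scores.
INF=1000000

# score_add A B: add two scores following the three rules quoted above.
# -INFINITY dominates first, then +INFINITY, then ordinary clamped addition.
score_add() {
  a=$1; b=$2
  if [ "$a" -le "-$INF" ] || [ "$b" -le "-$INF" ]; then
    echo "-$INF"                      # any value - INFINITY = -INFINITY
  elif [ "$a" -ge "$INF" ] || [ "$b" -ge "$INF" ]; then
    echo "$INF"                       # any value + INFINITY =  INFINITY
  else
    s=$((a + b))
    if [ "$s" -gt "$INF" ]; then s=$INF; fi
    if [ "$s" -lt "-$INF" ]; then s="-$INF"; fi
    echo "$s"
  fi
}

score_add 500 "$INF"       # any value + INFINITY  -> 1000000
score_add 500 "-$INF"      # any value - INFINITY  -> -1000000
score_add "$INF" "-$INF"   # INFINITY  - INFINITY  -> -1000000
```

Note that the -INFINITY check comes first, which is exactly why INFINITY - INFINITY yields -INFINITY rather than zero.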

Change the preference scores from INFINITY to 1,000 or 10,000 or similar, then run your test again.
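A sketch of that change, using the constraint ids shown in the `pcs config` output in the question (assuming they still exist when you run this):

```shell
# Replace each INFINITY preference with a finite score of 1000 so that
# stickiness and other scores can still outweigh it. Note that in the
# question's config, cli-prefer-Apache pointed at server_d; this sketch
# re-creates all four preferences on server_a for simplicity.
for rsc in ClusterIP Apache stunnel my_app-daemon; do
  pcs constraint remove "cli-prefer-${rsc}"
  pcs constraint location "${rsc}" prefers server_a.test.local=1000
done
```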
