Swap for Xeon Phi on the host's tmpfs

2024-5-18

I want to use the host system's RAM as swap space for the Xeon Phi (mic0).

On the host machine:

# free -m
             total       used       free     shared    buffers     cached
Mem:        129022      60312      68710          0       1092      50078
-/+ buffers/cache:       9141     119880
Swap:            0          0          0

I ran the following commands on the host machine:

# mount -t ramfs ramfs /mnt/ramfs/
# dd bs=512M if=/dev/zero of=/mnt/ramfs/ram1 count=48
# echo /mnt/ramfs/ram1 >/sys/class/mic/mic0/virtblk_file
# df -a | grep ramfs
/mnt/ramfs              0          0          0    - /mnt/ramfs
# vim  /etc/mpss/default.conf # add:
ExtraCommandLine "vfs_read_optimization=on" 
ExtraCommandLine "vfs_write_optimization=on"
# service mpss stop
# micctrl --resetconfig
# service mpss start

Then, on mic0, I ran:

# modprobe mic_virtblk 
# mkswap /dev/vda
# swapon /dev/vda
# free -m
             total       used       free     shared    buffers     cached
Mem:          7697        574       7123          0          0        145
-/+ buffers/cache:        428       7268
Swap:        24575          0      24575

How can I verify that the swap is actually backed by the host's RAM?

Why is it so slow when the Xeon Phi uses swap, given that the swap is already backed by the host's RAM?

Test code:

#include <stdlib.h>   
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Convert the interval between two clock() readings to milliseconds. */
long timediff(clock_t t1, clock_t t2) {
    long elapsed;
    elapsed = ((double)t2 - t1) / CLOCKS_PER_SEC * 1000;
    return elapsed;
}

int main(int argc, char** argv) {
    clock_t t1, t2;
    int max = 100;
    int mb = 0;
    int size = 256;
    char* buffer;

    if(argc > 1)
        max = atoi(argv[1]);

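    t1 = clock();
    /* Allocate and touch one chunk of `size` MB per iteration, at most `max` chunks.
       Buffers are deliberately never freed so that memory use keeps growing. */
    while(mb != max && (buffer=malloc(size*1024*1024)) != NULL) {
        memset(buffer, 0, size*1024*1024);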
        ++mb;
        t2 = clock();
        printf("Allocated %.2f GB in %ld ms\n", mb * size / 1024.0, timediff(t1, t2) );
        t1 = t2;
    }
    return 0;
}

Compiled with: icc swaptest.c -o swaptest -mmic

Results:

# ./swaptest 
Allocated 0.25 GB in 260 ms
Allocated 0.50 GB in 269 ms
...
Allocated 6.75 GB in 269 ms
Allocated 7.00 GB in 260 ms
Allocated 7.25 GB in 470 ms
Allocated 7.50 GB in 1819 ms
Allocated 7.75 GB in 2060 ms
Allocated 8.00 GB in 2420 ms
Allocated 8.25 GB in 2820 ms
Allocated 8.50 GB in 2750 ms
Allocated 8.75 GB in 2300 ms
Allocated 9.00 GB in 1380 ms
Allocated 9.25 GB in 1530 ms
Allocated 9.50 GB in 3400 ms
Allocated 9.75 GB in 3800 ms
Allocated 10.00 GB in 3940 ms
Allocated 10.25 GB in 3579 ms
Allocated 10.50 GB in 5050 ms
Allocated 10.75 GB in 5029 ms
Allocated 11.00 GB in 5130 ms
Allocated 11.25 GB in 4770 ms
Allocated 11.50 GB in 3719 ms
Allocated 11.75 GB in 2300 ms
Allocated 12.00 GB in 3619 ms

etc.

Compared with the host system:

$ ./a.out 
Allocated 0.25 GB in 140 ms
Allocated 0.50 GB in 170 ms
Allocated 0.75 GB in 160 ms
Allocated 1.00 GB in 160 ms
...
Allocated 23.75 GB in 130 ms
Allocated 24.00 GB in 130 ms
Allocated 24.25 GB in 130 ms
Allocated 24.50 GB in 130 ms
Allocated 24.75 GB in 130 ms
Allocated 25.00 GB in 120 ms

Without swapping: 256 MB in 269 ms, ~951 MB/s

With swapping: 256 MB in 5.13 s is about 49.9 MB/s, which is much slower than the benchmark shown at https://software.intel.com/en-us/blogs/2014/01/07/improving-file-io-performance-on-intel-xeon-phi (at least 360 MB/s). Is this expected?

I am using a Xeon Phi 5110P series (rev 11) with icc (ICC) 14.0.2 20140120 (parallel_studio_xe_2013_sp1_update2) and mpss-3.2.1.
