How do I get the amount of GPU memory available to OpenCL?

How can I find out how much GPU memory is available to a program that uses OpenCL for its computations, such as darktable?

I know that lspci gives some general information, but it is not what I am looking for.

$ sudo lspci -v -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X] (prog-if 00 [VGA controller])
    Subsystem: Gigabyte Technology Co., Ltd Device 227d
    Flags: bus master, fast devsel, latency 0, IRQ 49
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at fe780000 (64-bit, non-prefetchable) [size=256K]
    I/O ports at c000 [size=256]
    Expansion ROM at fe7c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [270] #19
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] #13
    Capabilities: [2d0] #1b
    Kernel driver in use: fglrx_pci

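It shows 256 MB, which is unrealistic and far too little: darktable does work with OpenCL here, and it requires at least 768 MB (the GPU has 4 GB of memory in total).

Then there is clinfo (from the clinfo package), which gives the following: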

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.2 AMD-APP (1411.4)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa 


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Board name:                    AMD Radeon R9 200 Series
  Device Topology:               PCI[ B#1, D#0, F#0 ]
  Max compute units:                 20
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1100Mhz
  Address bits:                  32
  Max memory allocation:             1073741824
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Pitcairn
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (VM)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir 


  Device Type:                   CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Board name:                    
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               2664Mhz
  Address bits:                  64
  Max memory allocation:             2147483648
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    32768
  Global memory size:                6258630656
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Intel(R) Core(TM)2 Duo CPU     E6750  @ 2.66GHz
  Vendor:                    GenuineIntel
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (sse2)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm 

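Several values have "memory" in their name. Which of them is the total amount of available memory, and in which unit? If the unit were bits, Global memory size would be 512 MB and Max memory allocation 256 MB. Local memory size is perhaps 4 GB (in MB). clinfo has neither a man page nor built-in help via -h.

How do I interpret all these values correctly to get the amount of available GPU memory? Is there another program I could use?

Also: why is there no tag for OpenCL yet?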

Answer 1

You may have found the answer by now, but the clinfo output is in bytes, not bits. So the global memory size is about 3 GB, not 512 MB.
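
To make the units concrete, here is a small Python sketch that reads the GPU's memory-related values from the clinfo output in the question as bytes and prints them in binary units. The helper name human_size is only an illustration; it is not part of clinfo or OpenCL.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit where the number is below 1024,
        # or at GiB, the largest unit this sketch handles.
        if size < 1024 or unit == "GiB":
            return f"{size:g} {unit}"
        size /= 1024


# Values copied from the GPU section of the clinfo output in the question.
gpu_values = {
    "Global memory size": 3221225472,
    "Max memory allocation": 1073741824,
    "Local memory size": 32768,
}

for name, value in gpu_values.items():
    print(f"{name}: {human_size(value)}")
```

This prints 3 GiB for the global memory size, 1 GiB for the largest single allocation, and 32 KiB for the local memory size.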

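Answer 2

So do you need a generic utility, something like a system-wide Linux script, to get this information? Such specific information does not seem easy to get. I am not familiar with the clinfo package; I assume you have to install it with sudo apt-get install.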
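
If a script is what you want, one possible sketch is to filter the memory fields out of the clinfo text. The Python below assumes the "Key:   value" layout shown in the question; other clinfo implementations format their output differently, so the regular expression and the field names may need adjusting. The function name memory_fields is made up for this example.

```python
import re

MEMORY_FIELDS = ("Max memory allocation", "Global memory size", "Local memory size")


def memory_fields(clinfo_text):
    """Return one dict per device, mapping memory field names to byte counts."""
    devices = []
    for line in clinfo_text.splitlines():
        match = re.match(r"\s*(.+?):\s+(\S.*)$", line)
        if not match:
            continue  # blank line or a key without a value
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Device Type":
            devices.append({"Device Type": value})  # a new device section starts
        elif key in MEMORY_FIELDS and devices:
            devices[-1][key] = int(value)
    return devices


# A few lines copied from the clinfo output in the question.
sample = """\
  Device Type:                   CL_DEVICE_TYPE_GPU
  Max memory allocation:             1073741824
  Global memory size:                3221225472
  Local memory size:                 32768
  Device Type:                   CL_DEVICE_TYPE_CPU
  Max memory allocation:             2147483648
  Global memory size:                6258630656
  Local memory size:                 32768
"""

for device in memory_fields(sample):
    gib = device["Global memory size"] / 2**30
    print(f'{device["Device Type"]}: {gib:.2f} GiB global memory')
```

To run it against the real tool instead of the sample, you could pass in subprocess.run(["clinfo"], capture_output=True, text=True).stdout.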

If it does not need to be generic, you could write an OpenCL application to get this information. I believe OpenCL must offer a way to query it. You would only need a simple application that initializes an OpenCL context and printfs GPU_MEMORY (or something similar) to the console.
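
As a rough sketch of such an application, the Python below uses the third-party pyopencl bindings (assumed to be installed; they are not mentioned in the question). GPU_MEMORY above is only a placeholder. The device attributes used here, global_mem_size and max_mem_alloc_size, correspond to CL_DEVICE_GLOBAL_MEM_SIZE and CL_DEVICE_MAX_MEM_ALLOC_SIZE, the same values clinfo prints. Both are in bytes and describe total capacity, not the memory that is currently free.

```python
def list_device_memory():
    """Return (device name, global bytes, max single allocation bytes) tuples."""
    try:
        import pyopencl as cl  # third-party; assumed to be installed
    except ImportError:
        return []
    results = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                results.append(
                    (device.name, device.global_mem_size, device.max_mem_alloc_size)
                )
    except cl.Error:
        # Raised, for example, when no OpenCL platform is installed.
        return []
    return results


if __name__ == "__main__":
    for name, global_bytes, max_alloc in list_device_memory():
        print(f"{name}: global {global_bytes / 2**20:.0f} MiB, "
              f"largest single buffer {max_alloc / 2**20:.0f} MiB")
```

No context has to be created for this query; enumerating platforms and devices is enough.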

As for the OpenCL tag, you will probably have better luck at https://stackoverflow.com/questions/tagged/opencl
