How do I get the amount of GPU memory available to a program that does its computations with OpenCL, such as darktable?
I know that lspci gives some general information, but it is not what I'm looking for:
$ sudo lspci -v -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X] (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device 227d
Flags: bus master, fast devsel, latency 0, IRQ 49
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at fe780000 (64-bit, non-prefetchable) [size=256K]
I/O ports at c000 [size=256]
Expansion ROM at fe7c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] #13
Capabilities: [2d0] #1b
Kernel driver in use: fglrx_pci
It shows 256MB, which is unrealistic and far too little: darktable works with OpenCL and needs at least 768MB (the GPU has 4GB of memory in total).
Then there is clinfo (from the clinfo package), which gives:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 AMD-APP (1411.4)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Board name: AMD Radeon R9 200 Series
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 20
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1100Mhz
Address bits: 32
Max memory allocation: 1073741824
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00007fce5d932500
Name: Pitcairn
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 1411.4 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1411.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Board name:
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 2664Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 6258630656
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00007fce5d932500
Name: Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 1411.4 (sse2)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1411.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm
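Several values have "memory" in their name, but which of them is the total amount of usable memory, and in which unit? If these were bits, Global memory size would be 512MB and Max memory allocation 256MB, and Local memory size would maybe be 4GB (or MB?). clinfo has no man page and no built-in help (-h).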
How do I interpret all these values correctly to get the amount of usable GPU memory? Is there another program I could use?
Also: why is there no tag for OpenCL yet?
Answer 1
You may have the answer by now, but the output of clinfo is in bytes, not bits. So Global memory size is 3221225472 bytes = 3 GiB, not 512MB.
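To make the unit question concrete, here is a minimal Python sketch (the helper name `human_bytes` is purely illustrative) that takes the raw byte counts from the GPU section of the clinfo output above and converts them to binary units, where 1 GiB = 1024³ bytes:

```python
def human_bytes(n: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value drops below 1024 (or at the last unit).
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"


# Values copied from the GPU (Pitcairn) section of the clinfo output above.
gpu = {
    "Global memory size": 3221225472,
    "Max memory allocation": 1073741824,
    "Local memory size": 32768,
}

for name, nbytes in gpu.items():
    print(f"{name}: {nbytes} bytes = {human_bytes(nbytes)}")
```

Read as bytes, the driver exposes 3 GiB of global memory to OpenCL (less than the card's 4GB), a single buffer can be at most 1 GiB (Max memory allocation is a per-allocation limit, not a total), and Local memory size is the 32 KiB of local memory available to one work-group, not a device total.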
Answer 2
So you want a utility, like a general Linux-wide script, to get this information? Getting such specific information doesn't seem easy. I'm not familiar with the clinfo package; I guess it has to be installed with sudo apt-get install clinfo.
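If a script around clinfo is enough, something like the following Python sketch could pull the relevant fields out of its text output. It assumes the `Key: value` layout shown in the question; other clinfo versions format their output differently, so the regular expression may need adjusting. `parse_clinfo` is a hypothetical helper, not part of clinfo.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Fields of interest, matched as "Key: value" lines (layout assumed from the question).
FIELDS = ("Device Type", "Name", "Global memory size", "Max memory allocation")
LINE_RE = re.compile(r"^\s*(" + "|".join(map(re.escape, FIELDS)) + r"):\s*(.*?)\s*$")


def parse_clinfo(text: str) -> list[dict]:
    """Return one dict per device; purely numeric values become int (bytes)."""
    devices: list[dict] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Device Type":
            devices.append({})  # a "Device Type" line starts a new device block
        if devices:  # ignore anything before the first device
            devices[-1][key] = int(value) if value.isdigit() else value
    return devices


def main() -> None:
    if shutil.which("clinfo") is None:
        print("clinfo not found (on Ubuntu: sudo apt-get install clinfo)")
        return
    output = subprocess.run(["clinfo"], capture_output=True, text=True).stdout
    for dev in parse_clinfo(output):
        if dev.get("Device Type") == "CL_DEVICE_TYPE_GPU":
            total = dev.get("Global memory size")
            if isinstance(total, int):
                print(f"{dev.get('Name')}: {total} bytes ({total / 2**30:.2f} GiB)")


if __name__ == "__main__":
    main()
```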
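I ask because, if it doesn't need to be generic, you can write an OpenCL application to get this information. I believe OpenCL must have a way to provide information like this: just write a simple application that initializes an OpenCL context and printfs GPU_MEMORY (or something similar) to the console.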
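There is no `GPU_MEMORY` constant, but the query answer 2 has in mind does exist: `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE` (and `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for the per-buffer limit). No context is needed, only platform and device IDs. A C program would be the natural choice; to keep one language here, the following is a minimal sketch in Python that calls the OpenCL ICD loader (`libOpenCL`) through `ctypes`, assuming that library is installed. The constant values are copied from the Khronos `cl.h` header, and the helper names are made up for this example.

```python
import ctypes
import ctypes.util

# Constant values from the Khronos CL/cl.h header.
CL_SUCCESS = 0
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 0x1010
CL_DEVICE_GLOBAL_MEM_SIZE = 0x101F
CL_DEVICE_NAME = 0x102B


def load_opencl():
    """Load the OpenCL ICD loader and declare prototypes; None if unavailable."""
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return None
    try:
        cl = ctypes.CDLL(path)
    except OSError:
        return None
    vp, uint, size_t = ctypes.c_void_p, ctypes.c_uint, ctypes.c_size_t
    cl.clGetPlatformIDs.argtypes = [uint, ctypes.POINTER(vp), ctypes.POINTER(uint)]
    cl.clGetDeviceIDs.argtypes = [vp, ctypes.c_uint64, uint,
                                  ctypes.POINTER(vp), ctypes.POINTER(uint)]
    cl.clGetDeviceInfo.argtypes = [vp, uint, size_t, vp, ctypes.POINTER(size_t)]
    for fn in (cl.clGetPlatformIDs, cl.clGetDeviceIDs, cl.clGetDeviceInfo):
        fn.restype = ctypes.c_int  # cl_int status code
    return cl


def _check(status, call):
    if status != CL_SUCCESS:
        raise RuntimeError(f"{call} returned OpenCL error {status}")


def _ids(fn, *args):
    """Two-call pattern: query the count, then fill an array of handles."""
    count = ctypes.c_uint()
    if fn(*args, 0, None, ctypes.byref(count)) != CL_SUCCESS:
        return []  # e.g. no platforms installed, or CL_DEVICE_NOT_FOUND
    if count.value == 0:
        return []
    handles = (ctypes.c_void_p * count.value)()
    _check(fn(*args, count, handles, None), "clGet*IDs")
    return list(handles)


def _ulong_info(cl, dev, param):
    value = ctypes.c_uint64()  # cl_ulong
    _check(cl.clGetDeviceInfo(dev, param, ctypes.sizeof(value),
                              ctypes.byref(value), None), "clGetDeviceInfo")
    return value.value


def _str_info(cl, dev, param):
    size = ctypes.c_size_t()
    _check(cl.clGetDeviceInfo(dev, param, 0, None, ctypes.byref(size)), "clGetDeviceInfo")
    buf = ctypes.create_string_buffer(size.value)
    _check(cl.clGetDeviceInfo(dev, param, size, buf, None), "clGetDeviceInfo")
    return buf.value.decode(errors="replace")


def query_devices():
    """Return (name, global_mem_bytes, max_alloc_bytes) for each OpenCL device."""
    cl = load_opencl()
    if cl is None:
        return []
    devices = []
    for platform in _ids(cl.clGetPlatformIDs):
        for dev in _ids(cl.clGetDeviceIDs, platform, CL_DEVICE_TYPE_ALL):
            devices.append((_str_info(cl, dev, CL_DEVICE_NAME),
                            _ulong_info(cl, dev, CL_DEVICE_GLOBAL_MEM_SIZE),
                            _ulong_info(cl, dev, CL_DEVICE_MAX_MEM_ALLOC_SIZE)))
    return devices


if __name__ == "__main__":
    found = query_devices()
    if not found:
        print("No OpenCL library or devices found.")
    for name, global_mem, max_alloc in found:
        print(f"{name}: global memory {global_mem / 2**30:.2f} GiB, "
              f"max single allocation {max_alloc / 2**30:.2f} GiB")
```

Since clinfo reads the same `clGetDeviceInfo` values, on the system in the question this should report the same 3 GiB global memory and 1 GiB maximum allocation that clinfo shows for Pitcairn.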
As for the OpenCL tag, you will probably have more luck here: https://stackoverflow.com/questions/tagged/opencl