GMKtec NucBox G3 Plus (N150) #64

Open
geerlingguy opened this issue Jan 24, 2025 · 24 comments

@geerlingguy
Owner

geerlingguy commented Jan 24, 2025

Image

Basic information

  • Board URL (official): https://amzn.to/40MbVXQ
  • Board purchased from: Provided for review
  • Board purchase date: January 20, 2025
  • Board specs (as tested): N150, 16GB RAM, 512 GB SSD
  • Board price (as tested): $159.99

Linux/system information

# output of `screenfetch`
                          ./+o+-       jgeerling@nucbox-g3-plus
                  yyyyy- -yyyyyy+      OS: Ubuntu 24.04 noble
               ://+//////-yyyyyyo      Kernel: x86_64 Linux 6.12.3-061203-generic
           .++ .:/++++++/-.+sss/`      Uptime: 16h 17m
         .:++o:  /++++++++/:--:/-      Packages: 1736
        o:+o+:++.`..```.-/oo+++++/     Shell: dash
       .:+o:+o/.          `+sssoo+/    Disk: 59G / 938G (7%)
  .++/+:+oo+o:`             /sssooo.   CPU: Intel N150 @ 4x 3.6GHz [41.0°C]
 /+++//+:`oo+o               /::--:.   GPU: Intel Corporation Alder Lake-N [Intel Graphics]
 \+/+o+++`o++o               ++////.   RAM: 990MiB / 15736MiB
  .++.o+++oo+:`             /dddhhh.  
       .+.o+oo:.          `oddhhhh+   
        \+.++o+o``-````.:ohdhhhhh+    
         `:o+++ `ohhhhhhhhyo++os:     
           .o:`.syhhhhhhh/.oo++o`     
               /osyyyyyyo++ooo+++/    
                   ````` +oo+++o\:    
                          `oo++.      

# output of `uname -a`
Linux nucbox-g3-plus 6.12.3-061203-generic #202412060638 SMP PREEMPT_DYNAMIC Fri Dec  6 07:08:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Benchmark results

CPU

NOTE: I have re-run all my benchmarks after re-pasting the CPU and removing the top cover and blowing a fan over it. Despite this being a new unit delivered by Amazon, the stock thermal paste was a bit dried out, and the top of the enclosure has no airflow whatsoever. A re-paste alone got the SoC temperatures 3-5°C cooler, and the fan kept the SoC temps down another 10°C or so. Also, the factory default is the 'Balanced' power profile, so I set it to 'High Performance' in the BIOS settings.

Power

  • Idle power draw (at wall): 9.3 W
  • Maximum simulated power draw (stress-ng --matrix 0): 25 W
  • During Geekbench multicore benchmark: 26.3 W
  • During top500 HPL benchmark: 28.5 W

Disk

TeamGroup TM8FP4001T Gen 3x4 2280 NVMe SSD

| Benchmark | Result |
| --- | --- |
| iozone 4K random read | 57.77 MB/s |
| iozone 4K random write | 209.61 MB/s |
| iozone 1M random read | 1263.32 MB/s |
| iozone 1M random write | 1435.38 MB/s |
| iozone 1M sequential read | 1572.33 MB/s |
| iozone 1M sequential write | 1429.21 MB/s |

Note: This is testing with an SSD I placed in the system, as I'm preserving the original SSD, an AirDisk APF10-512G PCIe 3.0x4 M.2 2280 SSD, with its original Windows 11 install.

Network

iperf3 results:

2.5 Gbps Ethernet (Intel I226-V)

  • iperf3 -c $SERVER_IP: 2.35 Gbps
  • iperf3 -c $SERVER_IP --reverse: 1.40 Gbps
  • iperf3 -c $SERVER_IP --bidir: 2.35 Gbps up, 577 Mbps down

WiFi 6E (Realtek RTL8852BE)

  • iperf3 -c $SERVER_IP: 883 Mbps
  • iperf3 -c $SERVER_IP --reverse: 320 Mbps
  • iperf3 -c $SERVER_IP --bidir: 779 Mbps up, 71 Mbps down

(Be sure to test all interfaces, noting any that are non-functional.)

GPU

glmark2

glmark2-es2 results:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Intel
    GL_RENDERER:    Mesa Intel(R) Graphics (ADL-N)
    GL_VERSION:     OpenGL ES 3.2 Mesa 24.3.0.20240801-2119~24.04 (git-9fc8668b66)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 3392 FrameTime: 0.295 ms
[build] use-vbo=true: FPS: 3657 FrameTime: 0.273 ms
[texture] texture-filter=nearest: FPS: 3514 FrameTime: 0.285 ms
[texture] texture-filter=linear: FPS: 3504 FrameTime: 0.285 ms
[texture] texture-filter=mipmap: FPS: 3521 FrameTime: 0.284 ms
[shading] shading=gouraud: FPS: 3093 FrameTime: 0.323 ms
[shading] shading=blinn-phong-inf: FPS: 3110 FrameTime: 0.322 ms
[shading] shading=phong: FPS: 2730 FrameTime: 0.366 ms
[shading] shading=cel: FPS: 2697 FrameTime: 0.371 ms
[bump] bump-render=high-poly: FPS: 2078 FrameTime: 0.481 ms
[bump] bump-render=normals: FPS: 3716 FrameTime: 0.269 ms
[bump] bump-render=height: FPS: 3602 FrameTime: 0.278 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 2312 FrameTime: 0.433 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 1322 FrameTime: 0.756 ms
[pulsar] light=false:quads=5:texture=false: FPS: 3114 FrameTime: 0.321 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 1187 FrameTime: 0.843 ms
[desktop] effect=shadow:windows=4: FPS: 1966 FrameTime: 0.509 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1148 FrameTime: 0.871 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1351 FrameTime: 0.740 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1328 FrameTime: 0.753 ms
[ideas] speed=duration: FPS: 2550 FrameTime: 0.392 ms
[jellyfish] <default>: FPS: 1821 FrameTime: 0.549 ms
[terrain] <default>: FPS: 247 FrameTime: 4.059 ms
[shadow] <default>: FPS: 2513 FrameTime: 0.398 ms
[refract] <default>: FPS: 547 FrameTime: 1.831 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2854 FrameTime: 0.350 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2839 FrameTime: 0.352 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2850 FrameTime: 0.351 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2845 FrameTime: 0.352 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2848 FrameTime: 0.351 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2845 FrameTime: 0.352 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2845 FrameTime: 0.352 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2837 FrameTime: 0.352 ms
=======================================================
                                  glmark2 Score: 2507 
=======================================================

GravityMark

GravityMark results: https://gravitymark.tellusim.com/report/?id=577dbdbb0e62d5b49e4976b475737c40c8dd3226

Image

Note: These benchmarks require an active display on the device. Not all devices may be able to run glmark2-es2, so in that case, make a note and move on!

Ollama

ollama LLM model inference results:

'Balanced' power mode

| System | CPU/GPU | Model | Eval Rate | Power (Peak) |
| --- | --- | --- | --- | --- |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | deepseek-r1:1.5b | 17.02 Tokens/s | 25.6 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | deepseek-r1:8b | 3.55 Tokens/s | 25.6 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | deepseek-r1:14b | 1.97 Tokens/s | 25.6 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | llama3.2:3b | 8.04 Tokens/s | 26.5 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | llama3.1:8b | 3.59 Tokens/s | 26.5 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | llama2:13b | 2.16 Tokens/s | 26.5 W |

'High Performance' power mode

| System | CPU/GPU | Model | Eval Rate | Power (Peak) |
| --- | --- | --- | --- | --- |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | deepseek-r1:1.5b | 17.99 Tokens/s | 29.9 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | deepseek-r1:8b | 3.84 Tokens/s | 29.8 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | deepseek-r1:14b | 2.13 Tokens/s | 30.3 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | llama3.2:3b | 9.06 Tokens/s | 26.4 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | llama3.1:8b | 3.91 Tokens/s | 29.8 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | llama2:13b | 2.57 Tokens/s | 28.5 W |
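
To compare the two power modes directly, here is a small sketch that computes the per-model change in eval rate between 'Balanced' and 'High Performance'. The numbers are copied from the two tables above; the helper name `uplift_percent` is made up for illustration and is not part of any benchmark tooling.

```python
# Eval rates in tokens/s, copied from the 'Balanced' and
# 'High Performance' Ollama tables in this issue.
balanced = {
    "deepseek-r1:1.5b": 17.02,
    "deepseek-r1:8b": 3.55,
    "deepseek-r1:14b": 1.97,
    "llama3.2:3b": 8.04,
    "llama3.1:8b": 3.59,
    "llama2:13b": 2.16,
}
high_performance = {
    "deepseek-r1:1.5b": 17.99,
    "deepseek-r1:8b": 3.84,
    "deepseek-r1:14b": 2.13,
    "llama3.2:3b": 9.06,
    "llama3.1:8b": 3.91,
    "llama2:13b": 2.57,
}


def uplift_percent(before, after):
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


uplifts = {
    model: uplift_percent(balanced[model], high_performance[model])
    for model in balanced
}

for model, pct in uplifts.items():
    print(f"{model:18s} {pct:+.1f}%")
```

By this calculation the gain ranges from roughly 6% (deepseek-r1:1.5b) to roughly 19% (llama2:13b), with single runs per model, so small differences should not be over-read.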

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   6256.2 MB/s (0.5%)
 C copy backwards (32 byte blocks)                    :   6290.1 MB/s (0.4%)
 C copy backwards (64 byte blocks)                    :   6290.3 MB/s (0.5%)
 C copy                                               :   6252.0 MB/s (0.5%)
 C copy prefetched (32 bytes step)                    :   4869.8 MB/s (0.3%)
 C copy prefetched (64 bytes step)                    :   4955.0 MB/s
 C 2-pass copy                                        :   5904.3 MB/s (0.4%)
 C 2-pass copy prefetched (32 bytes step)             :   3758.0 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :   3761.1 MB/s (0.2%)
 C fill                                               :   8575.4 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks)               :   8574.9 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)               :   8572.6 MB/s (1.1%)
 C fill (shuffle within 64 byte blocks)               :   8573.3 MB/s (0.2%)
 ---
 standard memcpy                                      :   9567.4 MB/s (0.4%)
 standard memset                                      :   8662.9 MB/s (1.1%)
 ---
 MOVSB copy                                           :   6352.6 MB/s (0.2%)
 MOVSD copy                                           :   6352.5 MB/s (0.2%)
 SSE2 copy                                            :   6358.2 MB/s (0.2%)
 SSE2 nontemporal copy                                :  10207.6 MB/s (0.7%)
 SSE2 copy prefetched (32 bytes step)                 :   5838.8 MB/s (0.2%)
 SSE2 copy prefetched (64 bytes step)                 :   5924.1 MB/s (0.2%)
 SSE2 nontemporal copy prefetched (32 bytes step)     :   7930.0 MB/s (0.3%)
 SSE2 nontemporal copy prefetched (64 bytes step)     :   8195.8 MB/s (0.8%)
 SSE2 2-pass copy                                     :   5755.2 MB/s (0.2%)
 SSE2 2-pass copy prefetched (32 bytes step)          :   4688.9 MB/s (0.2%)
 SSE2 2-pass copy prefetched (64 bytes step)          :   4899.0 MB/s (0.2%)
 SSE2 2-pass nontemporal copy                         :   3621.2 MB/s (0.2%)
 SSE2 fill                                            :   8659.9 MB/s (0.3%)
 SSE2 nontemporal fill                                :  20676.0 MB/s (0.4%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    2.4 ns          /     3.5 ns 
    131072 :    3.6 ns          /     4.4 ns 
    262144 :    4.8 ns          /     5.7 ns 
    524288 :    6.1 ns          /     6.8 ns 
   1048576 :    6.7 ns          /     7.2 ns 
   2097152 :    8.3 ns          /     9.5 ns 
   4194304 :   13.8 ns          /    17.5 ns 
   8388608 :   29.0 ns          /    41.7 ns 
  16777216 :   67.0 ns          /    94.3 ns 
  33554432 :   90.5 ns          /   114.9 ns 
  67108864 :  103.1 ns          /   122.8 ns 

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.1 ns 
     65536 :    2.4 ns          /     3.6 ns 
    131072 :    3.6 ns          /     4.4 ns 
    262144 :    4.2 ns          /     4.7 ns 
    524288 :    4.5 ns          /     4.7 ns 
   1048576 :    4.6 ns          /     4.7 ns 
   2097152 :    5.0 ns          /     5.1 ns 
   4194304 :   10.6 ns          /    13.6 ns 
   8388608 :   23.9 ns          /    34.8 ns 
  16777216 :   59.9 ns          /    84.2 ns 
  33554432 :   80.3 ns          /   100.5 ns 
  67108864 :   90.6 ns          /   105.2 ns

sbc-bench results

Run sbc-bench and paste a link to the results here:

Phoronix Test Suite

Results from pi-general-benchmark.sh:

  • pts/encode-mp3: 8.928 sec
  • pts/x264 4K: 7.05 fps
  • pts/x264 1080p: 29.51 fps
  • pts/phpbench: 755754
  • pts/build-linux-kernel (defconfig): 500.745 sec
@geerlingguy
Owner Author

Right out of the box, I ran into an issue—the included 12V 3A DC power adapter was dead:

Image

It would give between 0.14 and 0.24V, but the wall wart heated up a bit around the plug. So something inside was toast.

So instead, I've plugged in the original power adapter that I got with my N100 GMKtec. I tried using another standard 2.5mm barrel jack, but it seems the center pin on the GMKtec is ever so slightly thicker than the standard 2.5mm jacks.

@geerlingguy
Owner Author

geerlingguy commented Jan 31, 2025

I was having trouble getting GravityMark to run at all, and glmark2 was only giving me llvmpipe CPU rendering (very slow).

jgeerling@nucbox-g3-plus:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-N [Intel Graphics]

jgeerling@nucbox-g3-plus:~$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Mesa (0xffffffff)
    Device: llvmpipe (LLVM 19.1.1, 256 bits) (0xffffffff)
    Version: 24.2.8
    Accelerated: no
    Video memory: 15736MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 4.5
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 0 MB, largest block: 0 MB
    VBO free aux. memory - total: 14409 MB, largest block: 14409 MB
    Texture free memory - total: 0 MB, largest block: 0 MB
    Texture free aux. memory - total: 14409 MB, largest block: 14409 MB
    Renderbuffer free memory - total: 0 MB, largest block: 0 MB
    Renderbuffer free aux. memory - total: 14409 MB, largest block: 14409 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 0 MB
    Total available memory: 15736 MB
    Currently available dedicated video memory: 0 MB
OpenGL vendor string: Mesa
OpenGL renderer string: llvmpipe (LLVM 19.1.1, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 24.2.8-1ubuntu1~24.04.1
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.5 (Compatibility Profile) Mesa 24.2.8-1ubuntu1~24.04.1
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

So I tried following Intel's Guide for Installing Client GPU's on Ubuntu 24.04 LTS:

# Install the Intel graphics GPG public key
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
  sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg

# Configure the repositories.intel.com package repository
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu noble client" | \
  sudo tee /etc/apt/sources.list.d/intel-gpu-noble.list

# Update the package repository meta-data
sudo apt update

# Install the compute-related packages
sudo apt-get install -y libze-intel-gpu1 libze1 intel-opencl-icd clinfo intel-gsc

After a reboot... glxinfo still giving me llvmpipe :(

jgeerling@nucbox-g3-plus:~$ sudo apt install xserver-xorg-video-intel
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
xserver-xorg-video-intel is already the newest version (2:2.99.917+git20210115-1build1).
xserver-xorg-video-intel set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

@geerlingguy
Owner Author

Well then, found this on the reddit r/miniPCs subreddit: GMKtec G3 Plus with Intel N150, no graphics driver on Ubuntu 24.04.1?

# Adding for completeness - no driver loaded for graphics
jgeerling@nucbox-g3-plus:~$ lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:461c]
	DeviceName: Onboard - Other
	Kernel driver in use: igen6_edac
	Kernel modules: igen6_edac
00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-N [Intel Graphics] [8086:46d4]
	DeviceName: Onboard - Video
	Subsystem: Device [0301:02f3]
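
The telltale sign in the `lspci -nnk` output above is that the VGA controller has no "Kernel driver in use" line. As a rough sketch, that check could be automated by scanning the output for devices lacking that line. The function name `devices_without_driver` is hypothetical, and the sample text is the excerpt from this comment, not a full `lspci` dump.

```python
def devices_without_driver(lspci_nnk_output):
    """Return header lines of PCI devices with no 'Kernel driver in use' entry.

    Assumes the `lspci -nnk` layout shown above: an unindented header line
    per device, followed by indented detail lines.
    """
    missing = []
    header = None
    has_driver = False
    for line in lspci_nnk_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new device starts; record the previous one if it had no driver.
            if header is not None and not has_driver:
                missing.append(header)
            header = line.strip()
            has_driver = False
        elif line.strip().startswith("Kernel driver in use:"):
            has_driver = True
    if header is not None and not has_driver:
        missing.append(header)
    return missing


# Excerpt copied from the output in this comment.
sample = "\n".join([
    "00:00.0 Host bridge [0600]: Intel Corporation Device [8086:461c]",
    "\tDeviceName: Onboard - Other",
    "\tKernel driver in use: igen6_edac",
    "\tKernel modules: igen6_edac",
    "00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-N [Intel Graphics] [8086:46d4]",
    "\tDeviceName: Onboard - Video",
    "\tSubsystem: Device [0301:02f3]",
])

for device in devices_without_driver(sample):
    print("no driver bound:", device)
```

On a live system the same function could be fed the text captured from `lspci -nnk`; many devices legitimately have no driver, so only the VGA entry matters here.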

The solution in that thread was to update from the 6.8 kernel to 6.12.x...

To select a newer kernel using a GUI:

sudo add-apt-repository ppa:cappelikan/ppa
sudo apt install -y mainline

Then open "Mainline Kernels", select the latest 6.12.x kernel and click "Install". Reboot, and then verify the running kernel with uname -a.

Now I'm getting full acceleration, with glmark2-es2 picking up Mesa Intel(R) Graphics (ADL-N). Re-testing GPU tests now!
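
For anyone scripting this check, a minimal sketch of comparing a kernel release string against 6.12 follows. Note the assumption: 6.12 is simply the version that worked in this thread, not a confirmed minimum for N150 graphics support. `kernel_at_least` is a hypothetical helper, and the "6.8.0" string is just an illustrative example of an older release.

```python
import re


def kernel_at_least(release, minimum=(6, 12)):
    """Return True if a release string like '6.12.3-061203-generic' is >= minimum.

    Only the major and minor numbers are compared. The default of (6, 12)
    is an assumption taken from this thread, not an official requirement.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Release string from the `uname -a` output at the top of this issue.
print(kernel_at_least("6.12.3-061203-generic"))
# Illustrative older release string.
print(kernel_at_least("6.8.0"))
```

On the machine itself, `platform.release()` from the standard library returns the same release string that `uname -r` prints, so it can be passed straight to the helper.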

@ThomasKaiser

ThomasKaiser commented Feb 2, 2025

Please note that the cooling system is insufficient. Although the CPU is faster on paper than the N100 (which is confirmed by certain benchmarks that do not access memory), the memory controller gets severely throttled when running multi-threaded stuff like 7-zip.

Latency tested single-threaded is better than with an unthrottled N100 system, and the 7-zip single-threaded score is also better, but since the CPU gets too hot in multi-threaded benchmarks, memory access gets thwarted:

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

N100:
Avr:             377   4038  15219  |              399   3269  13045
Tot:             388   3653  14132

N150:    
Avr:             388   2423   9389  |              397   2733  10839
Tot:             392   2578  10114

(N100 scores from here)
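
To quantify the gap, here is a quick sketch that computes how far the N150 ratings fall below the N100 ratings. The values are the "Avr" and "Tot" rating columns (MIPS) from the two summaries above; `deficit_percent` is a name made up for this example.

```python
# Rating (MIPS) values copied from the 7-zip summaries above.
n100 = {"compress": 15219, "decompress": 13045, "total": 14132}
n150 = {"compress": 9389, "decompress": 10839, "total": 10114}


def deficit_percent(reference, measured):
    """How far `measured` falls below `reference`, in percent (hypothetical helper)."""
    return (reference - measured) / reference * 100.0


deficits = {key: deficit_percent(n100[key], n150[key]) for key in n100}

for key, pct in deficits.items():
    print(f"{key:10s} N150 is {pct:.1f}% below N100")
```

This works out to about 38% for compression, about 17% for decompression, and about 28% overall, so the compression side takes the larger hit.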

@geerlingguy
Owner Author

@ThomasKaiser - "But N100 (and N150) is better and faster and more efficient than Raspberry Pi in every way!" they say, heh.

The fan does go to maximum any time I'm running benchmarks, though IIRC it is setting a thermal / voltage limit somewhere before thermal throttling, as the power spiked for a second or two (above 30W) then settled in at exactly 26.5W continuous for tests like LLMs (or very nearly the same).

I can dig back into the BIOS to see if there was some other limit, I didn't spend too much time in there yet (maybe a Turbo setting or Power Limit?).

@ThomasKaiser

> IIRC it is setting a thermal / voltage limit somewhere before thermal throttling

There are two types of thermal throttling with Intel CPUs now (at least with Alder Lake-N and Twin Lake):

  • CPU clockspeeds: this has happened from time to time with the 7-zip benchmark (2.7 instead of the 2.9 GHz that are common with N100/N150 running fully multi-threaded)
  • memory controller, as in 'access to DRAM': this has likely happened the whole time during the 7-zip benchmark

The thermal threshold for thwarting DRAM access is (much?) lower than for the CPU clockspeeds, which is why benchmark testers usually don't notice. IIRC it was around 65°C, at least with the 1st Alder Lake-N generation. But with all these dynamic limitations around (also various types of power thresholds) it's hard to tell at which 'event' what happens exactly.

Given the high temperatures I would suspect there's something wrong with the fansink, e.g. no thermal paste / an air gap between CPU and cooler.
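
One way to keep an eye on this while benchmarks run is to poll the kernel's thermal zones. The sketch below reads the standard Linux `/sys/class/thermal/thermal_zone*/temp` files, which report millidegrees Celsius. The 65°C constant is only the rough figure recalled in this comment, not a documented limit, and which zone corresponds to the SoC varies by system, so treat both as assumptions.

```python
from pathlib import Path

# Assumption: rough DRAM-throttle figure recalled in this thread, not a spec value.
DRAM_THROTTLE_HINT_C = 65.0


def millidegrees_to_celsius(raw):
    """Convert a sysfs 'temp' value (millidegrees Celsius, as text) to degrees."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone label: temperature in °C} for every readable thermal zone."""
    root = Path(base)
    readings = {}
    if not root.is_dir():
        return readings  # not Linux, or sysfs not available
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            temp_c = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        readings[f"{zone.name} ({zone_type})"] = temp_c
    return readings


for name, temp_c in read_thermal_zones().items():
    note = "  <- at or above the ~65°C hint" if temp_c >= DRAM_THROTTLE_HINT_C else ""
    print(f"{name}: {temp_c:.1f}°C{note}")
```

Running this in a loop during a 7-zip run would show whether the temperature sits above that rough figure for most of the benchmark.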

@geerlingguy
Owner Author

> Given the high temperatures I would suspect there's something wrong with the fansink, e.g. no thermal paste / an air gap between CPU and cooler.

I'll pop it and see. Judging by the faulty PSU, I imagine QC on the cheapest of mini PCs might not be as high as one would hope.

@geerlingguy
Owner Author

Before - Factory Thermal Paste/Cooling

Running s-tui with sudo so I can see power consumption, it seems to nail 10.0W and 2800 MHz almost continuously, with the fan ramping up and down a bit and the SoC settling on around 75°C:

Image

Running my top500 benchmark, which does hit the memory at the same time as the CPU, the temperatures crept up more to 78°C or so, and core freqs dropped to 2400 MHz (with the same 10.0 W package power):

Image

Running 7z b -mm=* -mmt=* and letting it rip for a while, the temperature reaches up towards 85°C, while the power limit is still between 9.5-10.5W. But that does seem to lead to some throttling:

Image

@geerlingguy
Owner Author

Repasted with Noctua NT-H2 - the original paste wasn't terrible but it also was a little drier than I'd like:

Image

@geerlingguy
Owner Author

geerlingguy commented Feb 3, 2025

After Re-paste

Running s-tui with sudo so I can see power consumption, it seems to nail 10.0W and 2800 MHz almost continuously (as before), but the fan didn't ramp up and down quite as frequently. The SoC settled around 71°C, which is 4° cooler than before:

Image

Running my top500 benchmark, which does hit the memory at the same time as the CPU, the temperatures crept up to 75°C or so, and core freqs still dropped to 2300-2400 MHz (with the same 10.0 W package power):

Image

The HPL result was 54.621 Gflops at 23.1W for 2.36 Gflops/W — a marked improvement.
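
The efficiency figure is just the HPL result divided by the measured power. A minimal sketch of that arithmetic, using the two numbers from the sentence above (`gflops_per_watt` is a hypothetical helper name):

```python
def gflops_per_watt(gflops, watts):
    """Efficiency as throughput divided by power draw (hypothetical helper)."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return gflops / watts


# Values from the HPL run after the re-paste.
efficiency = gflops_per_watt(54.621, 23.1)
print(f"{efficiency:.2f} Gflops/W")
```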

Running 7z b -mm=* -mmt=* and letting it rip for a while, the temperature never went above 80°C, while the power limit is still between 9.5-10.5W. I didn't encounter any throttling:

Image

So it seems @ThomasKaiser your conclusion was correct, the cooling was janky from the factory. Makes you wonder if the boxes were sitting baking in the sun on a dock somewhere, or if the thermal compound in use is just not that great?

The paste was definitely applied correctly, and there was no air gap that I could tell between the heatsink assembly and the SoC.

I'll also re-run sbc-bench on here as well, so you can take a look at all the fine details.

@geerlingguy
Owner Author

geerlingguy commented Feb 3, 2025

@ThomasKaiser -

Factory: https://0x0.st/88vb.bin
After re-paste: https://0x0.st/8K1h.bin

@geerlingguy
Owner Author

Well now... after running through all my benchmarks multiple times, I'm getting results nearer to my original runs. And the system idle temp is a bit higher, around 55°C now.

I wonder, if you leave this system running long enough, whether the heat kind of spreads across the main PCB (affecting the RAM, since it's soldered on), and the fan can't adequately keep the entire enclosure cooled.

I should note the top side of the board has no cooling to speak of; it's just a tiny oven up there. The fan intake is on the bottom, and it exhausts directly out through the heatsink fins connected to the heat pipe, so there's not a lot of opportunity for any other airflow inside the box.

I may run it completely bare, or at least with the top off and a fan blowing some air over the board, just to see if it idles at a more reasonable temperature.

@ThomasKaiser

> I'll also re-run sbc-bench on here as well, so you can take a look at all the fine details.

Thank you, though the numbers haven't changed much, which is no wonder given that the idle temperature is already nuts (and just a few degrees away from the memory controller starting to throttle).

Ripping the top cover off and simply watching the idle temperature (sbc-bench -m for example) may be enough to justify another round of tests (if it's not dropping below 50°C, throttling will most probably happen again under load).

@geerlingguy
Owner Author

I re-ran the entire benchmark suite a couple more times, and performance is consistently lower now. This is with the system running (with 30+ minute idle periods between benches), and idle SoC temp is 46-50°C.

I will pop the top now and re-run everything with a fan blowing over the entire board, to see if it helps.

@geerlingguy
Owner Author

After a couple minutes with the top open and my giant fan blowing into the top, the idle temp is now down to 35-37°C. Going to re-run everything now, starting with sbc-bench.

@geerlingguy
Owner Author

@ThomasKaiser With adequate cooling: https://0x0.st/8Kyt.bin

@ThomasKaiser

ThomasKaiser commented Feb 4, 2025

> https://0x0.st/8Kyt.bin

A little bit better. Maybe now some power limit kicks in?

Looking at the 1st of the three 7-zip runs, the very first try with the smallest dictionary size is OK, but then performance drops a lot, especially compression, which is more memory-bound than decompression:

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      14623   358   3977  14226  |     143936   398   3087  12280
23:      10977   385   2906  11185  |     130829   397   2852  11320
24:      10472   387   2912  11260  |     127071   397   2806  11155
25:       9968   387   2944  11382  |     122787   397   2754  10928
----------------------------------  | ------------------------------
Avr:             379   3185  12013  |              397   2875  11421
Tot:             388   3030  11717

In the two subsequent runs performance stays at this low(er) level. In contrast, the aforementioned N100 run:

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      15289   368   4039  14873  |     154031   399   3295  13141
23:      14752   379   3971  15031  |     151200   399   3281  13083
24:      14208   380   4024  15277  |     148658   399   3267  13050
25:      13747   381   4117  15696  |     145035   399   3231  12908
----------------------------------  | ------------------------------
Avr:             377   4038  15219  |              399   3269  13045
Tot:             388   3653  14132
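
To make the drop-off visible per dictionary size, here is a short sketch that divides the N150 compression rating by the N100 compression rating for each row of the two tables above. The values are copied from the "Rating MIPS" column on the compressing side; the variable names are chosen for this example only.

```python
# Compression ratings (MIPS) per dictionary size, from the two runs above.
n150_compress = {22: 14226, 23: 11185, 24: 11260, 25: 11382}
n100_compress = {22: 14873, 23: 15031, 24: 15277, 25: 15696}

# Fraction of the N100 rating that the N150 box reaches at each size.
ratios = {
    dict_bits: n150_compress[dict_bits] / n100_compress[dict_bits]
    for dict_bits in n150_compress
}

for dict_bits, ratio in ratios.items():
    print(f"dict 2^{dict_bits}: N150 reaches {ratio:.0%} of the N100 rating")
```

At the smallest dictionary the N150 box is within about 5% of the N100, while for the three larger sizes it lands at roughly three quarters of the N100 rating.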

@geerlingguy
Owner Author

geerlingguy commented Feb 4, 2025

@ThomasKaiser - Yeah, I do wonder if the memory could still be overheating maybe? (I first wrote that it sits underneath the board with not much airflow around it, but the memory is on top, silly me.)

Image

@geerlingguy
Owner Author

geerlingguy commented Feb 4, 2025

Gah... apparently at some point I reset the BIOS, because the 'Power Limit Select' was set to 'Balanced' instead of 'High Performance'.

It seems like the power limit is still 10W, and frequency still caps at 2.9 GHz on all cores running stress through sudo s-tui.

Re-running sbc-bench again with 'High Performance' selected and the fan directly on top of the chassis so it will hopefully force more air around the edges of the board down into the underside area.

Image

@geerlingguy
Owner Author

geerlingguy commented Feb 4, 2025

Another run with 'High Performance' mode: https://0x0.st/8Kwt.bin

jgeerling@nucbox-g3-plus:~/Downloads$ sudo powercap-info -p intel-rapl
enabled: 1
Zone 0
  name: package-0
  enabled: 1
  max_energy_range_uj: 262143328850
  energy_uj: 6697232488
  Constraint 0
    name: long_term
    power_limit_uw: 15000000
    time_window_us: 7995392
    max_power_uw: 6000000
  Constraint 1
    name: short_term
    power_limit_uw: 15000000
    time_window_us: 2440
    max_power_uw: 0
  Constraint 2
    name: peak_power
    power_limit_uw: 78000000
    max_power_uw: 0
  Zone 0:0
    name: core
    enabled: 0
    max_energy_range_uj: 262143328850
    energy_uj: 6465136708
    Constraint 0
      name: long_term
      power_limit_uw: 0
      time_window_us: 976
  Zone 0:1
    name: uncore
    enabled: 0
    max_energy_range_uj: 262143328850
    energy_uj: 1695369
    Constraint 0
      name: long_term
      power_limit_uw: 0
      time_window_us: 976
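
The limits in that output are in microwatts, which makes them awkward to read at a glance. As a sketch, the following parses text in the layout printed by `powercap-info` above and reports each constraint in watts. The parser is an illustration written against this one sample, not a general-purpose tool, and `parse_power_limits` is a hypothetical name.

```python
def parse_power_limits(text):
    """Return (zone, constraint, watts) tuples from powercap-info style text.

    Assumes the layout shown above: a 'Zone' or 'Constraint' line is followed
    by a 'name:' line, and constraints carry a 'power_limit_uw:' line.
    """
    limits = []
    zone = None
    constraint = None
    expecting = None  # which kind of 'name:' line should come next
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Zone "):
            expecting = "zone"
            constraint = None
        elif line.startswith("Constraint "):
            expecting = "constraint"
        elif line.startswith("name:"):
            value = line.split(":", 1)[1].strip()
            if expecting == "zone":
                zone = value
            elif expecting == "constraint":
                constraint = value
            expecting = None
        elif line.startswith("power_limit_uw:") and constraint is not None:
            microwatts = int(line.split(":", 1)[1])
            limits.append((zone, constraint, microwatts / 1_000_000))
    return limits


# Package-level excerpt of the output above.
sample_output = """\
Zone 0
  name: package-0
  enabled: 1
  Constraint 0
    name: long_term
    power_limit_uw: 15000000
  Constraint 1
    name: short_term
    power_limit_uw: 15000000
  Constraint 2
    name: peak_power
    power_limit_uw: 78000000
"""

for zone_name, constraint_name, watts in parse_power_limits(sample_output):
    print(f"{zone_name} {constraint_name}: {watts:.1f} W")
```

For this sample the package zone reports 15 W for both the long-term and short-term constraints and 78 W for peak power.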

@ThomasKaiser

> Another run with 'High Performance' mode: https://0x0.st/8Kwt.bin

These should now be the 'real numbers'. 7-zip scores are much better and consistent, though still lower than the N100 setup I referenced here multiple times. But it turns out the box you're currently testing uses a single DDR4 SO-DIMM at 3200 MT/s, vs. LPDDR5 at 4800 MT/s in the N100 setup.

That should eventually explain why this setup with the slightly faster N150 CPU is outperformed by the older Alder Lake-N N100 machine since the latter while showing worse latency when measured single-threaded allows faster access to DRAM when running fully multi-threaded.
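
As a back-of-the-envelope illustration of that difference, theoretical peak bandwidth is the transfer rate multiplied by the bus width in bytes. The sketch below assumes a 64-bit memory bus for both machines; that width is an assumption for illustration and is not stated anywhere in this thread, and real-world throughput will be lower than these peak figures.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    The 64-bit default bus width is an assumption for illustration only.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


ddr4_3200 = peak_bandwidth_gb_s(3200)    # this G3 Plus, single SO-DIMM
lpddr5_4800 = peak_bandwidth_gb_s(4800)  # the referenced N100 setup

print(f"DDR4-3200:   {ddr4_3200:.1f} GB/s")
print(f"LPDDR5-4800: {lpddr5_4800:.1f} GB/s")
print(f"ratio:       {lpddr5_4800 / ddr4_3200:.2f}x")
```

Under that assumption the LPDDR5 setup has 1.5x the theoretical peak bandwidth, which is at least consistent with the memory-bound 7-zip compression results favouring the N100 machine.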

So it's safe to say you tested a real SBC (seriously bad computer ;) )

@geerlingguy
Owner Author

@ThomasKaiser - Noted. And I just ordered a 16GB Corsair DDR5 4800 MHz stick to re-test with it, should be here tomorrow.

@geerlingguy
Owner Author

...and it took me about 2 seconds to realize that like on desktop, DDR5 SO-DIMMs are physically incompatible with DDR4 SO-DIMM slots.

Image

So I won't be testing this system with DDR5 after all! TIL, I knew there was a difference on desktop, but this is the first non-desktop system I've tested that has a CPU that could be capable of DDR5.

@ThomasKaiser

ThomasKaiser commented Feb 5, 2025

> DDR5 SO-DIMMs are physically incompatible with DDR4 SO-DIMM slots

Oh well, I would've ordered one as well since being trapped in Apple's walled garden and the SBC world I may have not touched a [SO-]DIMM within the last decade :)

Edit: must've been 14 years. I just remembered that I replaced a HDD with an SSD in a MacBookPro8,2 (Late 2011) and back then there were SO-DIMMs; the next MacBook I bought ten years ago already came with DDR3L.
