Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:speed causes compiler to hang. #4924

GoldenbergDaniel · 2025-03-10T20:53:16Z

Context

Odin: dev-2025-03-nightly
OS: Ubuntu 24.10, Linux 6.11.0-19-generic
CPU: AMD Ryzen 7 6800H with Radeon Graphics
RAM: 14703 MiB
Backend: LLVM 18.1.6

Expected Behavior

Code should compile with reasonable compile times.

Current Behavior

Code takes 2-3x longer to compile at the default optimization level, and with -o:speed the compiler hangs instead of finishing.

Failure Information (for bugs)

The following code compiles normally when the line with slice.to_type is commented out. I am pretty sure this is related to the structure of the data being serialized.

package game

import "core:fmt"
import "core:os"
import "core:slice"

Game :: struct
{
  t:        f32,
  entities: [1024+1]Entity,
}

Entity :: struct
{
  idx:       u32,
  gen:       u32,
  flags:     bit_set[Entity_Flag],
  props:     bit_set[Entity_Prop],
  pos:       [2]f32,
  vel:       [2]f32,
  dim:       [2]f32,
  rot:       f32,
  input_dir: [2]f32,
  tint:      [4]f32,
  color:     [4]f32,
  sprite:    u32,
  z_index:   i16,
  z_layer:   enum u8
  {
    NIL,
    DECORATION,
    ENEMY,
    PLAYER,
    PROJECTILE,
  },
}

Entity_Flag :: enum u32
{
  ACTIVE,
  MARKED_FOR_DEATH,
  INTERPOLATE,
}

Entity_Prop :: enum u64
{
  WRAP_AT_WORLD_EDGES,
  LOOK_AT_TARGET,
}

main :: proc()
{
  gm: Game

  SAVE_PATH :: "bug_test"

  save_file, open_err := os.open(SAVE_PATH, os.O_RDWR)
  defer os.close(save_file)
  if open_err == nil
  {
    load_game_from_file(save_file, &gm)
  }
  else
  {
    fmt.eprintln("Error opening file for loading!", open_err)
  }
}

load_game_from_file :: proc(fd: os.Handle, gm: ^Game) -> bool
{
  saved_buf: [size_of(Game)*2]byte
  saved_len, _ := os.read(fd, saved_buf[:])
  gm_bytes := saved_buf[:saved_len]

  ok: bool
  gm^, ok = slice.to_type(gm_bytes, Game) // Try commenting this out
  if !ok
  {
    fmt.eprintln("Failed to get Game from bytes!")
    return false
  }

  fmt.println("Loaded game from disk.")

  return true
}

Steps to Reproduce


1. If you intend to run the code, create a file named "bug_test".
2. Build the code above with odin build . -o:speed -show-timings
3. Comment out the line that calls slice.to_type, or make Entity an empty struct.
4. Build the code again using the same command.
5. Repeat for other optimization levels if necessary.

Failure Logs

N/A


JesseRMeyer · 2025-03-10T21:18:06Z

Reproducible on Godbolt: https://godbolt.org/z/s3c79c98r
In fact, it takes so long that Godbolt kills the compiler. Remove -o:size to see the resulting (and awful) codegen.

This seems to be related to the long-standing issue around LLVM's handling of large arrays.

laytan · 2025-03-12T19:40:12Z

Very weird, we generate this on -o:none

  %27 = call i8 @"slice::to_type:proc(buf:[]u8,T:$game::Game)->(:game::Game,:bool)"(<{ i64, i64 }> %26, ptr %8, ptr %__.context_ptr)
  %28 = load %"game::Game", ptr %8, align 4
  store %"game::Game" %28, ptr %1, align 4

and when optimisations are enabled, LLVM turns this into thousands of instructions like:

  %.fca.0.load = load float, ptr %2, align 4
  %.fca.1.0.0.gep = getelementptr inbounds i8, ptr %2, i64 4
  %.fca.1.0.0.load = load i32, ptr %.fca.1.0.0.gep, align 4
  %.fca.1.0.1.gep = getelementptr inbounds i8, ptr %2, i64 8
  %.fca.1.0.1.load = load i32, ptr %.fca.1.0.1.gep, align 4
  %.fca.1.0.2.gep = getelementptr inbounds i8, ptr %2, i64 12
  %.fca.1.0.2.load = load i8, ptr %.fca.1.0.2.gep, align 4
  %.fca.1.0.3.gep = getelementptr inbounds i8, ptr %2, i64 13
  %.fca.1.0.3.load = load i8, ptr %.fca.1.0.3.gep, align 1
  %.fca.1.0.4.0.gep = getelementptr inbounds i8, ptr %2, i64 14
  %.fca.1.0.4.0.load = load i8, ptr %.fca.1.0.4.0.gep, align 2
  %.fca.1.0.4.1.gep = getelementptr inbounds i8, ptr %2, i64 15
  %.fca.1.0.4.1.load = load i8, ptr %.fca.1.0.4.1.gep, align 1
  %.fca.1.0.5.0.gep = getelementptr inbounds i8, ptr %2, i64 16
  %.fca.1.0.5.0.load = load float, ptr %.fca.1.0.5.0.gep, align 4
  %.fca.1.0.5.1.gep = getelementptr inbounds i8, ptr %2, i64 20
  %.fca.1.0.5.1.load = load float, ptr %.fca.1.0.5.1.gep, align 4
  %.fca.1.0.6.0.gep = getelementptr inbounds i8, ptr %2, i64 24
  %.fca.1.0.6.0.load = load float, ptr %.fca.1.0.6.0.gep, align 4
  %.fca.1.0.6.1.gep = getelementptr inbounds i8, ptr %2, i64 28
  %.fca.1.0.6.1.load = load float, ptr %.fca.1.0.6.1.gep, align 4
  %.fca.1.0.7.0.gep = getelementptr inbounds i8, ptr %2, i64 32
  %.fca.1.0.7.0.load = load float, ptr %.fca.1.0.7.0.gep, align 4
  %.fca.1.0.7.1.gep = getelementptr inbounds i8, ptr %2, i64 36
  %.fca.1.0.7.1.load = load float, ptr %.fca.1.0.7.1.gep, align 4
  %.fca.1.0.8.gep = getelementptr inbounds i8, ptr %2, i64 40
  %.fca.1.0.8.load = load float, ptr %.fca.1.0.8.gep, align 4
  %.fca.1.0.9.0.gep = getelementptr inbounds i8, ptr %2, i64 44
  %.fca.1.0.9.0.load = load float, ptr %.fca.1.0.9.0.gep, align 4
  %.fca.1.0.9.1.gep = getelementptr inbounds i8, ptr %2, i64 48
  %.fca.1.0.9.1.load = load float, ptr %.fca.1.0.9.1.gep, align 4
  %.fca.1.0.10.0.gep = getelementptr inbounds i8, ptr %2, i64 52
  %.fca.1.0.10.0.load = load float, ptr %.fca.1.0.10.0.gep, align 4
  %.fca.1.0.10.1.gep = getelementptr inbounds i8, ptr %2, i64 56
  %.fca.1.0.10.1.load = load float, ptr %.fca.1.0.10.1.gep, align 4
  %.fca.1.0.10.2.gep = getelementptr inbounds i8, ptr %2, i64 60
  %.fca.1.0.10.2.load = load float, ptr %.fca.1.0.10.2.gep, align 4
  %.fca.1.0.10.3.gep = getelementptr inbounds i8, ptr %2, i64 64
  %.fca.1.0.10.3.load = load float, ptr %.fca.1.0.10.3.gep, align 4
  %.fca.1.0.11.0.gep = getelementptr inbounds i8, ptr %2, i64 68
  %.fca.1.0.11.0.load = load float, ptr %.fca.1.0.11.0.gep, align 4
  %.fca.1.0.11.1.gep = getelementptr inbounds i8, ptr %2, i64 72
  %.fca.1.0.11.1.load = load float, ptr %.fca.1.0.11.1.gep, align 4
  %.fca.1.0.11.2.gep = getelementptr inbounds i8, ptr %2, i64 76
  %.fca.1.0.11.2.load = load float, ptr %.fca.1.0.11.2.gep, align 4
  %.fca.1.0.11.3.gep = getelementptr inbounds i8, ptr %2, i64 80
  %.fca.1.0.11.3.load = load float, ptr %.fca.1.0.11.3.gep, align 4
  %.fca.1.0.12.gep = getelementptr inbounds i8, ptr %2, i64 84
  %.fca.1.0.12.load = load i32, ptr %.fca.1.0.12.gep, align 4
  %.fca.1.0.13.gep = getelementptr inbounds i8, ptr %2, i64 88
  %.fca.1.0.13.load = load i16, ptr %.fca.1.0.13.gep, align 4
  %.fca.1.0.14.gep = getelementptr inbounds i8, ptr %2, i64 90
  %.fca.1.0.14.load = load i8, ptr %.fca.1.0.14.gep, align 2
  %.fca.1.0.15.gep = getelementptr inbounds i8, ptr %2, i64 91
  %.fca.1.0.15.load = load i8, ptr %.fca.1.0.15.gep, align 1
  %.fca.1.1.0.gep = getelementptr inbounds i8, ptr %2, i64 92
  %.fca.1.1.0.load = load i32, ptr %.fca.1.1.0.gep, align 4
  %.fca.1.1.1.gep = getelementptr inbounds i8, ptr %2, i64 96
  %.fca.1.1.1.load = load i32, ptr %.fca.1.1.1.gep, align 4
  %.fca.1.1.2.gep = getelementptr inbounds i8, ptr %2, i64 100
  %.fca.1.1.2.load = load i8, ptr %.fca.1.1.2.gep, align 4
  %.fca.1.1.3.gep = getelementptr inbounds i8, ptr %2, i64 101
  %.fca.1.1.3.load = load i8, ptr %.fca.1.1.3.gep, align 1
  %.fca.1.1.4.0.gep = getelementptr inbounds i8, ptr %2, i64 102
  %.fca.1.1.4.0.load = load i8, ptr %.fca.1.1.4.0.gep, align 2
  %.fca.1.1.4.1.gep = getelementptr inbounds i8, ptr %2, i64 103
  %.fca.1.1.4.1.load = load i8, ptr %.fca.1.1.4.1.gep, align 1
  %.fca.1.1.5.0.gep = getelementptr inbounds i8, ptr %2, i64 104
  %.fca.1.1.5.0.load = load float, ptr %.fca.1.1.5.0.gep, align 4
  %.fca.1.1.5.1.gep = getelementptr inbounds i8, ptr %2, i64 108
  %.fca.1.1.5.1.load = load float, ptr %.fca.1.1.5.1.gep, align 4
  %.fca.1.1.6.0.gep = getelementptr inbounds i8, ptr %2, i64 112
  %.fca.1.1.6.0.load = load float, ptr %.fca.1.1.6.0.gep, align 4
  %.fca.1.1.6.1.gep = getelementptr inbounds i8, ptr %2, i64 116
  %.fca.1.1.6.1.load = load float, ptr %.fca.1.1.6.1.gep, align 4
  %.fca.1.1.7.0.gep = getelementptr inbounds i8, ptr %2, i64 120
  %.fca.1.1.7.0.load = load float, ptr %.fca.1.1.7.0.gep, align 4
  %.fca.1.1.7.1.gep = getelementptr inbounds i8, ptr %2, i64 124
  %.fca.1.1.7.1.load = load float, ptr %.fca.1.1.7.1.gep, align 4
  %.fca.1.1.8.gep = getelementptr inbounds i8, ptr %2, i64 128
  %.fca.1.1.8.load = load float, ptr %.fca.1.1.8.gep, align 4
  %.fca.1.1.9.0.gep = getelementptr inbounds i8, ptr %2, i64 132
  %.fca.1.1.9.0.load = load float, ptr %.fca.1.1.9.0.gep, align 4
  %.fca.1.1.9.1.gep = getelementptr inbounds i8, ptr %2, i64 136
  %.fca.1.1.9.1.load = load float, ptr %.fca.1.1.9.1.gep, align 4
  %.fca.1.1.10.0.gep = getelementptr inbounds i8, ptr %2, i64 140
  %.fca.1.1.10.0.load = load float, ptr %.fca.1.1.10.0.gep, align 4
  %.fca.1.1.10.1.gep = getelementptr inbounds i8, ptr %2, i64 144
  %.fca.1.1.10.1.load = load float, ptr %.fca.1.1.10.1.gep, align 4
  %.fca.1.1.10.2.gep = getelementptr inbounds i8, ptr %2, i64 148
  %.fca.1.1.10.2.load = load float, ptr %.fca.1.1.10.2.gep, align 4
  %.fca.1.1.10.3.gep = getelementptr inbounds i8, ptr %2, i64 152
  %.fca.1.1.10.3.load = load float, ptr %.fca.1.1.10.3.gep, align 4
  %.fca.1.1.11.0.gep = getelementptr inbounds i8, ptr %2, i64 156
  %.fca.1.1.11.0.load = load float, ptr %.fca.1.1.11.0.gep, align 4
  %.fca.1.1.11.1.gep = getelementptr inbounds i8, ptr %2, i64 160
  %.fca.1.1.11.1.load = load float, ptr %.fca.1.1.11.1.gep, align 4
  %.fca.1.1.11.2.gep = getelementptr inbounds i8, ptr %2, i64 164
  %.fca.1.1.11.2.load = load float, ptr %.fca.1.1.11.2.gep, align 4
  %.fca.1.1.11.3.gep = getelementptr inbounds i8, ptr %2, i64 168
  %.fca.1.1.11.3.load = load float, ptr %.fca.1.1.11.3.gep, align 4

JesseRMeyer · 2025-03-12T19:48:40Z

Even with -o:none on Godbolt, the resulting assembly is a long series of MOVs.

laytan · 2025-03-12T20:05:16Z

Hmm, maybe store %"game::Game" %28, ptr %1, align 4 always requires instruction selection to move each field individually, even though it is a single line of IR. LLVM then tries to optimize those moves into better instructions, but that doesn't seem to be the problem. The problem is that it seemingly emits a mov for each field.

laytan · 2025-03-12T20:08:34Z

Ah, I now remember reading/hearing people discuss this with LLVM: storing a large aggregate value with a plain store is not advised, and the frontend should emit an llvm.memcpy call instead.
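Until the compiler changes, a user-side workaround along the same lines might be to avoid returning Game by value at all and copy the bytes straight into the caller's storage with a single memory copy. The sketch below assumes that mem.copy (which performs a memcpy) keeps LLVM from expanding the copy field by field; that has not been measured against the hang reported here. bytes_to_ptr is a hypothetical helper, not part of core:slice.

```odin
package game

import "core:mem"

// Hypothetical helper (NOT part of core:slice), untested against the hang
// in this issue. Rather than returning a potentially huge struct by value,
// which produces the aggregate load/store shown above, it copies the raw
// bytes directly into caller-owned storage with one mem.copy (a memcpy).
bytes_to_ptr :: proc(dst: ^$T, buf: []byte) -> bool {
	// Refuse short buffers, like a by-value conversion would.
	if len(buf) < size_of(T) {
		return false
	}
	mem.copy(dst, raw_data(buf), size_of(T))
	return true
}
```

In load_game_from_file, the slice.to_type line would then become ok = bytes_to_ptr(gm, gm_bytes), writing through the existing gm pointer instead of assigning a returned Game.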

GoldenbergDaniel changed the title ~~Code that uses slice.to_type takes 2-3x longer to compile normally and doesn't compile with -o:release~~ Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:release causes compiler to hang. Mar 10, 2025

GoldenbergDaniel changed the title ~~Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:release causes compiler to hang.~~ Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:speed causes compiler to hang. Mar 10, 2025
