Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:speed causes compiler to hang. #4924

Open
GoldenbergDaniel opened this issue Mar 10, 2025 · 5 comments

@GoldenbergDaniel

Context

Odin: dev-2025-03-nightly
OS: Ubuntu 24.10, Linux 6.11.0-19-generic
CPU: AMD Ryzen 7 6800H with Radeon Graphics
RAM: 14703 MiB
Backend: LLVM 18.1.6

Expected Behavior

Code should compile with reasonable compile times.

Current Behavior

Code takes 2-3x longer to compile at the default optimization level, and the compiler hangs with -o:speed.

Failure Information (for bugs)

The following code compiles normally when the line with slice.to_type is commented out. I am pretty sure this is related to the structure of the data being serialized.

package game

import "core:fmt"
import "core:os"
import "core:slice"

Game :: struct
{
  t:        f32,
  entities: [1024+1]Entity,
}

Entity :: struct
{
  idx:       u32,
  gen:       u32,
  flags:     bit_set[Entity_Flag],
  props:     bit_set[Entity_Prop],
  pos:       [2]f32,
  vel:       [2]f32,
  dim:       [2]f32,
  rot:       f32,
  input_dir: [2]f32,
  tint:      [4]f32,
  color:     [4]f32,
  sprite:    u32,
  z_index:   i16,
  z_layer:   enum u8
  {
    NIL,
    DECORATION,
    ENEMY,
    PLAYER,
    PROJECTILE,
  },
}

Entity_Flag :: enum u32
{
  ACTIVE,
  MARKED_FOR_DEATH,
  INTERPOLATE,
}

Entity_Prop :: enum u64
{
  WRAP_AT_WORLD_EDGES,
  LOOK_AT_TARGET,
}

main :: proc()
{
  gm: Game

  SAVE_PATH :: "bug_test"

  save_file, open_err := os.open(SAVE_PATH, os.O_RDWR)
  defer os.close(save_file)
  if open_err == nil
  {
    load_game_from_file(save_file, &gm)
  }
  else
  {
    fmt.eprintln("Error opening file for loading!", open_err)
  }
}

load_game_from_file :: proc(fd: os.Handle, gm: ^Game) -> bool
{
  saved_buf: [size_of(Game)*2]byte
  saved_len, _ := os.read(fd, saved_buf[:])
  gm_bytes := saved_buf[:saved_len]

  ok: bool
  gm^, ok = slice.to_type(gm_bytes, Game) // Try commenting this out
  if !ok
  {
    fmt.eprintln("Failed to get Game from bytes!")
    return false
  }

  fmt.println("Loaded game from disk.")

  return true
}

Steps to Reproduce

  1. If intending to run the code, make a file called "bug_test"
  2. Build the code above with odin build . -o:speed -show-timings
  3. Comment out the line that calls slice.to_type or make Entity an empty struct.
  4. Build the code again using the same command.
  5. Repeat for other optimization levels if necessary.

Failure Logs

N/A

@GoldenbergDaniel GoldenbergDaniel changed the title Code that uses slice.to_type takes 2-3x longer to compile normally and doesn't compile with -o:release Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:release causes compiler to hang. Mar 10, 2025
@GoldenbergDaniel GoldenbergDaniel changed the title Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:release causes compiler to hang. Code that uses slice.to_type takes 2-3x longer to compile normally, and -o:speed causes compiler to hang. Mar 10, 2025
@JesseRMeyer

Reproducible at godbolt: https://godbolt.org/z/s3c79c98r
In fact, it takes so long that godbolt kills the compiler. Remove -o:size to see the resulting (and awful) codegen.

Seems to be related to the long-standing issue around LLVM's treatment of large arrays.

@laytan
Collaborator

laytan commented Mar 12, 2025

Very weird, we generate this on -o:none:

  %27 = call i8 @"slice::to_type:proc(buf:[]u8,T:$game::Game)->(:game::Game,:bool)"(<{ i64, i64 }> %26, ptr %8, ptr %__.context_ptr)
  %28 = load %"game::Game", ptr %8, align 4
  store %"game::Game" %28, ptr %1, align 4

and when optimisations are enabled LLVM turns this into thousands of instructions like:

  %.fca.0.load = load float, ptr %2, align 4
  %.fca.1.0.0.gep = getelementptr inbounds i8, ptr %2, i64 4
  %.fca.1.0.0.load = load i32, ptr %.fca.1.0.0.gep, align 4
  %.fca.1.0.1.gep = getelementptr inbounds i8, ptr %2, i64 8
  %.fca.1.0.1.load = load i32, ptr %.fca.1.0.1.gep, align 4
  %.fca.1.0.2.gep = getelementptr inbounds i8, ptr %2, i64 12
  %.fca.1.0.2.load = load i8, ptr %.fca.1.0.2.gep, align 4
  %.fca.1.0.3.gep = getelementptr inbounds i8, ptr %2, i64 13
  %.fca.1.0.3.load = load i8, ptr %.fca.1.0.3.gep, align 1
  %.fca.1.0.4.0.gep = getelementptr inbounds i8, ptr %2, i64 14
  %.fca.1.0.4.0.load = load i8, ptr %.fca.1.0.4.0.gep, align 2
  %.fca.1.0.4.1.gep = getelementptr inbounds i8, ptr %2, i64 15
  %.fca.1.0.4.1.load = load i8, ptr %.fca.1.0.4.1.gep, align 1
  %.fca.1.0.5.0.gep = getelementptr inbounds i8, ptr %2, i64 16
  %.fca.1.0.5.0.load = load float, ptr %.fca.1.0.5.0.gep, align 4
  %.fca.1.0.5.1.gep = getelementptr inbounds i8, ptr %2, i64 20
  %.fca.1.0.5.1.load = load float, ptr %.fca.1.0.5.1.gep, align 4
  %.fca.1.0.6.0.gep = getelementptr inbounds i8, ptr %2, i64 24
  %.fca.1.0.6.0.load = load float, ptr %.fca.1.0.6.0.gep, align 4
  %.fca.1.0.6.1.gep = getelementptr inbounds i8, ptr %2, i64 28
  %.fca.1.0.6.1.load = load float, ptr %.fca.1.0.6.1.gep, align 4
  %.fca.1.0.7.0.gep = getelementptr inbounds i8, ptr %2, i64 32
  %.fca.1.0.7.0.load = load float, ptr %.fca.1.0.7.0.gep, align 4
  %.fca.1.0.7.1.gep = getelementptr inbounds i8, ptr %2, i64 36
  %.fca.1.0.7.1.load = load float, ptr %.fca.1.0.7.1.gep, align 4
  %.fca.1.0.8.gep = getelementptr inbounds i8, ptr %2, i64 40
  %.fca.1.0.8.load = load float, ptr %.fca.1.0.8.gep, align 4
  %.fca.1.0.9.0.gep = getelementptr inbounds i8, ptr %2, i64 44
  %.fca.1.0.9.0.load = load float, ptr %.fca.1.0.9.0.gep, align 4
  %.fca.1.0.9.1.gep = getelementptr inbounds i8, ptr %2, i64 48
  %.fca.1.0.9.1.load = load float, ptr %.fca.1.0.9.1.gep, align 4
  %.fca.1.0.10.0.gep = getelementptr inbounds i8, ptr %2, i64 52
  %.fca.1.0.10.0.load = load float, ptr %.fca.1.0.10.0.gep, align 4
  %.fca.1.0.10.1.gep = getelementptr inbounds i8, ptr %2, i64 56
  %.fca.1.0.10.1.load = load float, ptr %.fca.1.0.10.1.gep, align 4
  %.fca.1.0.10.2.gep = getelementptr inbounds i8, ptr %2, i64 60
  %.fca.1.0.10.2.load = load float, ptr %.fca.1.0.10.2.gep, align 4
  %.fca.1.0.10.3.gep = getelementptr inbounds i8, ptr %2, i64 64
  %.fca.1.0.10.3.load = load float, ptr %.fca.1.0.10.3.gep, align 4
  %.fca.1.0.11.0.gep = getelementptr inbounds i8, ptr %2, i64 68
  %.fca.1.0.11.0.load = load float, ptr %.fca.1.0.11.0.gep, align 4
  %.fca.1.0.11.1.gep = getelementptr inbounds i8, ptr %2, i64 72
  %.fca.1.0.11.1.load = load float, ptr %.fca.1.0.11.1.gep, align 4
  %.fca.1.0.11.2.gep = getelementptr inbounds i8, ptr %2, i64 76
  %.fca.1.0.11.2.load = load float, ptr %.fca.1.0.11.2.gep, align 4
  %.fca.1.0.11.3.gep = getelementptr inbounds i8, ptr %2, i64 80
  %.fca.1.0.11.3.load = load float, ptr %.fca.1.0.11.3.gep, align 4
  %.fca.1.0.12.gep = getelementptr inbounds i8, ptr %2, i64 84
  %.fca.1.0.12.load = load i32, ptr %.fca.1.0.12.gep, align 4
  %.fca.1.0.13.gep = getelementptr inbounds i8, ptr %2, i64 88
  %.fca.1.0.13.load = load i16, ptr %.fca.1.0.13.gep, align 4
  %.fca.1.0.14.gep = getelementptr inbounds i8, ptr %2, i64 90
  %.fca.1.0.14.load = load i8, ptr %.fca.1.0.14.gep, align 2
  %.fca.1.0.15.gep = getelementptr inbounds i8, ptr %2, i64 91
  %.fca.1.0.15.load = load i8, ptr %.fca.1.0.15.gep, align 1
  %.fca.1.1.0.gep = getelementptr inbounds i8, ptr %2, i64 92
  %.fca.1.1.0.load = load i32, ptr %.fca.1.1.0.gep, align 4
  %.fca.1.1.1.gep = getelementptr inbounds i8, ptr %2, i64 96
  %.fca.1.1.1.load = load i32, ptr %.fca.1.1.1.gep, align 4
  %.fca.1.1.2.gep = getelementptr inbounds i8, ptr %2, i64 100
  %.fca.1.1.2.load = load i8, ptr %.fca.1.1.2.gep, align 4
  %.fca.1.1.3.gep = getelementptr inbounds i8, ptr %2, i64 101
  %.fca.1.1.3.load = load i8, ptr %.fca.1.1.3.gep, align 1
  %.fca.1.1.4.0.gep = getelementptr inbounds i8, ptr %2, i64 102
  %.fca.1.1.4.0.load = load i8, ptr %.fca.1.1.4.0.gep, align 2
  %.fca.1.1.4.1.gep = getelementptr inbounds i8, ptr %2, i64 103
  %.fca.1.1.4.1.load = load i8, ptr %.fca.1.1.4.1.gep, align 1
  %.fca.1.1.5.0.gep = getelementptr inbounds i8, ptr %2, i64 104
  %.fca.1.1.5.0.load = load float, ptr %.fca.1.1.5.0.gep, align 4
  %.fca.1.1.5.1.gep = getelementptr inbounds i8, ptr %2, i64 108
  %.fca.1.1.5.1.load = load float, ptr %.fca.1.1.5.1.gep, align 4
  %.fca.1.1.6.0.gep = getelementptr inbounds i8, ptr %2, i64 112
  %.fca.1.1.6.0.load = load float, ptr %.fca.1.1.6.0.gep, align 4
  %.fca.1.1.6.1.gep = getelementptr inbounds i8, ptr %2, i64 116
  %.fca.1.1.6.1.load = load float, ptr %.fca.1.1.6.1.gep, align 4
  %.fca.1.1.7.0.gep = getelementptr inbounds i8, ptr %2, i64 120
  %.fca.1.1.7.0.load = load float, ptr %.fca.1.1.7.0.gep, align 4
  %.fca.1.1.7.1.gep = getelementptr inbounds i8, ptr %2, i64 124
  %.fca.1.1.7.1.load = load float, ptr %.fca.1.1.7.1.gep, align 4
  %.fca.1.1.8.gep = getelementptr inbounds i8, ptr %2, i64 128
  %.fca.1.1.8.load = load float, ptr %.fca.1.1.8.gep, align 4
  %.fca.1.1.9.0.gep = getelementptr inbounds i8, ptr %2, i64 132
  %.fca.1.1.9.0.load = load float, ptr %.fca.1.1.9.0.gep, align 4
  %.fca.1.1.9.1.gep = getelementptr inbounds i8, ptr %2, i64 136
  %.fca.1.1.9.1.load = load float, ptr %.fca.1.1.9.1.gep, align 4
  %.fca.1.1.10.0.gep = getelementptr inbounds i8, ptr %2, i64 140
  %.fca.1.1.10.0.load = load float, ptr %.fca.1.1.10.0.gep, align 4
  %.fca.1.1.10.1.gep = getelementptr inbounds i8, ptr %2, i64 144
  %.fca.1.1.10.1.load = load float, ptr %.fca.1.1.10.1.gep, align 4
  %.fca.1.1.10.2.gep = getelementptr inbounds i8, ptr %2, i64 148
  %.fca.1.1.10.2.load = load float, ptr %.fca.1.1.10.2.gep, align 4
  %.fca.1.1.10.3.gep = getelementptr inbounds i8, ptr %2, i64 152
  %.fca.1.1.10.3.load = load float, ptr %.fca.1.1.10.3.gep, align 4
  %.fca.1.1.11.0.gep = getelementptr inbounds i8, ptr %2, i64 156
  %.fca.1.1.11.0.load = load float, ptr %.fca.1.1.11.0.gep, align 4
  %.fca.1.1.11.1.gep = getelementptr inbounds i8, ptr %2, i64 160
  %.fca.1.1.11.1.load = load float, ptr %.fca.1.1.11.1.gep, align 4
  %.fca.1.1.11.2.gep = getelementptr inbounds i8, ptr %2, i64 164
  %.fca.1.1.11.2.load = load float, ptr %.fca.1.1.11.2.gep, align 4
  %.fca.1.1.11.3.gep = getelementptr inbounds i8, ptr %2, i64 168
  %.fca.1.1.11.3.load = load float, ptr %.fca.1.1.11.3.gep, align 4

@JesseRMeyer

Even with -o:none on godbolt the resulting assembly is a long series of various MOVs.

@laytan
Collaborator

laytan commented Mar 12, 2025

Hmm, maybe store %"game::Game" %28, ptr %1, align 4 always requires instruction selection to move each field individually, even though it is one line of IR. LLVM then tries to optimize those moves into better instructions, but that doesn't seem to be the problematic part. The problematic part is that it seemingly does a mov for each field.

@laytan
Collaborator

laytan commented Mar 12, 2025

Ah, I now remember reading/hearing people talk about this with LLVM: a load/store of a large aggregate value is not advised, and the frontend should emit an llvm.memcpy call instead.
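
If the aggregate load/store is indeed the trigger, one possible user-side workaround is to copy the bytes straight into the destination with mem.copy instead of returning the struct by value. The sketch below is untested against this nightly and is only an assumption about what sidesteps the slow path. bytes_into and Pair are hypothetical names for illustration, not part of core:slice; the length check mirrors the size check the reporter's code relies on.

```odin
package example

import "core:fmt"
import "core:mem"

// Hypothetical helper (not in core:slice): copies the leading size_of(T)
// bytes of buf into dst^ rather than returning a T by value, so the caller
// never has to load and store the whole aggregate.
bytes_into :: proc(dst: ^$T, buf: []byte) -> bool
{
  // Refuse to copy when the buffer is too short to hold a T.
  if len(buf) < size_of(T)
  {
    return false
  }

  // Plain byte copy into the destination.
  mem.copy(dst, raw_data(buf), size_of(T))
  return true
}

main :: proc()
{
  // Small stand-in type for the demo; the issue's Game struct would be used the same way.
  Pair :: struct { a: u32, b: u32 }

  src := Pair{a = 1, b = 2}
  buf := mem.ptr_to_bytes(&src) // view src as a []byte

  dst: Pair
  ok := bytes_into(&dst, buf)
  fmt.println(ok, dst)
}
```

In the reproduction above, the call site would become ok = bytes_into(gm, gm_bytes) in place of the slice.to_type line. Whether that actually restores normal compile times would need to be checked with -show-timings.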
