Heuristic to reduce the number of reported accesses #1
Comments
After a few tests and some lengthy reasoning, it seems that this heuristic leads to too many false negatives. Moreover, compilers often optimize an array iteration loop into a shorter loop that uses SIMD extension instructions, so that, for instance, 4 integer elements are processed in a single execution of the loop body.
Another reason to remove this heuristic is that strings can appear in too many different situations. For instance, the stack may hold many small contiguous strings. In this case a strcpy will find several strings in the same 32 bytes, but the function takes the string address as an argument and is implemented so that only the portion belonging to the correct string is copied.
Since we are working with (possibly stripped) binaries, there is no trivial way to distinguish a string (i.e. an array of chars) from an array of any other numeric type (int, long, float, double...), so it is not easy to find a heuristic good enough to filter out some of the irrelevant uninitialized reads caused by string operations without substantially increasing false negatives. Some naive alternatives may be:
For now, I'll roll back to the last commit to remove the heuristic.
Heuristic removed by commit 8e6c2aa
Commit 3a31a3a implements a heuristic that reduces the number of reported uninitialized memory accesses.
It is designed to recognize uninitialized accesses caused by optimized versions of string operations.
The main idea is to avoid reporting an uninitialized memory access when these conditions are simultaneously true:
Condition 1 means that the string pointer has an alignment different from the one required by the SIMD extension instruction used, and the initialized portion ends before the end of the access. In this case, the layout should be something like this: UNINITIALIZED - INITIALIZED - UNINITIALIZED. This usually happens when operating on a short string.
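To make that layout concrete, here is a minimal sketch of how an aligned access over a short, misaligned string ends up as UNINITIALIZED - INITIALIZED - UNINITIALIZED. The function `access_layout` and its parameters are hypothetical and are not taken from the tool's code; the sketch also assumes that the string (terminator included) is the only initialized data inside the accessed window.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only.
// Returns one character per byte of the aligned access that covers the
// beginning of the string: 'I' if the byte belongs to the string
// (terminator included), 'U' otherwise. Assumes nothing else in the
// window has been initialized.
std::string access_layout(std::uintptr_t str_addr,
                          std::size_t str_len_with_nul,
                          std::size_t access_size) {
    // Align the string address down to the access size, as an aligned
    // SIMD load would do.
    std::uintptr_t base = str_addr - (str_addr % access_size);
    std::string layout(access_size, 'U');
    for (std::size_t i = 0; i < access_size; ++i) {
        std::uintptr_t addr = base + i;
        if (addr >= str_addr && addr < str_addr + str_len_with_nul) {
            layout[i] = 'I';
        }
    }
    return layout;
}
```

For example, a 6-byte string (5 characters plus the terminator) stored at address 0x1003 and read with a 16-byte aligned access yields 3 uninitialized bytes, 6 initialized bytes and 7 uninitialized bytes.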
Condition 2 means that, again, the string pointer has an alignment different from the one required by the instruction. This time, however, there is only 1 uninitialized interval, which probably means that we have a short string again, with the terminator in its last byte, and that something adjacent to the string in memory has been initialized.
Condition 3 means that we are probably handling the end of a long string whose size is not a multiple of the access size. In this case, the memory area is usually initialized from index 0 up to the null byte (included), and after that there is at least 1 uninitialized byte.
Note that indexes are relative to the access boundaries.
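The three layouts described above can be told apart by collapsing the per-byte initialization state of an access into runs. The following is only a sketch based on those descriptions: the exact conditions used by commit 3a31a3a are not reproduced here, and `Layout`, `classify` and the `mask` representation are hypothetical names. In particular, the sketch does not check for the null byte mentioned in condition 3.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of an access, for illustration only.
enum class Layout {
    FullyInit,         // every byte initialized
    FullyUninit,       // no byte initialized
    UninitInitUninit,  // layout described for condition 1
    UninitInit,        // layout described for condition 2
    InitUninit,        // layout described for condition 3
    Other              // anything else (or an empty access)
};

// mask[i] is true when byte i of the access is initialized; i is
// relative to the start of the access, not to the string.
Layout classify(const std::vector<bool>& mask) {
    // Collapse the mask into runs, e.g. {0,0,1,1,0} becomes "UIU".
    std::string runs;
    for (bool initialized : mask) {
        char c = initialized ? 'I' : 'U';
        if (runs.empty() || runs.back() != c) {
            runs.push_back(c);
        }
    }
    if (runs == "I")   return Layout::FullyInit;
    if (runs == "U")   return Layout::FullyUninit;
    if (runs == "UIU") return Layout::UninitInitUninit;
    if (runs == "UI")  return Layout::UninitInit;
    if (runs == "IU")  return Layout::InitUninit;
    return Layout::Other;
}
```

Under this sketch, a heuristic like the one described would skip the report for the three partial layouts and keep it for `FullyUninit` and `Other`. An array of 4-byte integers whose first elements are initialized would also produce `InitUninit`, which illustrates why such a filter can hide real uninitialized reads.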
In practice, we are assuming that every time there is an uninitialized read in which some bytes are initialized while others are not, we are handling a string (as it is composed of a sequence of bytes).
The compiler may also use these instructions to optimize operations on arrays of numeric data. In that case, however, each number is usually either fully initialized or not initialized at all, and therefore does not fall under any of our conditions. If the developer handled the numeric data byte by byte (probably through some casts), some false negatives are expected.
Another source of false negatives may be the use of memory management functions (e.g. memcpy).
In glibc 2.31, memcpy is implemented with 8-byte integer moves, so it cannot produce false negatives through our heuristic. This depends entirely on the implementation of the function, however, and implementations that also use SIMD extension instructions are expected to produce some false negatives.