-
Notifications
You must be signed in to change notification settings - Fork 33
Incorrect ${var%%"$suffix"} Behavior with Repeated Character Suffixes #833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the report. Bug also confirmed in the AT&T release versions, so at least we didn't introduce it. |
It looks like the bug is in the incredibly inscrutable regex code in Another manifestation of it: [[ aaaa =~ aaa$ ]]; echo $? Output: 1 (false/failure). Expected: 0 (true/success). Of course, that ERE should match. Experimenting:
This is really bad. All AT&T versions act the same, as well, so it looks like no one discovered this bug until now. |
just fyi, a construct like [[ aaaaaaaa =~ aa[a]aaa$ ]]; echo $? works as seemingly do all other possible variants where (at least!) one of the repeated instances of "a" in the pattern is converted to a bracket expression containing that single character "a". do not know whether it helps to understand the underlying issue but at least it seems this could serve to work around the problem since the patterns w/ and w/o the the same work around seems to be valid for the original example using standard or extended glob patterns:
and
also works just fine. |
Ping @JohnoKing @phidebian I could really use some eyes on this bug, if/when you have the time. It's a defect in basic glob/regex functionality, and I am having trouble figuring out where even to start looking. The bug is somewhere in src/lib/libast/regex. |
@McDutchie I submitted a PR that address only (I think didn't double check) the trivial case, i.e The original code looks broken with a bunch of calls to strngrpmatch() strgrpmatch() and varsub() that make no sens to me... I'm too scared to address the whole case you discovered, only fixed the original request. |
@sizur @McDutchie I withdrawn my ptevious PR and created a new one. I was focusing on @sizur test case and overlooked @McDutchie additional test cases Thoses test leds me to regcomp.c and the obscure Note that the next test in line I added tests to catch the bogus part. So far so good, but I trust your eagle eyes to catch other test cases. |
PR bug-833-3 submitted, cleaned leftover, cscope files, and 1 cut/paste line in tests/bracket.sh Hope this one is ok. |
Unfortunately not. The variant above works, but prepend one
|
Note: further discussion at #844. |
Yevgeniy Grigoryev (@sizur) reports: > > str=aaaa && echo "${str%%aaa}" > > Expected: > > a > > Observed: > > aaaa Other manifestations of it (these should all match): [[ aaaa =~ aaa$ ]] # returns 1 (no match) [[ aaaaaaaaab =~ aaaaa[a]aab$ ]] # returns 1 (no match) [[ aaaaaaaab =~ aaaa[a]aab$ ]] # returns 1 (no match) [[ aaaaaaab =~ aaa[a]aab$ ]] # returns 1 (no match) ...but if we increase the string/regex length difference by one: [[ aaaaaab =~ aa[a]aab$ ]] # returns 0 (match) @phidebian helpe me out and tracked down a major clue (thanks!): > @sizur test case and @McDutchie additional test cases [...] leds > me to regcomp.c and the obscure > if ((x = env->stats.x) && x->re.string.size < 3) > I see no reasons for this <3 test beside breaking regcomp, so I > removed it, and it solved all the cases, I would say magically > solved. This code is in the special() function in regcomp.c (2999-3225). Going by the comment there, it looks like the special function is an optimisation for certain special cases. It sets two types for compiled regular expression nodes: REX_KMP, which enables string searching using the Knuth-Morris-Pratt algorithm[*1], and REX_BM, which enables string searching using the Boyer-Moore algorithm[*2]. @phidebian's proposed change amounts to x always being 0, which simply disables the optimisation in this case. Which made me think, what would happen if we deleted the whole special() function and the two calls to it? So I tried that, and I threw all the tests at it that I have, with none failing, and this bug is fixed. Also, when playing with various benchmarks, I detect no significant performance regression, and in fact a slight performance increase for common glob patterns. Removing special() renders all the REX_KMP and REG_BM code effectively dead, because it is special() that creates nodes of these types. Most markedly, the entire regrexec() function becomes dead code, because it is only usable for regex trees of type REG_BM and simply returns REG_BADPAT (which is 2) for any other tree type. (This means it is never used without a generic fallback.) So, there is a lot more code that could be removed. I conclude that it is not worth spending time on fixing this optimisation bug. I think the best way to fix this bug is to remove the KMP and BM optimisations, which according to my tests have a dubious performance benefit anyway. src/cmd/ksh93/sh/io.c: - Fix minor unrelated issue: the return type of pat_line() should be size_t, not int. - Remove regrexec() call from io_patseek(). Simplify the code accordingly (act like regrexec returned REG_BADPAT == 2). - Remove pat_seek() which was a handler for the regrexec() call. src/lib/libast/regex/{regcomp,regdecomp,regnexec,regstat}.c: - Remove special() and related code. - Remove code handling REX_BM and REX_KMP node types. src/lib/libast/regex/regrecord.c, src/lib/libast/regex/regrexec.c: - Removed. These are dead code after removing special(). src/lib/libast/regex/reglib.h: - Remove REX_BM and REG_KMP type macros. - Remove now-unused Bm_t type and 'bm' field in Rex_t. src/lib/libast/include/regex.h: - Remove now-unused fields, macros and function declarations. - Define backward compat macros for removed functions: regrecord returns 0 (regrexec can't be used) and regrexec returns REG_BADPAT (also indicating it can't be used). src/lib/libast/features/api: - Bump API version due to removal of regrexec() and regrecord(). src/cmd/ksh93/tests/{bracket,substring}.sh: - Add regression tests by @phidebian. Resolves; #833 Resolves: #844 [*1] https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm [*2] https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm
Yevgeniy Grigoryev (@sizur) reports: > > str=aaaa && echo "${str%%aaa}" > > Expected: > > a > > Observed: > > aaaa Other manifestations of it (these should all match): [[ aaaa =~ aaa$ ]] # returns 1 (no match) [[ aaaaaaaaab =~ aaaaa[a]aab$ ]] # returns 1 (no match) [[ aaaaaaaab =~ aaaa[a]aab$ ]] # returns 1 (no match) [[ aaaaaaab =~ aaa[a]aab$ ]] # returns 1 (no match) ...but if we increase the string/regex length difference by one: [[ aaaaaab =~ aa[a]aab$ ]] # returns 0 (match) @phidebian helpe me out and tracked down a major clue (thanks!): > @sizur test case and @McDutchie additional test cases [...] leds > me to regcomp.c and the obscure > if ((x = env->stats.x) && x->re.string.size < 3) > I see no reasons for this <3 test beside breaking regcomp, so I > removed it, and it solved all the cases, I would say magically > solved. This code is in the special() function in regcomp.c (2999-3225). Going by the comment there, it looks like the special function is an optimisation for certain special cases. It sets two types for compiled regular expression nodes: REX_KMP, which enables string searching using the Knuth-Morris-Pratt algorithm[*1], and REX_BM, which enables string searching using the Boyer-Moore algorithm[*2]. @phidebian's proposed change amounts to x always being 0, which simply disables the optimisation in this case. Which made me think, what would happen if we deleted the whole special() function and the two calls to it? So I tried that, and I threw all the tests at it that I have, with none failing, and this bug is fixed. Also, when playing with various benchmarks, I detect no significant performance regression, and in fact a slight performance increase for common glob patterns. Removing special() renders all the REX_KMP and REG_BM code effectively dead, because it is special() that creates nodes of these types. Most markedly, the entire regrexec() function becomes dead code, because it is only usable for regex trees of type REG_BM and simply returns REG_BADPAT (which is 2) for any other tree type. (This means it is never used without a generic fallback.) So, there is a lot more code that could be removed. I conclude that it is not worth spending time on fixing this optimisation bug. I think the best way to fix this bug is to remove the KMP and BM optimisations, which according to my tests have a dubious performance benefit anyway. src/cmd/ksh93/sh/io.c: - Fix minor unrelated issue: the return type of pat_line() should be size_t, not int. - Remove regrexec() call from io_patseek(). Simplify the code accordingly (act like regrexec returned REG_BADPAT == 2). - Remove pat_seek() which was a handler for the regrexec() call. src/lib/libast/regex/{regcomp,regdecomp,regnexec,regstat}.c: - Remove special() and related code. - Remove code handling REX_BM and REX_KMP node types. src/lib/libast/regex/regrecord.c, src/lib/libast/regex/regrexec.c: - Removed. These are dead code after removing special(). src/lib/libast/regex/reglib.h: - Remove REX_BM and REG_KMP type macros. - Remove now-unused Bm_t type and 'bm' field in Rex_t. src/lib/libast/include/regex.h: - Remove now-unused fields, macros and function declarations. - Define backward compat macros for removed functions: regrecord returns 0 (regrexec can't be used) and regrexec returns REG_BADPAT (also indicating it can't be used). src/lib/libast/features/api: - Bump API version due to removal of regrexec() and regrecord(). src/cmd/ksh93/tests/{bracket,substring}.sh: - Add regression tests by @phidebian. Resolves; #833 Resolves: #844 [*1] https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm [*2] https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm
Fixed in b72d7eb (the bug was not automatically closed because of a typo in the commit message: |
Expected:
Observed:
The text was updated successfully, but these errors were encountered: