
Some variant characters actually occupy two chars #43


Closed
taowater opened this issue Jul 26, 2023 · 2 comments

Comments

@taowater

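While using the library to convert some classical Chinese texts, quite a few characters failed to convert. Debugging showed that some variant characters, such as 𨦟, take up two chars to form a single complete visible character, and the way the library's source code splits a string into a string array can break that pairing, which makes the conversion fail. In my own patched version I found that java.lang.String#codePointCount returns the number of complete characters in a string (see the second screenshot). Do you plan to support this case?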
[Screenshot 1] [Screenshot 2]
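To make the difference between a Java char and a complete character concrete, here is a minimal standalone sketch using only the JDK (no opencc4j). The class and method names CodePointDemo, splitByChar and splitByCodePoint are hypothetical; splitByChar only approximates what a char-by-char split does and is not the library's actual code.

```java
public class CodePointDemo {

    // Naive split: one String per UTF-16 char. A supplementary character such as
    // U+2899F (𨦟) becomes two lone surrogates, neither of which is a valid character.
    static String[] splitByChar(String text) {
        String[] result = new String[text.length()];
        for (int i = 0; i < text.length(); i++) {
            result[i] = String.valueOf(text.charAt(i));
        }
        return result;
    }

    // Code-point-aware split: each element is one complete character,
    // whether it occupies one char (BMP) or two (surrogate pair).
    static String[] splitByCodePoint(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "\uD862\uDD9F还有"; // 𨦟还有
        System.out.println(text.length());                         // 4 (UTF-16 chars)
        System.out.println(text.codePointCount(0, text.length())); // 3 (characters)
        System.out.println(splitByChar(text).length);              // 4: 𨦟 is cut in half
        System.out.println(splitByCodePoint(text).length);         // 3: 𨦟 stays whole
        // The first naive piece is a lone high surrogate, not a character
        System.out.println(Character.isHighSurrogate(splitByChar(text)[0].charAt(0))); // true
    }
}
```

A per-character dictionary lookup built on splitByCodePoint sees 𨦟 as a single key; one built on splitByChar sees two halves that match nothing, which is consistent with the conversion failures described above.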

@houbb
Owner

houbb commented Jul 26, 2023 via email

houbb closed this as completed on Apr 11, 2025
@houbb
Owner

houbb commented Apr 11, 2025

Supported as of v1.9.1.

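```java
import com.github.houbb.opencc4j.util.ZhConverterUtil;
import org.junit.Assert;

// U+2899F (𨦟) lies outside the BMP, so Java stores it as a surrogate pair (two chars)
String originText = "\uD862\uDD9F";
Assert.assertTrue(ZhConverterUtil.isChinese(originText));

// Supplementary characters must survive conversion intact; only 还/還 changes
String text = "\uD86A\uDC43还有\uD862\uDD9F";
Assert.assertEquals("\uD86A\uDC43还有\uD862\uDD9F", ZhConverterUtil.toSimple(text));
Assert.assertEquals("\uD86A\uDC43還有\uD862\uDD9F", ZhConverterUtil.toTraditional(text));
```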
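The test above only checks the result. As an illustration, the following hypothetical sketch (not opencc4j's implementation, which uses its own dictionaries and phrase segmentation) shows one way a converter can treat a surrogate pair as one unit: walk the string with String#codePointAt and Character#charCount, look up each complete character in a table, and copy unknown characters through unchanged. The class name and the one-entry table S2T are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class CodePointConverterSketch {

    // Hypothetical one-entry simplified-to-traditional table, for illustration only.
    private static final Map<String, String> S2T = new HashMap<>();
    static {
        S2T.put("还", "還");
    }

    // Walks the input one code point at a time, so a surrogate pair is looked up
    // (and, if absent from the table, copied through) as a single unit.
    static String toTraditional(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int width = Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
            String ch = text.substring(i, i + width);
            out.append(S2T.getOrDefault(ch, ch));
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\uD86A\uDC43还有\uD862\uDD9F";
        // Same expectation as the toTraditional assertion in the test above
        System.out.println(toTraditional(text).equals("\uD86A\uDC43還有\uD862\uDD9F")); // true
    }
}
```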
