Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tokio-util/codec: AnyDelimiterCodec can decode bytes but only encode str #7169

Open
karalabe opened this issue Feb 22, 2025 · 5 comments
Open
Labels
A-tokio-util Area: The tokio-util crate C-bug Category: This is a bug. M-codec Module: tokio-util/codec

Comments

@karalabe
Copy link

Version

├── tokio v1.43.0
│   └── tokio-macros v2.5.0 (proc-macro)
├── tokio-serial v5.4.5
│   └── tokio v1.43.0 (*)
├── tokio-util v0.7.13
│   └── tokio v1.43.0 (*)
    ├── tokio v1.43.0 (*)

Platform

Darwin Mac.localdomain 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:23 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6020 arm64

Description

I'm consuming a binary data stream using tokio_util::codec::Framed configured with a tokio_util::codec::AnyDelimiterCodec. My delimited is the 0 byte (it is a binary protocol; yes, I do have an encoding that guarantees no 0 bytes appear within a message). Consuming messages works perfectly.

On the other hand, encoding new binary messages hits a problem, the codec implements AsRef<str> for the encoder, which makes it expect an utf-8 string, not an arbitrary binary.

Looking at the code, it just downcasts it to an arbitrary binary anyway, so there's no value in the utf-8 enforcement in the first place:

    fn encode(&mut self, chunk: T, buf: &mut BytesMut) -> Result<(), AnyDelimiterCodecError> {
        let chunk = chunk.as_ref();
        buf.reserve(chunk.len() + 1);
        buf.put(chunk.as_bytes());
        buf.put(self.sequence_writer.as_ref());

        Ok(())
    }

Thus, my question is whether it's possible to change this generic enforcement, or add one that operates on raw binary blobs too?

Currently I can work around this via std::str::from_utf8_unchecked inside an unsafe block, it just seems like an omission in the library that the codec operates 100% safely on binary streams but it has an unwarranted syntactic limitation of utf-8 (in one direction only).

@karalabe karalabe added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Feb 22, 2025
@maminrayej maminrayej added A-tokio-util Area: The tokio-util crate and removed A-tokio Area: The main tokio crate labels Feb 22, 2025
@maminrayej maminrayej changed the title toyio-util/codec: AnyDelimiterCodec can decode bytes but only encode str tokio-util/codec: AnyDelimiterCodec can decode bytes but only encode str Feb 22, 2025
@maminrayej maminrayej added the M-codec Module: tokio-util/codec label Feb 23, 2025
@dlzht
Copy link
Contributor

dlzht commented Feb 26, 2025

Maybe we can change

impl<T> Encoder<T> for AnyDelimiterCodec
where
    T: AsRef<str>,
{ ... }

to

impl<T> Encoder<T> for AnyDelimiterCodec
where
    T: AsRef<[u8]>,
{ ... }

@maminrayej, any advice? if it 's acceptable, I can submit PR

@Darksonn
Copy link
Contributor

Your suggested change would be a breaking change and can only be done with a major version bump of tokio-util.

@Darksonn
Copy link
Contributor

What types are there that are AsRef<str> but not AsRef<[u8]>?

@dlzht
Copy link
Contributor

dlzht commented Feb 26, 2025

You are right, thank you for your reply. And can not solve it except a breaking change, right? (Or other methods that I didn't think of, please give me any advice)

In addition, when reading the implement, I find another "defect":

fn encode(&mut self, chunk: T, buf: &mut BytesMut) -> Result<(), AnyDelimiterCodecError> {
        let chunk = chunk.as_ref();
        // 
        buf.reserve(chunk.len() + 1);
        buf.put(chunk.as_bytes());
        buf.put(self.sequence_writer.as_ref());

        Ok(())
}

param of buf.reserve() is chunk.len() + 1, here 1 seems the assumed value of self.sequence_writer.len() , and can be changed to chunk.len() + self.sequence_writer.len()

@maminrayej
Copy link
Member

What types are there that are AsRef<str> but not AsRef<[u8]>?

I think Arc<str> and Box<str> are examples of types that are AsRef<str>, but not AsRef<[u8]>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio-util Area: The tokio-util crate C-bug Category: This is a bug. M-codec Module: tokio-util/codec
Projects
None yet
Development

No branches or pull requests

4 participants