Skip to content

Consider using the debug_redact field / enum option when marshaling into prototext #1655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ivucica opened this issue Nov 11, 2024 · 15 comments

Comments

@ivucica
Copy link

ivucica commented Nov 11, 2024

Is your feature request related to a problem? Please describe.

Field debug_redact has been added around Protobuf 22 to mark sensitive fields as being sensitive. (Amusingly, this means protoc that ships in Debian, as late as unstable, does not support this field yet.)

Not using debug_redact means that fields that are tagged as sensitive will be redacted in some languages such as C++, but not in Go.

Please note that debug_redact can be used on more than just fields; but, its purpose on fields is clearer than on enums.

Describe the solution you'd like

A solution should be found so that serialization for purposes of text output results in fields being redacted, but otherwise not.

In C++, TextFormat::Printer::PrintFieldValue invokes TryRedactFieldValue which uses this option if the serializer has been initialized with redact_debug_string_ option:

https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/src/google/protobuf/text_format.cc#L3026-L3055

Java computes whether a field is sensitive based on this option: https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/java/core/src/main/java/com/google/protobuf/Descriptors.java#L1837-L1859

This is then used if enablingSafeDebugFormat is flipped true:

https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/java/core/src/main/java/com/google/protobuf/TextFormat.java#L628-L634

For Go, it is not completely clear to me when this should be used. Adding it to the right place seems trivial: it likely belongs in marshalField or wherever marshalSingular is called, and an option can be used at that point: https://github.com/protocolbuffers/protobuf-go/blob/b98563540c0a4edb38526bcd6e6c97f9fac1f453/encoding/prototext/encode.go#L210

However, once the encoder supports this, should the Stringer interface (String()) on a generated message proceed to output the redacted or the non-redacted version of the message?

  • I'm leaning towards redacted by default. It would have to be a combination of someone using str(someMessage) to generate a textproto and also use the relatively new field debug_redact in order to experience this change in behavior. If I recall correctly, the textpb output is not defined to be stable -- so perhaps MessageStringOf(m protoreflect.ProtoMessage) string is permitted to change behavior when invoking prototext.MarshalOptions{Multiline: false}.Format(m).
  • The counterpoint to this is that dropping fields in situations when people may be taking shortcuts to writing data could cause data loss. Perhaps a transition period where encoder actually takes two bool options: RedactFieldsWarns (toggle warning debugoutput for a field, possibly a comment in generated textpb; for default stringification, defaults to true) and RedactFields (for default stringification, initially defaults to true, later default false); possibly later switching to panics.
  • Another way to approach this: deprecate use of String() by adding a relevant docstring to the generated code. This feels rather blunt, but if debug_redact is important enough, it would hint to the users that they should avoid it. It would probably not affect the users widely enough, however: would casting with str() or using %s + %v show up in code health tools to indicate invoking a deprecated Stringer?

It feels like sensitive fields should be saved only when explicitly requested, but unfortunately, existing code before the field existed or before it was used may have different expectations.

Describe alternatives you've considered

Two main approaches:

  1. Write a custom serializer that redacts the fields. Never use str(), .String() or %s / %v on a generated proto message without using the custom formatter+redactor, just as one would have to do with a custom option. (This would then apply whether logging or not, as there can be no guarantees of what will happen with the proto is fmt.Sprintf'd.) Process generated .pb.go code to add // Deprecated: Using String() directly does not redact fields. to the docstring so the tooling starts to output warnings about use of String().
  2. Choose to consider debug_redact broken and ignore the existence of this field option, accepting that potentially sensitive messages will end up in logs or other places where they should not.
    • Possibly: send a pull request to base protobuf repo to add a comment to debug_redact option to clarify that implementing is optional, implementation specific and that depending on it taking effect without examining the behavior in every used language is risky.

As is, the option is present and usable, but will not actually result in redaction of fields in sensitive contexts.

Additional context

It may be worth considering otherwise allowing modification of the marshaling of the fields, in case other options / annotations affect the field in other, non-logging contexts.

As a example, customization might be useful because a field might not be sensitive for logging, but it might be too sensitive to display to some types of system administrators, or to send to end users.

This consideration on allowing customizing serialization would likely apply across languages.

@puellanivis
Copy link
Collaborator

At first, I suspected this were a duplicate of #1541 however, having read through the issue, it’s different, because this debug_redact behavior exists in C++ and Java at least. So the most common concern of “yes, but is it in the other big 4?” is already met.

I would say, there is no condition where we would ever want to deprecate String(). The fmt.Stringer interface is one of the core and ubiquitous concepts of the language. And removing String() won’t stop the protobuf messages from being stringified with these redacted fields unredacted, for example, if done through fmt.Sprint() or %v formatting verb. https://go.dev/play/p/huhWv1vQ3cH

Next, we would not want to add any options that default to true. All options should be worded so that their default is naturally false. This is another practical Go convention. prototext.MarshalOptions{} has to correspond with all default values. So adding a field that defaults to true requires seeing the field set to the zero value (false) and setting it to true. So, the option would not be possible to disable.

Now that I’ve covered all the “bad news”, here’s the good news from my side: we’ve held for quite a while that prototext is only intended to be a human-readable debugging format. People who do not want the fields redacted can always elect to simply not add the tag. And, if we implement it like C++, then it affects only prototext output anyways. Individuals remain on their own to implement any sort of redactor for JSON formatting.

Pushing it as a universal default might be a bit much (though from my view, not entirely out of the question). But at worst, we could add something like a GOLANG_PROTOBUF_DEBUG_REDACT=true env var and/or an -ldflags "-X google.golang.org/protobuf/encoding/prototext.debug_redact=true" compile time option to turn it on/off by default. Projects and companies could then elect to opt-in the same as C++. And build it into scripts anywhere they want it enforced as policy.

As for how to emit a message without redaction in a new world where redaction would be a possible default, that could be as simple as adding a new EmitRedactedFields field to prototext.Marshal, then anyone who wants an unredacted version regardless of defaults can simply use the prototext.MarshalOptions.Format method to print an unredacted version.

TL;DR: C++ and Java already support this, any change of behavior is in what we already consider human-oriented debug messages, we could provide reasonable knobs to set desired default behavior at runtime or compile time, and the redaction is already field-specific opt-in already.

@neild
Copy link
Contributor

neild commented Nov 12, 2024

This seems reasonable. Actually, I'm surprised debug_redact was added without Go support--it seems like an oversight.

I'd say that:

  • prototext.MarshalOptions should have a field controlling whether redaction is applied.
  • The String method of generated messages should redact.
  • prototext.Marshal should not redact by default.

I'm not sure about prototext.Format. I lean towards saying that it should redact.

@puellanivis
Copy link
Collaborator

I like those defaults and knobs.

@stapelberg
Copy link

@ivucica Would you be able to send a Gerrit change? See https://github.com/protocolbuffers/protobuf-go/blob/master/CONTRIBUTING.md

@ivucica
Copy link
Author

ivucica commented Nov 12, 2024

To confirm, consensus is default to redacting, and no envvar? If I’m implementing this, happy to add the envvar.

Any particular documentation you’d like me to update?

I can’t offer a timeline for this contribution, as this is a personal interest, but (aside from any tests I should update) it seems simple enough, so I can try giving it a spin. If someone beats me to it, I won’t complain, of course.

@puellanivis
Copy link
Collaborator

I don’t think we need an ENV var, no. With the options as proposed by nield, we should have sane enough behavior overall to not need to offer a kill-switch ENV var.

@veqryn
Copy link

veqryn commented Mar 14, 2025

Can I ask that the same be applied to protojson please?

There should be a way in protojson for redacting/skipping fields that have debug_redact = true
protojson.MarshalOptions should have a field controlling whether redaction is applied, just like what is proposed above.

We log all received and sent proto messages as json, and are presently running into this exact issue of how to redact them.

@ivucica
Copy link
Author

ivucica commented Mar 20, 2025

I'll give a pass to textprotos. I don't know what the intended practice would be for protojson (and deciding it should not block prototext work). Likely it should be a separate feature request.

@puellanivis
Copy link
Collaborator

While protojson could include this feature, be aware that some people are using protojson as an interchange format. We definitely need to be careful about removing redacted fields much more with protojson than with prototext, which is inherently a debugging format.

🤔 Hm… but I don’t see any reason why it couldn’t go into protojson though.

@ivucica
Copy link
Author

ivucica commented Mar 20, 2025

I've started looking at patching https://github.com/protocolbuffers/protobuf-go.

Even putting it into the textproto package is proving to be a fun exercise.

  • I've added a new MarshalOptions field1 and updated Format to default to having MarshalOptions set to RedactDebugString: true. This seems to be the right way to go: it explicitly says that the output is not stable2.
  • I've updated the test proto definitions for proto2, proto3 and editions, and I've put together a quick test in encode_test.go. (Tedious, with just enough variety in field names and features to test that I can't use the test generator.)
    • However, it looks like referring to the proto descriptor's raw descriptorpb directly in encode.go is a no-go. Import loops.
  • Therefore, it looks like I need to extend interface{}s (!) for FieldDescriptor and EnumValueDescriptor inside protoreflect/type.go. I am tentatively calling the method IsSensitive()` since that seems to be what it's called in Java.
    • However, this means carefully extending code generator and anything else that might be trying to implement these interfaces.
    • At least one such place is the dynamicproto package -- there may be more such hidden places just in the codebase I'm looking at.

It does look like it'll be quite a bit of work to update various callsites -- I'm nowhere near close, even though this seemed like it might be a small change.

Once closer to being ready3, I may need to call upon experts for advice (or help) to update the codebase I'm looking at, and only then can I consider going via the Gerrit route.

We definitely need to be careful about removing redacted fields much more with protojson than with prototext

Yes, whoever does the patching should likely keep the default off there in all cases.

Footnotes

  1. https://github.com/protocolbuffers/protobuf-go/blob/a5da9b210e40c80903bd3aeed46f29a3506c8a3a/encoding/prototext/encode.go#L78

  2. https://github.com/protocolbuffers/protobuf-go/blob/a5da9b210e40c80903bd3aeed46f29a3506c8a3a/encoding/prototext/encode.go#L89

  3. Please do note that I'm looking at this in my spare time.

@puellanivis
Copy link
Collaborator

puellanivis commented Mar 21, 2025

Thanks for putting such time and effort into looking into this feature, a few thoughts to help you out:

I've added a new MarshalOptions field and updated Format to default to having MarshalOptions set to RedactDebugString: true. This seems to be the right way to go: it explicitly says that the output is not stable².

None of the defaults should be true. Default values for MarshalOptions should be the zero value for that field. That way, a field that is not set will then be set to the default value. (This is especially true for bool fields, since now true is true, and false would be converted into true. Now the bool is always true.)

We discussed above that the field for this in MarshalOptions should default to false. People should have to opt-in to this behavior when using MarshalOptions explicitly. It is only the String() method that should set this field to true, and that would be done by adding the field to the MarshalOptions in internal/impl.Export.MessageStringOf()

  • Therefore, it looks like I need to extend interface{}s (!) for FieldDescriptor and EnumValueDescriptor inside protoreflect/type.go. I am tentatively calling the method IsSensitive()` since that seems to be what it's called in Java.

Extending interfaces is notoriously not backwards-compatible. Instead, if one wants to expand an interface, it’s best to use a separate interface that embeds the original interface (like fs.ReadDirFile, fs.ReadDirFS, and flag.Getter).

  • However, this means carefully extending code generator and anything else that might be trying to implement these interfaces.

If this change requires a code generator update, that could also complicate the backwards compatibility… if regeneration of protobufs would be necessary to update the protobuf module version, then that will be very hard for users to work with, since they are often depending on protobufs outside of their control and ability to regenerate.

It does look like it'll be quite a bit of work to update various callsites -- I'm nowhere near close, even though this seemed like it might be a small change.

A heuristic that I often try and use is that if something is becoming overwhelmingly difficult or tedious, then maybe I’m attacking it from the wrong direction. You might be going after a few too many hydra heads with your approach so far, and if you’re having to change so much because backwards compatibility with the existing code breaks, then consider how much that would impact users as well.

We definitely need to be careful about removing redacted fields much more with protojson than with prototext

Yes, whoever does the patching should likely keep the default off there in all cases.

As I note above, the default should be off here as well. Only for the String() implementation should this ever default to on, since that’s patently debugging output. We need to be wary of the principle of least astonishment when changing any default behavior. We decided above that this is ok for String(), really only because it shouldn’t have ever shown up outside of for-human debugging text before.

@ivucica
Copy link
Author

ivucica commented Mar 21, 2025

I have a somewhat working version (with tests covering proto2 syntax for now, and failing on one case that I came up wiith).

It's also very, very messy. I'll have to review how accessing fd.Options() vs fd.L1 works, and probably separate updating the Go tag (if at all, though it might be a good idea).

I'll have to look for advice on how to clean it up.

@neild
Copy link
Contributor

neild commented Mar 21, 2025

Extending interfaces is notoriously not backwards-compatible. Instead, if one wants to expand an interface, it’s best to use a separate interface that embeds the original interface (like fs.ReadDirFile, fs.ReadDirFS, and flag.Getter).

That's usually the case, but we intentionally designed the protoreflect descriptor types to allow for extension. I haven't looked into this in any detail, but adding FieldDescriptor.DebugRedact or FieldDescriptor.IsSensitive seems reasonable to me. (Does Java really have a different name for this in its descriptors than the FieldOptions message field? That's unfortunate.)

@ivucica
Copy link
Author

ivucica commented Apr 7, 2025

Thanks for the followups -- this is just a quick note that this is on my mind, and I just haven't gotten around to putting some time into it.

@puellanivis

Thanks for putting such time and effort into looking into this feature, a few thoughts to help you out:

I've added a new MarshalOptions field and updated Format to default to having MarshalOptions set to RedactDebugString: true. This seems to be the right way to go: it explicitly says that the output is not stable².

None of the defaults should be true. Default values for MarshalOptions should be the zero value for that field. That way, a field that is not set will then be set to the default value. (This is especially true for bool fields, since now true is true, and false would be converted into true. Now the bool is always true.)

We discussed above that the field for this in MarshalOptions should default to false. People should have to opt-in to this behavior when using MarshalOptions explicitly. It is only the String() method that should set this field to true, and that would be done by adding the field to the MarshalOptions in internal/impl.Export.MessageStringOf()

To be clear: the MarshalOptions struct has it set to false. But String() does not seem the right place: I suspect it's prototext.Format in encode.go. (Approximate) current working copy for me1:

const defaultIndent = "  "
const fieldValueReplacement = "[REDACTED]"

// Format formats the message as a multiline string.
// This function is only intended for human consumption and ignores errors.
// Do not depend on the output being stable. Its output will change across
// different builds of your program, even when using the same version of the
// protobuf module.
func Format(m proto.Message) string {
	return MarshalOptions{Multiline: true, RedactDebugString: true}.Format(m)
}

Based on the clearly visible comment, this is not meant to be stable and we are allowed to change the behavior -- and it seems like it's documented to have unstable output (just like Marshal) and is, unlike Marshal, already trying to make the output palatable for people. I think it'd make sense to update it here.

I was considering whether updating Marshal() makes sense, but probably not: that defaults to zero-value MarshalOptions (hence leaving Multiline to false) so that's less likely to end up in logs and such.

These are probably not the only places to update: upon a closer look (since I hadn't actually reached the phase where I've tested what String() does), maybe the right place is this in internal/impl/api_export.go:

// MessageStringOf returns the message value as a string,
// which is the message serialized in the protobuf text format.
func (Export) MessageStringOf(m protoreflect.ProtoMessage) string {
	return prototext.MarshalOptions{Multiline: false}.Format(m)
}

Is this the right method that's called by a Message's implementation of the stringer interface? I don't know, but it seems likely.

All in all, as you said above, I do think RedactDebugString does need to default to off. No initialization of struct = no change. The question is just which places are "least astonishment" to get updated to the new behavior for anyone that does use protos that have the debug_redact option. (The status quo is astonishment in itself, for me, so I would certainly flip something to true.) Would this warrant new methods with the new behavior that we can't remove? Likely not. And adding those new methods also wouldn't fix existing callsites that erroneously log fields that are intended to be redacted.

I'd leave that exact decision to the reviewer and to consensus.

  • Therefore, it looks like I need to extend interface{}s (!) for FieldDescriptor and EnumValueDescriptor inside protoreflect/type.go. I am tentatively calling the method IsSensitive() since that seems to be what it's called in Java.

Extending interfaces is notoriously not backwards-compatible. Instead, if one wants to expand an interface, it’s best tonto Gerrit per suggestions above.

There's a chance it is not avoidable.

On the other hand, perhaps an interesting (but not nicest) fix might be to make sensitive fields implement an extra interface and thus make them identifiable by the prototext marshaling code by a type check, rather than by accessing the descriptor. (That is: no need to call the method at all.) Old generated code would not implement the new method, so fields would look non-sensitive; an extra method generated for the fields that have this option would not cause problems2.

I do also already have some notes in the working copy where shouldRedactEnum() and shouldRedactField(), private methods on the private prototext.encoder type, on avoiding extending the protoreflect.FieldDescriptor and .EnumValueDescriptor interfaces, but the generated IsSensitive() code still invokes implementations of IsSensitive() I added elsewhere -- I'll have to look at this very, very carefully.

I asked around for some advice regarding understanding of some of the descriptor code -- I really did go into this expecting I would make a minimal change to prototext, and just use the descriptor available at runtime to see if a particular field and enumvalue have the option set. I was a bit surprised at the distinctions with L0/L1/L2 in desc_init.go, desc.go etc; their existence was not something I expected to be necessary before seeing it.

Besides, I didn't check, but hopefully debug_redact does end up propagating via reflection/introspection -- I never really looked at that. But if it could be set correctly on fields in dynamicproto messages, it would be neat if prototext could correctly use that. Tentatively, I have some changes to dynamicproto in the prototype change, but I have not verified anything about how this behaves in practice.

@neild

(Does Java really have a different name for this in its descriptors than the FieldOptions message field? That's unfortunate.)

I made this note when filing the bug report originally:

Java computes whether a field is sensitive based on this option: https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/java/core/src/main/java/com/google/protobuf/Descriptors.java#L1837-L1859

This is then used if enablingSafeDebugFormat is flipped true:

https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/java/core/src/main/java/com/google/protobuf/TextFormat.java#L628-L634

I don't have the exact same logic to implement this, but the naming seems to make a bit of sense -- especially if it is at some point extended with other annotations that are not part of this change. The main downside is inconsistency; but the upside is being able to cover more than one option, should the need arise.

Footnotes

  1. n.b. possibly that's a misnomer for the constant, since the same value applies to enums values too. I'll revise that.

  2. Well, technically, if someone extended message types and named the empty method IsSensitive() or DebugRedactMarker() or whatever we end up calling it, sure, those users will have to work around that.

@puellanivis
Copy link
Collaborator

To be clear: the MarshalOptions struct has it set to false.

👍

Is this the right method that's called by a Message's implementation of the stringer interface? I don't know, but it seems likely.

Yes, it is the correct place for String implementations. I had also referenced it by name in my comment. Good call on adding it to Format as well.

Would this warrant new methods with the new behavior that we can't remove?

The only place we would need to change any defaults is when there’s zero possibility to modify the behavior as the caller. As you’ve found Format is one of these, as well as from our earlier discussions String (which you’ve correctly traced to impl.Export.MessageStringOf). We only want to change the default to true where the formatting is undoubtably debug-only. Either through name or description containing some variant of …only intended for human consumption…

There's a chance it is not avoidable.

As was pointed out, the API was apparently designed to be extendable without causing issues. 👍

an interesting (but not nicest) fix might be to make sensitive fields implement an extra interface

This would be even more impossible. I mean, a direct type of uint32 cannot implement any receiver methods.

In all, I’m positive about the progress you’re making, and not having any more reservations on your design. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants