Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature for Duplicate Key PIR #281

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

yunbo99
Copy link

@yunbo99 yunbo99 commented Mar 21, 2025

No description provided.

Copy link

github-actions bot commented Mar 21, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @yunbo99, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a new feature for Duplicate Key PIR (DkPir), which enables batch queries while preserving data privacy. The changes include adding necessary dependencies, example data, configuration files, and core logic for both sender and receiver sides. The implementation extends the existing APSI framework to support scenarios where the server-side database contains multiple rows with the same key, enabling applications like billable PIR.

Highlights

  • Dependency Addition: The pull request adds heu as a bazel dependency in MODULE.bazel. This library is used for homomorphic encryption, a key component of the DkPir protocol.
  • Example Data: New CSV files (duplicate_key_db.csv, duplicate_key_query.csv, duplicate_key_target_result.csv) are added to examples/pir/apsi/data/ to demonstrate the duplicate key PIR functionality. These files contain sample data and queries for testing the new feature.
  • Configuration Files: New configuration files (dk_pir_receiver_online.json, dk_pir_sender_offline.json, dk_pir_sender_online.json) are added to examples/pir/config/ to define the settings for the DkPir sender and receiver in both online and offline modes. These configurations specify parameters like file paths, curve types, and communication settings.
  • Core Logic Implementation: The pull request introduces a new directory psi/algorithm/dkpir/ containing the core logic for the DkPir protocol. This includes implementations for the sender, receiver, OPRF receiver, and common utilities, as well as a new proto definition for secret keys.
  • CSV Conversion Utilities: The pull request adds a csv_converter library to psi/utils, providing utilities for converting CSV files into formats suitable for APSI and DkPir. This includes functionalities for merging columns and rows, extracting queries, and extracting results.

Changelog

Click here to see the changelog
  • MODULE.bazel
    • Added heu as a bazel dependency.
  • examples/pir/apsi/data/duplicate_key_db.csv
    • Added a CSV file containing example database with duplicate keys.
  • examples/pir/apsi/data/duplicate_key_query.csv
    • Added a CSV file containing example queries for duplicate keys.
  • examples/pir/apsi/data/duplicate_key_target_result.csv
    • Added a CSV file containing the expected results for the example queries.
  • examples/pir/config/dk_pir_receiver_online.json
    • Added a configuration file for the DkPir receiver in online mode.
  • examples/pir/config/dk_pir_sender_offline.json
    • Added a configuration file for the DkPir sender in offline mode.
  • examples/pir/config/dk_pir_sender_online.json
    • Added a configuration file for the DkPir sender in online mode.
  • psi/algorithm/dkpir/BUILD.bazel
    • Added a BUILD file for the dkpir algorithm, defining the necessary libraries and dependencies.
  • psi/algorithm/dkpir/README.md
    • Added a README file describing the DkPir scheme, its application scenarios, and instructions for testing.
  • psi/algorithm/dkpir/common.cc
    • Implemented common utility functions for DkPir, including polynomial computation, key saving/loading, and result writing.
  • psi/algorithm/dkpir/common.h
    • Defined common data structures and function prototypes for DkPir.
  • psi/algorithm/dkpir/encryptor.h
    • Defined a custom Elgamal encryptor without plaintext range restrictions.
  • psi/algorithm/dkpir/entry.cc
    • Implemented entry points for the DkPir sender and receiver in both offline and online modes.
  • psi/algorithm/dkpir/entry.h
    • Defined the entry point functions for the DkPir algorithm.
  • psi/algorithm/dkpir/oprf_receiver.cc
    • Implemented the OPRF receiver component for DkPir.
  • psi/algorithm/dkpir/oprf_receiver.h
    • Defined the OPRF receiver class for DkPir.
  • psi/algorithm/dkpir/query.h
    • Defined the DkPirQuery class, extending the base apsi::sender::Query to include a sender count database.
  • psi/algorithm/dkpir/receiver.cc
    • Implemented the DkPir receiver logic, including OPRF request handling and query processing.
  • psi/algorithm/dkpir/receiver.h
    • Defined the DkPir receiver class.
  • psi/algorithm/dkpir/secret_key.proto
    • Defined the protobuf message for storing secret keys.
  • psi/algorithm/dkpir/sender.cc
    • Implemented the DkPir sender logic, including OPRF response generation and query processing.
  • psi/algorithm/dkpir/sender.h
    • Defined the DkPir sender class.
  • psi/algorithm/dkpir/sender_cnt_db.cc
    • Implemented the logic for generating the sender count database, including encryption of row counts.
  • psi/algorithm/dkpir/sender_cnt_db.h
    • Defined the functions for generating the sender count database.
  • psi/algorithm/dkpir/sender_dispatcher.cc
    • Implemented the DkPir sender dispatcher, handling OPRF requests and queries.
  • psi/algorithm/dkpir/sender_dispatcher.h
    • Defined the DkPir sender dispatcher class.
  • psi/apps/psi_launcher/BUILD.bazel
    • Added //psi/algorithm/dkpir:entry as a dependency to psi_launcher.
  • psi/apps/psi_launcher/launch.cc
    • Added RunDkPir functions to handle DkPir sender and receiver configurations.
  • psi/apps/psi_launcher/launch.h
    • Added function declarations for RunDkPir.
  • psi/proto/entry.proto
    • Added DkPirSenderConfig and DkPirReceiverConfig to the LaunchConfig message.
  • psi/proto/pir.proto
    • Added DkPirSenderConfig and DkPirReceiverConfig messages, and imported psi/proto/psi.proto.
  • psi/utils/BUILD.bazel
    • Added csv_converter library and its test.
  • psi/utils/csv_converter.cc
    • Implemented the ApsiCsvConverter class for converting CSV files.
  • psi/utils/csv_converter.h
    • Defined the ApsiCsvConverter class.
  • psi/utils/csv_converter_test.cc
    • Implemented tests for the ApsiCsvConverter class.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Trivia time!

What does the acronym PIR stand for in the context of Private Information Retrieval?

Click here for the answer
PIR stands for Private Information Retrieval.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a new feature for Duplicate Key PIR (DkPir), including necessary code and configuration files. The changes seem well-structured, but there are a few points that need attention to ensure correctness and maintainability.

Summary of Findings

  • Missing License Information: The MODULE.bazel file lacks license information for the newly added dependency 'heu'. This should be added to comply with licensing requirements.
  • Potential Data Leakage: The current implementation of the DkPir scheme relies on the assumption that all client queries are present in the server's database. If this assumption is not met, the row count verification may fail, potentially leaking information about the client's query.
  • Lack of Input Validation: The code does not appear to have sufficient input validation, especially for file paths and configurations. This could lead to unexpected behavior or security vulnerabilities.

Merge Readiness

The pull request introduces a significant new feature, but there are several issues that need to be addressed before merging. Specifically, the missing license information, potential data leakage, and lack of input validation should be resolved. I am unable to approve this pull request, and recommend that it not be merged until these issues are addressed.

@yunbo99
Copy link
Author

yunbo99 commented Mar 21, 2025

I have read the CLA Document and I hereby sign the CLA

@qxzhou1010 qxzhou1010 self-requested a review March 28, 2025 02:53
Copy link
Contributor

@qxzhou1010 qxzhou1010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

按照之前的Review结果均做了修改,算法Review这边无问题,代码本身方面我这边也暂未发现问题。请工程同学再DoubleCheck一下。

@qxzhou1010
Copy link
Contributor

LGTM

int SenderOnline(const DkPirSenderOptions &options,
const std::shared_ptr<yacl::link::Context> &lctx);

int ReceiverOnline(const DkPirReceiverOptions &options,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests and benchmarks for this new algorithm.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

之前有写过单测,但会出现超时,有可能与link有关,而使用命令行运行不会出现该问题。benchmark回头我会补充一下。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以把单测代码贴上来看看为啥会卡

@@ -244,4 +245,79 @@ PirResultReport RunPir(const ApsiSenderConfig& apsi_sender_config,
return PirResultReport();
}

PirResultReport RunDkPir(const DkPirReceiverConfig& dk_pir_receiver_config,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for these new entries and launch functions.


namespace psi::dkpir {
class DkPirSender : public psi::apsi_wrapper::Sender {
public:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 感觉之前的实现里Sender更像是个命名空间,而不是个类,这个地方的集成比较奇怪
  2. 当前的sender和老的sender有大量的重复
  3. 我感觉从算法上来说其实可以做的更内聚一点,可以把APSI就当做一个传输通道,真正预处理还有cnt处理的逻辑可以抽出来而不是耦合在sender和receiver的逻辑里面,看着不是很清晰,理论上你可以抛过APSI,你的算法代码应该可以做一个流程上的单测的。可以参考当前SealPIR或者SPiralPIR下面的做法,把APSI就当个通信功能,这样你的算法实现就可以比较干净。

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢您的意见,我再尝试调整一下

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我也看了一下实现,确实OPRF那个地方的耦合会比较紧,只是建议啊,看能不能把APSI抛开不涉及通信搞一个单测

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants