Extend telegram bot #26

Open
pdpino opened this issue Feb 7, 2023 · 6 comments

@pdpino

pdpino commented Feb 7, 2023

Awesome library!

Is there a way to extend the telegram bot to answer more commands?

I suppose I can edit the source here and recompile, but I wonder if there is a different way to do this.

@pja237
Contributor

pja237 commented Feb 8, 2023

Hey, thanks for the support!
We did a brief brainstorm about something like this, to allow users to issue additional commands to the Slurm cluster via the bot, e.g. issue a scancel to a job etc. But after thinking it over briefly, we dropped the idea because it seemed we'd need to implement quite a lot to make it safe, like authentication/authorization. So we focused just on the basic functionality to deliver messages, which is a safe one-way communication (bot->user).

As for how to go about it, it would be the same as you mentioned: edit that part of the code, assign handlers for additional commands and fill out the functions. I don't know of another way.
What did you have in mind to add in terms of commands?

@pdpino
Author

pdpino commented Feb 9, 2023

we dropped the idea because it seemed we'd need to implement quite a lot to make it safe, like authentication/authorization

Makes sense! It's safer if the user just logs in to the cluster via the usual methods.

What did you have in mind?

Haven't defined this yet, but as ideas:

  1. query jobs state, e.g. check state of a job, check squeue, check old jobs
  2. configure the bot's verbosity (per user)
  3. Customize bot's message sent on "/start"
  4. cancel jobs (though this would be particularly unsafe, so can be left out)
  5. bot could send notifications regarding the cluster, e.g. "warning: you are reaching your storage quota" (could be out of scope for this project) (see next comment)

@pdpino
Author

pdpino commented Feb 9, 2023

I want to use the telegram bot to send more notifications regarding the cluster, e.g. "warning: you are reaching your max storage quota". I'd say it's a bit out of scope for this library to implement it directly. But it would be ideal if we could reuse functionality, something like:

import telegramBot, username2chatId

chatId = username2chatId("some telegram-username")
# note: could be unsafe (the user would have to register their cluster-username via the telegram bot?)
# we might have to work directly with chatId

telegramBot.sendMessage(chatId, "some message")

Or even for any connector with a go package?

import goslmailer
goslmailer.send("some message", "some username", "some connector")

or from the cmd line

$ goslmailer-send "some message" "some username" --connector telegram|email|etc

Does this make sense? I'm just kind of brainstorming here 😁

@pja237
Contributor

pja237 commented Feb 11, 2023

Hey, that's some interesting brainstorm you've dropped here.
Made me think hard about all that. It does make sense, although i'm having trouble seeing the final picture, it's a bit too far away, still too vague. So let's try with some questions to clarify the vision a bit, let's call whatever it is a "product", and i'll just unload the thoughts...

I've picked 2 use cases that i found to be technically "different", then i try to envision how the end product would look from the user perspective, and which components would i need to have in place to execute that product.

  1. list a user's jobs (squeue) - easy one
  2. quota warnings - complicated one?

For those two i then tried to envision what needs to exist to execute them.
So, the user opens his phone, goes to the chat with the bot and let's say does:

  1. /jobs to get his jobs in the queue, now this is something that i see as initiated by the user, bot can then do a squeue, perhaps some parsing, prettifying and returns the list back, quite simple...

  2. the quota warnings, now, that's something not initiated by the user but by the state of the system itself.
    So, what would be needed then is:
    a) user must register for quota warnings (e.g. /quotanotify command to the bot (some users might not want this sent to them))
    b) bot needs to start monitoring this user's quota, or there is another monitoring component sitting behind the bot, to which the bot registers the user so that it monitors his quotas and sends the bot a notification to alert the user
    [monitor]<-->[bot]<-->[user]
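
The "squeue, some parsing, prettifying" step of the /jobs case could be sketched roughly as below. This is an illustration under assumptions, not existing goslmailer code: it assumes the bot runs `squeue --noheader -u <user> -o "%i|%j|%T"` (job ID, name and state separated by `|`) and passes the output to the hypothetical helpers `parseSqueue` and `formatJobs`:

```go
package main

import (
	"fmt"
	"strings"
)

// Job holds the fields this sketch extracts for each queued job.
type Job struct {
	ID    string
	Name  string
	State string
}

// parseSqueue parses output produced by:
//   squeue --noheader -u <user> -o "%i|%j|%T"
// i.e. one job per line with job ID, name and state separated by "|".
// A job name that itself contains "|" is reported as an error.
func parseSqueue(out string) ([]Job, error) {
	var jobs []Job
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.Split(line, "|")
		if len(fields) != 3 {
			return nil, fmt.Errorf("unexpected squeue line: %q", line)
		}
		jobs = append(jobs, Job{ID: fields[0], Name: fields[1], State: fields[2]})
	}
	return jobs, nil
}

// formatJobs renders the jobs as a short plain-text chat message.
func formatJobs(jobs []Job) string {
	if len(jobs) == 0 {
		return "No jobs in the queue."
	}
	var b strings.Builder
	for _, j := range jobs {
		fmt.Fprintf(&b, "%s  %s  %s\n", j.ID, j.Name, j.State)
	}
	return strings.TrimRight(b.String(), "\n")
}
```

Keeping the parsing separate from the call to squeue means it can be tested without a cluster.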

In both cases, what's def. needed is a map between telegram user uids and cluster uids (t-uid<-->c-uid).
Could be done manually, or with some tokens generated at the cluster with which users then authenticate themselves to the bot (here we're entering dangerous territory 😆 )
Also to keep things useful, the monitor must be configurable/modular enough to be able to check quotas on different FSs (e.g. beegfs-ctl --getquota, etc.)

Other questions:

  1. would this be a telegram thing only, or a more general framework that supports delivery to other messaging systems, like through the connectors for other apps that we have

For now, the scope of what you've described is huge and in all that i see goslmailer might play just a tiny part from all the other components that might be missing. Unless we brainstorm-trim this down to some fundamentals significantly.

@pdpino
Author

pdpino commented Feb 22, 2023

Thanks for the reply! This looks interesting.

  • I agree with the two use cases. I'll call them UC-1 (query jobs) and UC-2 (notify quota) from now on.
  • I see two main flows to cover these use-cases: flow-notify and flow-query
  • I'd design this leveraging the connector idea to work with any messaging system. Though the initial implementation could be targeted to telegram

flow-notify

This flow covers UC-2 (notify quota warnings)

I see something like this: [monitor] --> [messager] --> [connector] --> [user].

[monitor]

  • Process/service in the cluster that decides when to send messages to the user
  • Currently, SLURM itself works as a [monitor]: calls MailProg on events START, END, etc
  • To implement UC-2 we'd need to implement a [monitor] (e.g. monitor-quota) that checks user quota and decides when to notify the user. For example:
    • A simple script run periodically with crontab
    • The script: (1) checks user quota, (2) decides which users to notify, (3) calls MailProg --user <chat-id> --message "warn: you're near the quota"
  • (for now) To avoid the /quotanotify command via the bot, the user registers via the cluster directly
    • e.g. run monitor-quota register --connector telegram --chat-id <my-chat-id>
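
Step (2) of such a script, deciding which users to notify, could look roughly like this. It is a sketch under assumptions: the `Usage` record and the `usersNearQuota` helper are hypothetical, and how the usage numbers are obtained (e.g. by parsing the quota tool of a particular filesystem) is left out:

```go
package main

import "sort"

// Usage is a hypothetical record of one user's storage consumption.
type Usage struct {
	User       string
	UsedBytes  uint64
	LimitBytes uint64 // 0 means "no quota set"
}

// usersNearQuota returns, in alphabetical order, the users whose usage is at
// or above the given fraction of their limit (e.g. 0.9 for 90%).
// Users without a quota are skipped.
func usersNearQuota(usages []Usage, fraction float64) []string {
	var users []string
	for _, u := range usages {
		if u.LimitBytes == 0 {
			continue
		}
		if float64(u.UsedBytes) >= fraction*float64(u.LimitBytes) {
			users = append(users, u.User)
		}
	}
	sort.Strings(users)
	return users
}
```

The cron-driven script would then loop over the returned users and hand each one a warning message through the [messager].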

[messager]

  • Process/service in the cluster handling messages from cluster to connector
  • Simply receives a message and forwards it to a connector, e.g.: provide a command like [messager] --connector telegram|email|slack|etc --user chat-id|email-address|etc
  • Currently, goslmailer accomplishes this

[connector]

  • Service in the cluster handling messages from cluster to user
  • Handles the actual communication with the bot
  • Currently, tgslurmbot accomplishes this (also matrixslurmbot and discoslurmbot)
  • I'd think gobler can be a [connector] as well, that instead of connecting to a bot, it spools and forwards to another [connector] (like a decorator)

flow-query

This covers UC-1 (query current jobs from the telegram-bot)

The messaging here needs to be in both directions: [listener] <--> [connector] <--> [user].

Though notice the flow is always initiated by the user: [user] --> [connector] --> [listener] --> [connector] --> [user]

[listener]

  • Service in the cluster listening for queries, somewhat similar to the [monitor] from before
  • Requires authentication ⚠️
    • For example, require receiving a secret token (unique per cluster-user, generated randomly once)
    • If the token is incorrect, do not run any command in the cluster and return an error message
  • To implement UC-1 we'd set up a script that runs squeue, parses the output, and returns it in a serialized format
    • The [listener] can be called like: slurm-listener list-jobs
  • SLURM's Rest API might work to this end (?)

[connector]

  • Service in the cluster receiving user queries from the bot
  • Requires the user to be authenticated ⚠️
    • Stores a mapping chat-id --> token
  • Listens to a set of commands and calls the [listener]
    • e.g. listen to /jobs, call slurm-listener list-jobs, return the result via the chat

Security

Authentication can get complicated and risky. Some thoughts:

Safety first

  • Implement only safe queries in the [listener], e.g.:
    • squeue: seems safe enough, only queries system state and returns it
    • scancel: seems unsafe! (modifies system state). Do not implement at all (or implement at your own risk)
  • Limit queries per minute --> avoids flooding (DoS attack)

Authentication flow (initial idea)

To authenticate, the user must:

  1. generate the token once by running directly in the cluster: generate-token. This returns the token to stdout, e.g. 123456789, and stores it somewhere safe (e.g. /home/<username>/.mysecrettoken, no read/write access for other cluster-users)
  2. send this command to the bot once: /auth 123456789. The [connector] stores the mapping chat-id --> token somewhere safe in the cluster (no read/write access to cluster-users). After that, the user can send commands through the bot to run authenticated queries.

To run authenticated commands: the [listener] first validates the token by calling validate-token <cluster-uid> <token>, and only then runs the actual query
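
The generate-token and validate-token steps might be sketched as follows. This is only an illustration of the idea: `generateToken`, `hashToken` and `TokenStore` are hypothetical names, and keeping SHA-256 digests in memory instead of the raw token in a file is an assumption of this sketch, not something settled above:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
)

// generateToken returns a random 128-bit token, hex-encoded (32 characters).
func generateToken() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// hashToken returns the SHA-256 digest of a token, so the store can keep
// digests rather than the tokens themselves.
func hashToken(token string) [sha256.Size]byte {
	return sha256.Sum256([]byte(token))
}

// TokenStore maps a cluster username to the digest of its token.
type TokenStore map[string][sha256.Size]byte

// Validate reports whether token matches the one registered for user.
// The comparison runs in constant time to avoid leaking how many bytes match.
func (s TokenStore) Validate(user, token string) bool {
	want, ok := s[user]
	if !ok {
		return false
	}
	got := hashToken(token)
	return subtle.ConstantTimeCompare(got[:], want[:]) == 1
}
```

A random token from `crypto/rand` is much harder to guess than a short numeric one like the 123456789 example, which matters because the token is the only thing protecting the queries.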

Putting all together

Both flows are different enough to be treated separately, however, the [connector] for both flows must be the same service (at least for a telegram bot).

A simplified scheme would look like this:

[monitor] --> [messager] --\
                            \
                             --> [connector] <--> [user]
                            /
             [listener] <--/

A more detailed scheme (blue are flow-notify, and orange are for flow-query):

[Figure: tgslurmbot-diagram]

The flow-notify is already covered by goslmailer 🎉. Some questions:

  1. Can we reuse the [messager] for other use-cases with the flow-notify?
    • Say I develop a [monitor] to check user quotas periodically (ideally, this would be configurable to support multiple FSs)
    • Can I call something like goslmailer <chat-id> <msg> --connector telegram? Can we extend goslmailer to support this?
    • Alternatively, can I run tgslurmbot <chat-id> <msg> directly? This would not support other connectors or the gobler, but would cover my use-case with telegram
  2. Can we extend the [connector] to support the flow-query?
    • Say I implement a [listener] for SLURM, an [authenticator], and a [mapping]
    • Can I provide my own code to add more bot commands? Or maybe import basic configuration?: e.g.
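      import (
         "os/exec"

         tele "gopkg.in/telebot.v3" // assuming telebot v3, as tele.Context suggests

         "configBot" // hypothetical package: tgslurmbot's bot setup
         "mapping"   // hypothetical package: chat-id --> token store
      )

      // this applies the current configuration from tgslurmbot
      // (configBot.New is a hypothetical constructor)
      b := configBot.New()

      // add my own handlers
      b.Handle("/jobs", func(c tele.Context) error {
         token := mapping.ChatId2Token(c.Chat().ID)
         // slurm-listener is the hypothetical [listener] CLI; how the token
         // is passed to it is left open, here it is a plain argument
         out, err := exec.Command("slurm-listener", "list-jobs", token).Output()
         if err != nil {
            return c.Send("could not list jobs")
         }

         return c.Send(string(out))
      })

      b.Start()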

@pja237
Contributor

pja237 commented Feb 25, 2023

Hey, that's quite thorough planning. Respect.
In theory, it's all doable, but i'd ask you for a day/two to reread and digest/think it all through before i reply.
