Provided by: debgpt_0.4.94_all

NAME

       DebGPT - Chatting LLM with Debian-Specific Knowledge

              “AI” = “Artificial Idiot”

SYNOPSIS

       debgpt [-h] [--quit] [--multiline] [--hide_first] [--verbose] [--output OUTPUT] [--version]
              [--debgpt_home DEBGPT_HOME] [--frontend {dryrun,zmq,openai}] [--temperature TEMPERATURE]
              [--top_p TOP_P] [--openai_base_url OPENAI_BASE_URL] [--openai_api_key OPENAI_API_KEY]
              [--openai_model OPENAI_MODEL] [--zmq_backend ZMQ_BACKEND] [--bts BTS] [--bts_raw]
              [--cmd CMD] [--buildd BUILDD] [--file FILE] [--policy POLICY] [--devref DEVREF]
              [--tldr TLDR] [--ask ASK] [SUBCOMMAND] ...

DESCRIPTION

       This tool is currently experimental.

       Large language models (LLMs) are newly emerged tools capable of handling tasks that traditional software
       could never achieve, such as writing code based on a specification provided by the user.  With this tool,
       we attempt to experiment with and explore the possibility of leveraging LLMs to aid Debian development, to
       any extent.

       Essentially, the idea of this tool is to gather some pieces of Debian-specific knowledge, combine them in a
       prompt, and then send them all to the LLM.  This tool provides convenient functionality for automatically
       retrieving information from the BTS, buildd, the Debian Policy, system manual pages, tldr manuals, the
       Debian Developer's Reference, etc.  It also provides convenient wrappers for external tools such as git,
       where debgpt can automatically generate the git commit message and commit the changes for you.

       This tool supports multiple frontends, including OpenAI and ZMQ.  The ZMQ frontend and backend are provided
       with this tool to make it self-contained.

OPTIONS

       -h, --help
              show this help message and exit

       --cmd CMD
              add the command line output to the prompt

       TODO: finish CLI redesign first.  Then add all cmd options here.

FRONTENDS

       The tool currently has three frontend implementations: dryrun, openai, and zmq.  They are selected through
       the -F | --frontend argument.

        • openai: Connects with an OpenAI API-compatible server.  For instance, by specifying --openai_base_url,
          you can switch to a service provider other than the default OpenAI API server (see the example after
          this list).

        • zmq: Connects with the built-in ZMQ backend.  The ZMQ backend is provided for self-hosted LLM inference
          servers.  This implementation is very lightweight and is not compatible with the OpenAI API.  To use
          this frontend, you may need to set up a corresponding ZMQ backend.

        • dryrun: A fake frontend that does nothing.  Instead, it simply prints the generated initial prompt to
          the screen so the user can copy it and paste it into web-based LLMs, including but not limited to
          ChatGPT (OpenAI), Claude (Anthropic), Bard (Google), Gemini (Google), HuggingChat (HuggingFace),
          Perplexity AI, etc.  This frontend does not need to connect with any backend.
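
       For instance, a hypothetical invocation that points the openai frontend at a self-hosted, OpenAI-compatible
       server is shown below.  The base URL and model name are placeholders, not values shipped with debgpt;
       substitute whatever your provider or local server expects.

              debgpt -F openai --openai_base_url http://localhost:8080/v1 --openai_model my-local-model -Q -A "who are you?"
              debgpt -F dryrun --bts 1056388 -A :summary   # only prints the prompt; paste it into a web-based LLM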

       DISCLAIMER: Unless you connect to a self-hosted LLM inference backend, we are uncertain how the third-party
       API servers will handle the data you create.  Please refer to their corresponding user agreements before
       adopting one of them.  Be aware of such risks, and refrain from sending confidential information, such as
       paid API keys, to the LLM.

CONFIGURATION

       By default, the configuration file is placed at $HOME/.debgpt/config.toml.  Use debgpt genconfig or debgpt
       config.toml to generate a config template.  This configuration file should not be installed system-wide,
       because users may need to fill in secrets such as paid API keys.
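
       A minimal sketch of what such a config.toml might contain is shown below.  The key names mirror the
       corresponding command line options and are assumptions here; consult the template produced by debgpt
       genconfig for the authoritative list of keys and their defaults.

              # ~/.debgpt/config.toml (sketch; key names are assumptions)
              frontend = "openai"                            # or "zmq" / "dryrun"
              openai_base_url = "https://api.openai.com/v1"
              openai_api_key = "sk-..."                      # keep this file private
              openai_model = "gpt-3.5-turbo-16k"
              temperature = 0.5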

PROMPT ENGINEERING

       When you chat with an LLM, note that the way you ask a question significantly impacts the quality of the
       results you will get.  Make sure to provide as much information as possible.  The following are some
       references on this topic:

       1. OpenAI’s Guide https://platform.openai.com/docs/guides/prompt-engineering
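
       As a hypothetical illustration, the second invocation below gives the LLM far more context than the first
       and will usually produce a more useful answer (the file names build.log and debian/rules are placeholders):

              debgpt -Q -A "how do I fix this build failure?"
              debgpt -HQ --cmd 'tail -n 50 build.log' -f debian/rules -A "how do I fix this build failure?"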

EXAMPLES

       The following examples are roughly organized in order of command-line complexity.

   Ex1. General Chat
       When  no  arguments  are given, debgpt degenerates into a general terminal chatting client with LLM back‐
       ends.  Use debgpt -h to see detailed usage.

              debgpt

       If you want to quit (-Q) after receiving the first response from LLM regarding the question (-A):

              debgpt -Q -A "who are you?"

       After each session, the chat history will be saved in ~/.debgpt as a JSON file with a unique name.  You can
       use debgpt replay <file_name> to replay the history.
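
       For instance (the file name below is only a placeholder; pick an actual file created under ~/.debgpt after
       one of your sessions):

              ls ~/.debgpt/*.json
              debgpt replay ~/.debgpt/<file_name>.json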

       During the interactive session, you can use /save path.txt to save the last LLM response to the specified
       file.  You can also use /reset to clear the context.

   Ex2. BTS / Buildd Query
       Ask LLM to summarize the BTS page for src:pytorch.

              debgpt -HQ --bts src:pytorch -A :summary_table
              debgpt -HQ --bts 1056388 -A :summary

       Lookup the build status for package glibc and summarize as a table.

              debgpt -HQ --buildd glibc -A :summary_table

       When the argument to -A/--ask is a tag starting with a colon (:), such as :summary, it will be
       automatically replaced with a default question template.  Use debgpt -A : to look up the available
       templates.

       The  -H  argument  will  skip printing the first prompt generated by debgpt, because it is typically very
       lengthy, and people may not want to read it.

   Ex3. Debian Policy and Developer References
       Load a section of the Debian Policy document, such as section 7.2, and ask a question:

              debgpt -H --policy 7.2 -A "what is the difference between Depends: and Pre-Depends: ?"
              debgpt -H --devref 5.5 -A :summary

   Ex4. Man and TLDR Manuals
       Load the debhelper manpage and ask it to extract a part of it.

              debgpt -HQ --man debhelper-compat-upgrade-checklist -A "what's the change between compat 13 and compat 14?"
              debgpt -HQ --tldr curl --cmd 'curl -h' -A "download https://localhost/bigfile.iso to /tmp/workspace, in silent mode"

   Ex5. Composition of Various Information Sources
       We can add a code file and the Debian Policy simultaneously.  The combination is actually very flexible,
       and you can put anything in the prompt.  In the following example, we put the debian/control file from the
       PyTorch package, as well as Debian Policy section 7.4, and ask the LLM to explain some details:

              debgpt -H -f pytorch/debian/control --policy 7.4 -A "Explain what Conflicts+Replaces means in pytorch/debian/control based on the provided policy document"

       Similarly, we can also let LLM read the Policy section 4.9.1, and ask it to write some code:

              debgpt -H -f pytorch/debian/rules --policy 4.9.1 -A "Implement the support for the 'nocheck' tag based on the example provided in the policy document."

   Ex6. External Command line
       Being able to pipe the inputs and outputs among different programs is one of the reasons why I  love  the
       UNIX philosophy.

       For  example,  we can let debgpt read the command line outputs of apt, and summarize the upgradable pack‐
       ages for us:

              debgpt -HQ --cmd 'apt list --upgradable' -A 'Briefly summarize the upgradable packages. You can categorize these packages.' -F openai --openai_model 'gpt-3.5-turbo-16k'

       And we can also ask the LLM to automatically generate a git commit message based on the currently staged
       changes:

              debgpt -HQ --cmd 'git diff --staged' -A 'Briefly describe the change as a git commit message.'

       This looks interesting, right?  In the next example, we have something even more convenient!

   Ex7. Git Wrapper
       Let LLM automatically generate the git commit message, and call git to commit it:

              debgpt git commit

    Ex8. Fortune
       Let LLM tell you a fortune:

              debgpt -T 1.0 fortune :joke
              debgpt -T 1.0 fortune :math

       Use debgpt fortune : to look up the available tags.  Or you can just specify the type of fortune you want:

              debgpt -T 1.0 fortune 'tell me something very funny about linux'

       We need to raise the temperature (-T) to 1.0 because a lower temperature leads to less randomness, and the
       LLM will tend to say the same thing every time.

    Ex9. File-Specific Questions
       Let LLM explain the code debgpt/llm.py:

              debgpt -H -f debgpt/llm.py -A :explain

       Let LLM explain the purpose of the contents in a file:

              debgpt -H -f pyproject.toml -A :what

       You can also specify the line range in a special grammar for -f/--file:

               debgpt -H -f pyproject.toml:3-10 -A :what  # select lines [3,10)
               debgpt -H -f pyproject.toml:-10 -A :what   # select from the beginning to the 10th line (excluding the 10th)
               debgpt -H -f pyproject.toml:3- -A :what    # select from the 3rd line (inclusive) to the end of the file

       Mimicking licensecheck:

              debgpt -H -f debgpt/llm.py -A :licensecheck

    Ex10. Read Arbitrary HTML
       Make a long mailing list story short:

              debgpt -H --html 'https://lists.debian.org/debian-project/2023/12/msg00029.html' -A :summary

       Explain the differences among voting options:

              debgpt -H --html 'https://www.debian.org/vote/2022/vote_003' -A :diff --openai_model gpt-3.5-turbo-16k

       In this example, we had to switch to a model supporting a long context (the HTML page has roughly 5k  to‐
       kens).

   Ex99. You Name It
       The usage of LLMs is limited only by our imagination.  I would be glad to hear from you if you have more
       good ideas on how we can make LLMs useful for Debian development:
       https://salsa.debian.org/deeplearning-team/debgpt/-/issues

BACKENDS

   Available Backend Implementations
       This tool provides one backend implementation: zmq.

        • zmq: Only needed when you choose the ZMQ frontend for a self-hosted LLM inference server.

       If  you  plan  to  use the openai or dryrun frontends, there is no specific hardware requirement.  If you
       would like to self-host the LLM inference backend (ZMQ backend), powerful hardware is required.

   LLM Selections
       The concrete hardware requirement depends on the LLM you would like to use.  A variety of open-access LLMs
       can be found at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard .  Generally, when doing
       prompt engineering only, the “instruction-tuned” LLMs and the “RL-tuned” (RL is reinforcement learning)
       LLMs are recommended.

       The pretrained (raw) LLMs are not very useful in this case, as they have not yet gone through instruction
       tuning or the reinforcement learning tuning procedure.  These pretrained LLMs are more likely to generate
       garbage and not follow your instructions, or to simply repeat your instruction.  We will only revisit the
       pretrained LLMs when we plan to start collecting data and fine-tuning (e.g., with LoRA) a model in the far
       future.

       The following is a list of supported LLMs for self-hosting (this list will be updated when new
       state-of-the-art open-access LLMs become available):

        • Mistral7B (Mistral-7B-Instruct-v0.2) (default)
          This model requires roughly 15GB of disk space to download.

        • Mixtral8x7B (Mixtral-8x7B-Instruct-v0.1)
          This model is larger yet more powerful than the default LLM.  In exchange, it poses even higher hardware
          requirements.  It takes roughly 60~100GB of disk space (I forgot the exact number; will check later).

       Different LLMs will pose different hardware requirements.  Please see the “Hardware Requirements” subsec‐
       tion below.

   Hardware Requirements
       By default, we recommend doing LLM inference in fp16 precision.  If the VRAM (such as CUDA memory) is
       limited, you may also switch to even lower precisions such as 8bit and 4bit.  For pure CPU inference, we
       only support fp32 precision for now.

       Note that multi-GPU inference is supported by the underlying transformers library.  If you have multiple
       GPUs, the memory requirement is roughly divided by the number of GPUs.

       Hardware requirements for the Mistral7B LLM:

       • Mistral7B  + fp16 (cuda): 24GB+ VRAM preferred, but needs a 48GB GPU to run all the demos (some of them
         have a context as long as 8k).  Example: Nvidia RTX A5000, Nvidia RTX 4090.

       • Mistral7B + 8bit (cuda): 12GB+ VRAM at minimum, but 24GB+ preferred so you can run all demos.

       • Mistral7B + 4bit (cuda): 6GB+ VRAM at minimum but 12GB+ preferred so you can run all  demos.   Example:
         Nvidia RTX 4070 (mobile) 8GB.

       • Mistral7B  +  fp32  (cpu): Requires 64GB+ of RAM, but a CPU is 100~400 times slower than a GPU for this
         workload and thus not recommended.

       Hardware requirements for the Mixtral8x7B LLM:

       • Mixtral8x7B + fp16 (cuda): 90GB+ VRAM.

       • Mixtral8x7B + 8bit (cuda): 45GB+ VRAM.

       • Mixtral8x7B + 4bit (cuda): 23GB+ VRAM, but in order to make it work with long context such  as  8k  to‐
         kens, you still need 2x 48GB GPUs in 4bit precision.

       See https://huggingface.co/blog/mixtral for more.

   Usage of the ZMQ Backend
       If you want to run the default LLM with different precisions:

              debgpt backend --max_new_tokens=1024 --device cuda --precision fp16
              debgpt backend --max_new_tokens=1024 --device cuda --precision bf16
              debgpt backend --max_new_tokens=1024 --device cuda --precision 8bit
              debgpt backend --max_new_tokens=1024 --device cuda --precision 4bit

       The only supported precision on CPU is fp32 (for now).  If you want to fall back to CPU computation (very
       slow):

              debgpt backend --max_new_tokens=1024 --device cpu --precision fp32

       If you want to run a different LLM, such as Mixtral8x7B instead of the default Mistral7B:

              debgpt backend --max_new_tokens=1024 --device cuda --precision 4bit --llm Mixtral8x7B

       The --max_new_tokens argument (the maximum length of each LLM reply) does not matter much; you can adjust
       it as you wish.
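
       Once the backend is running, a hypothetical client invocation using the zmq frontend might look like the
       following (the endpoint address and port are placeholders; use whatever address your backend actually
       listens on, as passed via --zmq_backend):

              debgpt -F zmq --zmq_backend tcp://localhost:11177 -Q -A "who are you?"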

SETUP AND INSTALL

       FIXME: add optional (backend) dependencies in pyproject.toml

       This tool can be installed from source via the command “pip3 install .”.  By default, this will only pull
       in the dependencies needed to run the OpenAI and ZMQ frontends.  The dependencies of the ZMQ backend
       (i.e., self-hosted LLM inference) need to be satisfied manually for now, using tools like pip, venv, conda,
       mamba, etc.

       The  additional dependencies needed to run the LLM backend are: numpy, pytorch, pyzmq, scipy, accelerate,
       bitsandbytes, tokenizers, transformers.
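
       A minimal sketch of satisfying these backend dependencies inside a virtual environment is shown below.
       Note that the PyTorch package is published on PyPI under the name torch, and the virtual environment path
       is only an example:

              python3 -m venv ~/.venvs/debgpt && . ~/.venvs/debgpt/bin/activate
              pip3 install .
              pip3 install numpy torch pyzmq scipy accelerate bitsandbytes tokenizers transformers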

       The additional dependencies needed to run the tests are: pytest.

TODO

       The following is the current TODO list.  Some ideas might be a bit far off.

        1. debgpt.backend error handling ...  illegal input format, overlength, CUDA OOM, etc.

        2. debgpt.llm tune llm parameters like temperature.

        3. implement very simple CoT https://arxiv.org/pdf/2205.11916.pdf

        4. add perplexity API https://www.perplexity.ai

        5. https://github.com/openai/chatgpt-retrieval-plugin

         6. implement --archwiki --gentoowiki --debianwiki --fedorawiki --wikipedia (although the LLMs have
            already read the wikipedia dump many times)

        7. analyze sbuild buildlog

        8. analyze udd, ddpo, contributors, nm

        9. organize argparse with argument groups

       10. How can LLM help CPython transition?  failing tests, API changes, etc.

       11. What else can we do about the Debian patching workflow?  adding patch description?

       12. Uscan?  Upstream information?

       13. find upstream bug that matches debian bug (bug triage)

       14. connect with debian codesearch API https://codesearch.debian.net/faq

       15. Let  LLM  imitate  Janitor  (https://wiki.debian.org/Janitor),  and possibly do some more complicated
           things

       16. Extend Lintian with LLM for complicated checks?

       17. Let LLM do mentoring (lists.debian.org/debian-mentors) e.g., reviewing a .dsc package.  This is  very
           difficult given limited context length.  Maybe LLMs are not yet smart enough to do this.

        18. Apart from the str type, the frontend could support other return types such as List or Dict (for
            advanced usage such as in-context learning); see debgpt/frontend.py :: ZMQFrontend.query.  Those are
            not explored yet.

        19. The current implementation stays at prompt-engineering an existing chatting LLM with Debian-specific
            documents, like debian-policy, the Debian Developer's Reference, and some man pages.  In the future,
            we may want to explore how we can use larger datasets like a Salsa dump, a Debian mailing list dump,
            etc.  LoRA, RAG, or other new methods are to be investigated with these datasets.  Also see follow-ups
            at https://lists.debian.org/debian-project/2023/12/msg00028.html

       20. Should  we  really  train  or fine-tune a model?  How do we organize the data for RLHF or instruction
           tuning?

       21. There are other possible backends like https://github.com/ggerganov/llama.cpp which allows  inference
           on CPUs (even laptops).  transformers itself also supports 8bit and 4bit inference with bitsandbytes.

LICENSE

       Copyright (C) 2024 Mo Zhou <lumin@debian.org>; MIT/Expat License

AUTHORS

       Mo Zhou <lumin@debian.org>

                                                                                                       DebGPT(1)