From 4623966bc7ff030466cdea36d206fb2900013aab Mon Sep 17 00:00:00 2001
From: "abir.chebbi" <abir.chebbi@hes-so.ch>
Date: Thu, 12 Sep 2024 14:36:36 +0200
Subject: [PATCH] readme

---
 README.md | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 1a1c3bf..5fa4953 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@
 1. AWS CLI: Ensure AWS CLI is installed and configured on your laptop(refer to the setup guide provided in Session 1).
 2. Ensure python is installed: python 3.8 or higher.
 3. Install required python libraries listed in the 'requirements.txt':
+
 `pip3 install -r requirements.txt`
 
 
@@ -11,18 +12,22 @@
 ### Step 1: Object storage Creation
 
 Create an S3 bucket and upload a few PDF files by running:
+
 `python create-S3-and-put-docs.py --bucket_name [YourBucketName] --local_path [PathToYourPDFFiles]`
+
 Where:
-`--bucket_name`: The name for the new S3 bucket to be created.
-`--local_path`: The local directory path where the PDF files are stored.
+- **--bucket_name**: The name for the new S3 bucket to be created.
+- **--local_path**: The local directory path where the PDF files are stored.
 
 ### Step 2: Vector Store Creation
 
 Create a vector database for storing embeddings by running:
+
 `python create-vector-db.py --collection_name [Name_of_colletion] --IAM_user [YourIAM_User]`
+
 Where:
-`--collection_name`: Name of the collection that you want to create to store embeddings.
-`--IAM_USER` : For example for group 14 the IAM USER = master-group-14
+- **--collection_name**: Name of the collection that you want to create to store embeddings.
+- **--IAM_USER** : For example for group 14 the IAM USER = master-group-14
 
 This script performs the following actions:
 
@@ -35,12 +40,14 @@ This script performs the following actions:
 After setting up the S3 bucket and Vector Store, we could process PDF files to generate and store embeddings in the vector database.
 
 Run:
+
 `python main.py --bucket_name [YourBucketName] --endpoint [YourVectorDBEndpoint]`
 
 Where:
-`--bucket_name`: The name of the S3 bucket containing the PDF files.
-`--endpoint`: Endpoint for the vector database.
-`--index_name`: The index_name where to store the embeddings in the collection.
+
+- **--bucket_name**: The name of the S3 bucket containing the PDF files.
+- **--endpoint**: Endpoint for the vector database.
+- **--index_name**: The index_name where to store the embeddings in the collection.
 
 The main.py script will:
 1. Download PDF files from the S3 bucket.
-- 
GitLab