Commit 4623966b authored by abir.chebbi

readme

parent 64a49f18
@@ -4,6 +4,7 @@
1. AWS CLI: Ensure the AWS CLI is installed and configured on your laptop (refer to the setup guide provided in Session 1).
2. Python: Ensure Python 3.8 or higher is installed.
3. Install the required Python libraries listed in 'requirements.txt':
`pip3 install -r requirements.txt`
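To confirm the Python prerequisite before installing anything, a minimal check along these lines could be used. The helper name `python_is_supported` is hypothetical and not part of the repository; only the 3.8 minimum comes from the list above.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_VERSION = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`.

    `version_info` defaults to the running interpreter's version.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling `python_is_supported()` with no arguments checks the interpreter that is currently running.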
@@ -11,18 +12,22 @@
### Step 1: Object Storage Creation
Create an S3 bucket and upload a few PDF files by running:
`python create-S3-and-put-docs.py --bucket_name [YourBucketName] --local_path [PathToYourPDFFiles]`
Where:
`--bucket_name`: The name for the new S3 bucket to be created.
`--local_path`: The local directory path where the PDF files are stored.
- **--bucket_name**: The name for the new S3 bucket to be created.
- **--local_path**: The local directory path where the PDF files are stored.
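The contents of `create-S3-and-put-docs.py` are not shown here, so the following is only a sketch of what such a step might look like, assuming the script relies on a boto3 S3 client. The function names, the `region` parameter, and the choice of the file name as the object key are all assumptions made for illustration.

```python
from pathlib import Path


def find_pdfs(local_path):
    """Return the PDF files directly inside `local_path`, sorted by path."""
    return sorted(
        p for p in Path(local_path).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def create_bucket_and_upload(s3_client, bucket_name, local_path, region=None):
    """Create the bucket, then upload every PDF using its file name as the key.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    if region and region != "us-east-1":
        # Outside us-east-1, S3 requires an explicit location constraint.
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    else:
        s3_client.create_bucket(Bucket=bucket_name)

    uploaded = []
    for pdf in find_pdfs(local_path):
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client call.
        s3_client.upload_file(str(pdf), bucket_name, pdf.name)
        uploaded.append(pdf.name)
    return uploaded
```

With boto3 installed and AWS credentials configured, this could be called as `create_bucket_and_upload(boto3.client("s3"), "YourBucketName", "PathToYourPDFFiles")`. Passing the client in as an argument keeps the helper easy to exercise with a stand-in object.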
### Step 2: Vector Store Creation
Create a vector database for storing embeddings by running:
`python create-vector-db.py --collection_name [Name_of_collection] --IAM_user [YourIAM_User]`
Where:
`--collection_name`: Name of the collection that you want to create to store embeddings.
`--IAM_USER`: Your IAM user name. For example, for group 14 the IAM user is master-group-14.
- **--collection_name**: Name of the collection that you want to create to store embeddings.
- **--IAM_USER**: Your IAM user name. For example, for group 14 the IAM user is master-group-14.
This script performs the following actions:
@@ -35,12 +40,14 @@ This script performs the following actions:
After setting up the S3 bucket and the Vector Store, you can process the PDF files to generate embeddings and store them in the vector database.
Run:
`python main.py --bucket_name [YourBucketName] --endpoint [YourVectorDBEndpoint]`
Where:
`--bucket_name`: The name of the S3 bucket containing the PDF files.
`--endpoint`: Endpoint for the vector database.
`--index_name`: The name of the index in the collection where the embeddings are stored.
- **--bucket_name**: The name of the S3 bucket containing the PDF files.
- **--endpoint**: Endpoint for the vector database.
- **--index_name**: The name of the index in the collection where the embeddings are stored.
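The three options above could be parsed with `argparse` roughly as follows. This is a sketch, not the actual `main.py`: the run command shown does not pass `--index_name`, so the example assumes that flag is optional, and the default value `example-index` is a made-up placeholder.

```python
import argparse


def build_parser():
    """Build a parser for the three options documented above."""
    parser = argparse.ArgumentParser(
        description="Generate embeddings for PDFs in S3 and store them in a vector index."
    )
    parser.add_argument("--bucket_name", required=True,
                        help="Name of the S3 bucket containing the PDF files")
    parser.add_argument("--endpoint", required=True,
                        help="Endpoint for the vector database")
    # Assumption: optional flag; the default below is a hypothetical placeholder.
    parser.add_argument("--index_name", default="example-index",
                        help="Index in the collection where embeddings are stored")
    return parser
```

For example, `build_parser().parse_args(["--bucket_name", "my-bucket", "--endpoint", "my-endpoint"])` yields a namespace whose `index_name` falls back to the placeholder default.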
The main.py script will:
1. Download PDF files from the S3 bucket.
......