An ML-powered web app for stock image semantic search.
See the code at https://github.com/t-gibson/stock
To start, clone the repo:
git clone https://github.com/t-gibson/stock.git
This repo includes steps to construct the AWS infrastructure for hosting this web app. The app can also be run on your local machine. The steps for setting up the AWS infrastructure are below. They require that you have set up the AWS CLI on your system and have installed Terraform.
Construct the AWS infrastructure using Terraform. You will need to pass in the name of your desired AWS key pair as a variable.
terraform apply
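One way to supply the key pair name without typing it at every prompt is a terraform.tfvars file, which Terraform loads automatically from the working directory. This is a minimal sketch: the variable name key_name and the value my-key-pair are assumptions, so check the variable declarations in the repo's Terraform files for the real name.

```shell
# Write a terraform.tfvars file next to the Terraform configuration.
# ASSUMPTION: "key_name" is a hypothetical variable name and "my-key-pair"
# is a placeholder; replace both with the values your setup actually uses.
cat > terraform.tfvars <<'EOF'
key_name = "my-key-pair"
EOF

# Show what was written so it can be reviewed before applying.
cat terraform.tfvars
```

With that file in place, a plain terraform apply picks up the value. The same value can instead be passed inline with Terraform's -var flag, for example terraform apply -var 'key_name=my-key-pair' (again with the hypothetical variable name).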
SSH into your EC2 instance and clone the codebase there.
ssh -i <path/to/ssh_private_key> ubuntu@$(terraform output public_dns)
git clone https://github.com/t-gibson/stock.git
Install the stock CLI app.
Note: this app requires Python 3.7 or later. I recommend using a virtual environment or conda.
pip install -r requirements.txt
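If you go the virtual environment route, the install step could look roughly like the sketch below. The helper names python_ok and setup_venv and the .venv directory are illustrative choices, not part of the repo. It assumes python3 is on your PATH and that you run it from the root of the cloned repo.

```shell
# Succeeds only if the python3 on PATH is version 3.7 or newer.
python_ok() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)'
}

# Create and activate a virtual environment in .venv, then install the
# requirements. Both the function name and ".venv" are hypothetical.
setup_venv() {
  python_ok || { echo "Python 3.7 or newer is required" >&2; return 1; }
  python3 -m venv .venv &&
    . .venv/bin/activate &&
    pip install -r requirements.txt
}

# Usage, from the repo root:
#   setup_venv
```

Keeping the version check separate means it fails early with a clear message instead of partway through the dependency install.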
Populate the .env template file. This will save you keystrokes at the command line.
Download the data that we will process. We abide by the limits of the Pexels API, so if you attempt to download too much, stock will either throw an exception or the API will return fewer images than you expect.
stock download --num-results 50 <space separated list of image categories to query>
To get a search application with half-decent results you will need to have downloaded info on a meaningful number of photos. However, don’t abuse the Pexels API. I recommend running the variant of the command below and scheduling it to re-run regularly using a crontab.
stock download \
--num-results 50 \
--query-page-logs <file-to-store-interim-results> \
<space separated list of image categories>
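As a rough sketch of the scheduling, the snippet below builds a crontab entry that re-runs the download once an hour. The hourly schedule, the log file path, and the example categories are all assumptions; only the stock download flags come from the command above. Pick an interval that stays comfortably inside the Pexels API limits.

```shell
# Cron runs with a minimal PATH, so resolve the full path to the stock
# executable now (falls back to the bare name if it is not found).
STOCK_BIN="$(command -v stock || echo stock)"

# ASSUMPTIONS: placeholder log path and image categories.
PAGE_LOG="$HOME/stock_query_pages.log"
CATEGORIES="nature city food"

# "0 * * * *" means minute 0 of every hour.
CRON_LINE="0 * * * * $STOCK_BIN download --num-results 50 --query-page-logs $PAGE_LOG $CATEGORIES"

# Print the entry for review.
echo "$CRON_LINE"

# To append it to your existing crontab, run:
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```

Cron jobs also start in a different working directory with a sparse environment, so settings from the .env file may not be picked up automatically. If that turns out to be the case, prefixing the command with a cd into the repo directory is one option.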
Run the indexing application.
stock index
After that you are free to run the search application. The Streamlit app will be running on port 8501.
stock search
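If the app is running on the EC2 instance and port 8501 is not reachable from your browser, an SSH tunnel is one way to view it locally. This is a sketch only: open_tunnel is a hypothetical helper name, and whether you need a tunnel at all depends on how the instance's security group is configured, which is not covered here.

```shell
# Forward local port 8501 to port 8501 on the instance.
# $1: path to the SSH private key, $2: public DNS name of the instance.
# -N opens the tunnel without running a remote command.
open_tunnel() {
  ssh -i "$1" -N -L 8501:localhost:8501 "ubuntu@$2"
}

# Usage, from your local machine:
#   open_tunnel <path/to/ssh_private_key> <public_dns_of_instance>
# Then browse to http://localhost:8501 while the tunnel is open.
```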
To dig deeper into the neural search path, I could consider multi-modal search. This should be a cinch with the multi-modal search capabilities that are on their way for Jina.
I only dipped my toe into hosting an app on AWS. There aren’t many smarts applied to the infrastructure that I set up. Could I attempt to wrap up the search capability as a serverless application?