
Acoustic Artistry

An AI-powered application that converts your voice descriptions into stunning images using Stable Diffusion and speech recognition technology.

Timeline

3 weeks

Role

ML Engineer

Team

Solo

Status

Completed

Technology Stack

Python

Gradio

TensorFlow

Key Challenges

Audio feature extraction
Prompt engineering for Stable Diffusion
Mapping audio characteristics to visual elements
GPU memory management
Model inference optimization
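The page does not say how prompt engineering was done. Below is a minimal sketch that assumes the transcribed speech is cleaned up and then extended with fixed style modifiers before it reaches Stable Diffusion. The function name `build_prompt`, the filler-word list, the default modifiers and the word cap are all hypothetical. The word cap is only a rough stand-in for the CLIP text encoder's 77-token limit, because real token counts depend on the tokenizer.

```python
# Hypothetical filler words a speech recognizer may transcribe verbatim.
FILLERS = {"um", "uh", "er"}

# Hypothetical style modifiers appended to every spoken description.
DEFAULT_STYLE = ("highly detailed", "digital art", "album cover")


def build_prompt(transcript: str, style=DEFAULT_STYLE, max_words: int = 60) -> str:
    """Turn a raw speech transcript into a Stable Diffusion prompt (sketch)."""
    # Drop filler words, ignoring case and trailing commas or periods.
    words = [w for w in transcript.split() if w.lower().strip(",.") not in FILLERS]
    # Rough cap on subject length; CLIP's real limit is 77 tokens, not words.
    subject = " ".join(words[:max_words])
    if not subject:
        raise ValueError("transcript contains no usable words")
    # Comma-separated modifiers are a common Stable Diffusion prompting convention.
    return ", ".join([subject, *style])


print(build_prompt("um a neon city at night uh"))
# a neon city at night, highly detailed, digital art, album cover
```

Keeping the modifiers separate from the spoken subject means the user's words always come first, where they tend to carry the most weight in the prompt.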

Key Learnings

Stable Diffusion pipeline
Voice-to-text processing using the SpeechRecognition library and the Google Speech API
Gradio interface building
Using Hugging Face Diffusers
End-to-end integration of multimodal AI systems
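Two pieces of arithmetic sit behind the Stable Diffusion pipeline and the GPU memory management challenge listed above. Stable Diffusion v1.5 denoises a 4-channel latent at one eighth of the output resolution, and storing weights in half precision (float16) halves their memory compared with float32. This is a minimal sketch: the helper names are hypothetical, the one-billion parameter count in the usage lines is a round illustrative number rather than a measurement of the project's models, and activations, the scheduler and framework overhead are ignored.

```python
# Bytes per element for precisions commonly used with diffusion models.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}

LATENT_CHANNELS = 4   # Stable Diffusion v1.x VAE latents have 4 channels
VAE_SCALE_FACTOR = 8  # the VAE downsamples height and width by 8


def latent_shape(height: int, width: int, batch_size: int = 1) -> tuple:
    """Shape (batch, channels, h, w) of the latent tensor the UNet denoises."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be divisible by 8")
    return (batch_size, LATENT_CHANNELS,
            height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def weight_memory_gb(num_params: int, dtype: str = "float32") -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


print(latent_shape(512, 512))                      # (1, 4, 64, 64)
print(weight_memory_gb(1_000_000_000, "float32"))  # 4.0
print(weight_memory_gb(1_000_000_000, "float16"))  # 2.0
```

In Diffusers, half-precision weights are requested by passing `torch_dtype=torch.float16` to `StableDiffusionPipeline.from_pretrained`.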

Acoustic Artistry: Audio to Album Art

Overview

Acoustic Artistry is an AI-powered voice-to-image generator that converts spoken descriptions into visually rich images using Stable Diffusion. It combines speech recognition with generative AI to turn natural-language prompts into high-quality artwork through an interactive Gradio web interface.
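The flow described in the overview (speech to text, then text to image, with typed text as an alternative) can be sketched as one orchestration function. This is a sketch under assumptions, not the project's code: the transcriber and the image generator are passed in as plain callables so the control flow can be read and tested without a microphone, a GPU or network access. In the real app they would wrap something like SpeechRecognition's `Recognizer.recognize_google` and a Diffusers `StableDiffusionPipeline`. The names `voice_to_image`, `Result`, `transcribe` and `generate` are hypothetical, as is the policy of falling back to typed text when transcription fails.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Result:
    prompt: str  # the text actually sent to the image model
    source: str  # "voice" or "text", e.g. for display in the UI
    image: Any   # whatever the generator returns (a PIL image in practice)


def voice_to_image(
    audio: Optional[Any],
    typed_prompt: str,
    transcribe: Callable[[Any], str],
    generate: Callable[[str], Any],
) -> Result:
    """Prefer the recorded voice prompt; fall back to the typed prompt."""
    prompt, source = "", "text"
    if audio is not None:
        try:
            prompt, source = transcribe(audio).strip(), "voice"
        except Exception:
            # With SpeechRecognition, the errors to expect here are
            # sr.UnknownValueError (unintelligible audio) and sr.RequestError.
            prompt = ""
    if not prompt:
        # Hypothetical policy: empty or failed transcription uses typed text.
        prompt, source = typed_prompt.strip(), "text"
    if not prompt:
        raise ValueError("no usable voice or text prompt")
    return Result(prompt=prompt, source=source, image=generate(prompt))


# Usage with stand-in callables; `len` plays the role of the image generator.
result = voice_to_image(b"fake-audio", "", transcribe=lambda a: " a red fox ", generate=len)
print(result.prompt, result.source, result.image)  # a red fox voice 9
```

Injecting the two models as callables keeps the branching logic independent of any particular speech or diffusion backend, which is also what makes it straightforward to attach to a Gradio event handler.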

Key Features

Voice-to-Image Conversion: Record voice input, which is converted to text with speech recognition before an image is generated.
Text Prompt Support: An alternative text input for generating images directly from a typed prompt.
Stable Diffusion v1.5 Integration: Generates high-quality images with a diffusion model via Hugging Face Diffusers.
Customizable Image Settings: Adjustable image dimensions and generation parameters for finer control over the output.
Interactive Gradio Interface: A responsive web UI for recording, prompt editing and image preview in one place.
Real-Time Processing: Fast speech-to-text conversion and an optimized image inference pipeline.
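The adjustable image settings can be validated before each call to the model. This is a minimal sketch that assumes the settings are forwarded as keyword arguments to a Diffusers `StableDiffusionPipeline`. `prompt`, `height`, `width`, `num_inference_steps`, `guidance_scale` and `negative_prompt` are real arguments of that pipeline's call, while the helper name `pipeline_kwargs`, the allowed ranges and the defaults below are assumptions. Dimensions are rounded down to multiples of 8 because the pipeline rejects other sizes.

```python
def pipeline_kwargs(
    prompt: str,
    width: int = 512,
    height: int = 512,
    steps: int = 30,
    guidance: float = 7.5,
    negative_prompt: str = "",
) -> dict:
    """Clamp UI settings to assumed safe ranges and map them to pipeline arguments."""

    def snap(size: int) -> int:
        # Clamp to an assumed 256-768 px range, then round down to a multiple of 8.
        size = max(256, min(768, int(size)))
        return size - size % 8

    kwargs = {
        "prompt": prompt,
        "width": snap(width),
        "height": snap(height),
        # Assumed bounds: at least 1 step, at most 100 to cap latency.
        "num_inference_steps": max(1, min(100, int(steps))),
        "guidance_scale": float(guidance),
    }
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


print(pipeline_kwargs("a red fox", width=500, height=1000))
# {'prompt': 'a red fox', 'width': 496, 'height': 768, 'num_inference_steps': 30, 'guidance_scale': 7.5}
```

With a pipeline already loaded as `pipe`, a call would look like `image = pipe(**pipeline_kwargs(prompt, width=640)).images[0]`. Clamping in one place keeps out-of-range slider values from reaching the GPU, where oversized images are a common cause of out-of-memory errors.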

Why I Built This

I wanted to explore the interaction between voice interfaces and generative AI. The idea was to reduce the friction between imagination and creation by letting users simply describe an idea aloud and immediately see it rendered as AI-generated imagery.

Future Plans

Real-time streaming voice input instead of single recordings
Advanced prompt enhancement using LLM-based refinement
Style presets and reference image conditioning
Cloud deployment with GPU acceleration
Mobile-optimized interface for broader accessibility
