Culture & Research Philosophy
Our interdisciplinary team works on foundational models for computer vision, natural language processing, and multimodal learning, combining two approaches:
- Experimental: Conduct reproducible experiments that advance fundamental understanding.
- Computational: Leverage algorithms, models, and coding expertise to tackle challenging questions.
Lab Entry
We welcome students from all disciplines who bring strong curiosity and a passion for rigorous, original research. We regularly submit our work to conferences including ACL, NeurIPS, CVPR, and ACM CHI. Thesis topics include:
- Facial Recognition: Developing accurate and fair facial recognition systems remains a significant challenge. Key thesis problems include addressing bias in training data, detecting liveness to counter presentation attacks, improving recognition accuracy under varying lighting conditions and occlusion, and strengthening privacy-preserving techniques.
- OCR (Optical Character Recognition): Extracting text from images and documents is essential for digitization and information retrieval. Key thesis problems include improving accuracy on complex layouts and fonts, handling multi-language documents, and developing efficient algorithms for real-time applications.
- Small Language Models (SLMs): While large language models (LLMs) have garnered significant attention, SLMs are crucial for resource-constrained environments. Key thesis problems include optimizing model architectures for efficiency, developing effective training techniques with limited data, and ensuring robust performance across diverse tasks and domains.
- Speech2Text: Converting spoken language into text is crucial for accessibility and human-computer interaction. Key thesis problems include improving accuracy in noisy environments, handling diverse accents and dialects, and developing real-time transcription systems.
- Text2Speech and Voice Cloning: Generating natural-sounding speech from text has many applications, but challenges remain in prosody, emotion, and voice diversity. Key thesis problems include improving the naturalness and expressiveness of generated speech, developing robust voice cloning techniques, and addressing ethical concerns around consent and misuse.
- Voice Sentiment: Analyzing sentiment from voice data is important for customer service and mental health applications. Key thesis problems include improving accuracy across different languages and cultures, handling background noise, and developing real-time sentiment analysis systems.
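Two of the topics above, OCR and Speech2Text, describe progress as "improving accuracy" on transcribed text. One common convention for quantifying this is an edit-distance error rate: word error rate (WER) for speech transcripts and character error rate (CER) for OCR output. The sketch below assumes that convention. The function names are hypothetical, and the code illustrates the metric only. It does not describe the lab's own evaluation pipeline.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (tokens or characters).

    Substitutions, insertions, and deletions each cost 1.
    Uses a single rolling row to keep memory at O(len(hyp)).
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Classic example: 3 character edits over 6 reference characters.
    print(char_error_rate("kitten", "sitting"))
```

Because the denominator is the reference length, insertions can push either rate above 1.0. Real evaluations usually normalize case and punctuation before scoring, and that step is omitted here.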
