Culture & Research Philosophy

Our interdisciplinary team works on foundation models for computer vision, natural language processing, and multimodal learning:

  • Experimental: Conduct reproducible experiments that advance fundamental understanding.
  • Computational: Leverage algorithms, models, and coding expertise to tackle challenging questions.

Lab Entry

We welcome students from all disciplines with strong curiosity and a passion for rigorous, original research. We commonly submit our work to conferences including ACL, NeurIPS, CVPR, and ACM CHI.

Topics

We focus on topics around foundational deep learning models used in real-world applications:

  1. Facial presentation attacks
    We are researching methods to detect and mitigate the risks of facial presentation attacks, particularly modern attacks involving highly realistic AI-generated synthetic faces. Our work includes developing robust detection algorithms for edge-case scenarios, building presentation attack datasets, and studying training-data quality, ultimately leading to secure facial recognition systems for everyday use.
  2. Mixed speech recognition
    We are researching methods to improve the performance of speech recognition systems in challenging scenarios, such as when speakers mix languages within one sentence (code-switching) or when there is heavy background noise. Our work includes developing robust algorithms that handle these edge cases and creating datasets for training and evaluation, ultimately leading to more accurate and reliable speech recognition systems for real-world applications.
  3. Human-AI interaction for mental health, human cultures, and biases
    Our lab has been working on understanding humans, particularly their biases, culture, cognition, emotion, and behavior. Building on that understanding, we are interested in designing AI systems that are more intuitive and user-friendly and can collaborate effectively with humans for stronger outcomes. Example applications we have worked on for a long time include mental health applications and tutoring applications.
  4. Medical VQA
    Visual Question Answering (VQA) in the medical domain is a challenging task that requires a deep understanding of both visual data and medical knowledge. We are developing advanced VQA systems that can assist healthcare professionals by providing accurate answers to complex medical questions based on medical images such as X-rays, MRIs, and CT scans. Our research focuses on integrating multimodal data and leveraging domain-specific knowledge to improve the accuracy and reliability of these systems.
  5. BCI spellers
    Our lab has been developing BCI spellers for almost a decade. We have worked with the SSVEP, P300, and hybrid paradigms, and have iteratively improved the system by optimizing the visual stimuli, improving the signal processing and classification algorithms, and enhancing the user interface. Our ongoing focus is to improve the information transfer rate (ITR) and usability of BCI spellers with new techniques such as deep learning and transfer learning.
  6. Raman spectroscopy for non-invasive glucose monitoring
    Diabetes is a major health issue worldwide, and non-invasive glucose monitoring could greatly improve diabetic patients' quality of life. Our lab has been developing a non-invasive glucose monitoring system based on Raman spectroscopy. Our goal is to design and build a portable Raman system that helps elderly and diabetic patients monitor their glucose levels.
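
The ITR mentioned under BCI spellers is usually computed with the standard Wolpaw formula. A minimal sketch, assuming that definition; the example numbers (40 targets, 90% accuracy, 4 s per selection) are illustrative and not figures from our system:

```python
import math

def itr_bits_per_min(n_targets: int, accuracy: float, selection_time_s: float) -> float:
    """Wolpaw information transfer rate (ITR) in bits per minute.

    n_targets: number of selectable targets (e.g. characters on the speller grid)
    accuracy: classification accuracy P, with 1/n_targets < P <= 1
    selection_time_s: average time per selection, in seconds
    """
    n, p = n_targets, accuracy
    if p >= 1.0:
        bits = math.log2(n)  # perfect accuracy yields the full log2(N) bits per selection
    else:
        bits = (math.log2(n)
                + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (n - 1)))
    return bits * 60.0 / selection_time_s

# Illustrative example: a 40-target speller at 90% accuracy, 4 s per selection
print(round(itr_bits_per_min(40, 0.90, 4.0), 1))  # → 64.9
```

Raising accuracy or shortening selection time both increase ITR, which is why stimulus design and classifier improvements trade off against each other in speller research.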