This position is open to undergraduate and MS students. In the Google Form, select your corresponding research interest under the question “Which of the following match your research interests?” Each team will be led by a Ph.D. student.
Applications of Neurosymbolic Methods
Project Overview
This project explores how neurosymbolic methods, probabilistic inference, and learning-based optimization can be applied to real-world domains. Students will work on combining neural perception with symbolic structure, uncertainty modeling, and interpretable reasoning to address applied problems in vision, human-AI interaction, healthcare, energy systems, and multimodal understanding.
Research Focus Areas
1. Computer Vision and Video Understanding
Focus: Building models that understand visual environments, recognize human activities, and reason over temporal and relational structure in videos.
Key Problems:
- Activity recognition in egocentric and third-person video
- Procedural task understanding in complex environments
- Object detection and tracking in dynamic scenes
- Scene graph construction and relational reasoning
Methods:
- Deep learning for visual recognition and feature extraction
- Neurosymbolic models for interpretable activity and event reasoning
- Probabilistic temporal models for multi-step predictions
- Graph-based representations for structured scene understanding
Example Work:
- CaptainCook4D: Egocentric 4D dataset for procedural task understanding (NeurIPS’24, DMLR’23)
- Explainable Activity Recognition: Interpretable models for human activity understanding (TiiS’23)
- Neurosymbolic Models for Activity Recognition and Image Classification: Deep dependency networks for multi-label classification in images and videos (AISTATS’24)
2. Human-AI Interaction and Task Guidance
Focus: Developing systems that provide real-time assistance for physical and cognitive tasks through perception, prediction, and symbolic task knowledge.
Key Problems:
- Real-time task guidance in augmented reality
- Predictive assistance for multi-step procedural workflows
- Error detection and recovery in human activities
- Adaptive instruction generation
Methods:
- Neurosymbolic models integrating perception with symbolic task graphs
- Probabilistic inference for action prediction and intent estimation
- Multimodal reasoning over visual, language, and contextual signals
Example Work:
- Predictive Task Guidance in AR: Real-time guidance systems for complex tasks (IEEE VR’24)
- CaptainCook4D: Egocentric 4D dataset for procedural task understanding (NeurIPS’24, DMLR’23)
- Real-time AR Guidance Systems: Built systems accelerating task completion (DARPA PTG)
3. Medical and Healthcare Applications
Focus: Applying AI to clinical decision support, diagnostics, and health systems optimization.
Key Problems:
- Disease diagnosis and prognosis
- Treatment planning and personalization
- Medical image analysis
- Healthcare resource optimization
Methods:
- Probabilistic models for uncertainty quantification and risk estimation
- Explainable AI for clinical decision support
- Graph-based patient modeling and knowledge graph inference
- Learning-based models for diagnostic and prognostic prediction
Applications:
- Disease spread modeling and intervention planning
- Personalized treatment and risk-based stratification
- Explainable medical image analysis
- Hospital resource allocation and scheduling
4. Energy Systems and Infrastructure
Focus: Optimizing and forecasting behavior in large-scale infrastructure systems.
Key Problems:
- Power grid optimization and stability analysis
- Smart grid management and demand-side forecasting
- Maintenance scheduling in large infrastructure networks
- Integration of renewable energy sources
Methods:
- Graph neural networks for grid and network modeling
- Reinforcement learning for dynamic resource allocation
- Probabilistic models for forecasting and reliability analysis
- Combinatorial optimization for scheduling and planning
5. Natural Language Processing and Reasoning
Focus: Developing systems that combine language, vision, and structured knowledge for reasoning and decision making.
Key Problems:
- Multimodal reasoning across text, images, and video
- Knowledge-grounded question answering
- Language-guided planning and action prediction
- Document understanding and structured information extraction
Methods:
- Neurosymbolic models integrating language with symbolic knowledge bases
- Probabilistic reasoning for ambiguity resolution
- Graph-based representations for knowledge and relational structure
- Deep learning for language understanding and grounding tasks
Applications:
- Visual question answering and multimodal inference
- Instruction following and task planning
- Knowledge base reasoning and retrieval
- Multimodal document and scene interpretation
How To Apply
Please submit your details using the Google Form.
Note: Select “Applications in Multimodal Reasoning” or “Applications in Computer Vision and Video Understanding,” or choose “Other” and specify your interests.
Selected students may be invited for a brief meeting to discuss fit and potential directions.
For general lab information and university details, see the main hiring page.