MyPaper

  1. 3D Point Cloud Processing
  2. Multi-Label Image Classification
  3. Training-Free Open-Vocabulary Semantic Segmentation
  4. Remote Sensing
  5. Training Open-Vocabulary Semantic Segmentation
  6. Zero-Shot Open-Vocabulary Semantic Segmentation
  7. Few-Shot Open-Vocabulary Semantic Segmentation
  8. Supervised Semantic Segmentation
  9. Weakly Supervised Semantic Segmentation
  10. Semi-Supervised Semantic Segmentation
  11. Unsupervised Semantic Segmentation

Staging

  1. [2025 NeurIPS] Open-Vocabulary Part Segmentation via Progressive and Boundary-Aware Strategy[code]
  2. [2025 NeurIPS] Towards Unsupervised Domain Bridging via Image Degradation in Semantic Segmentation[code]
  3. [2025 NeurIPS] OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation[code]
  4. [2025 NeurIPS] Continual Gaussian Mixture Distribution Modeling for Class Incremental Semantic Segmentation[code]
  5. [2025 NeurIPS] Instance-Level Composed Image Retrieval[paper][code]
  6. [2025 NeurIPS] Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs[paper][code]
  7. [2025 NeurIPS] Dual-Space Semantic Synergy Distillation for Continual Learning of Unlabeled Streams

Related Areas

  1. [2025 arXiv] From Pixels to Words – Towards Native Vision-Language Primitives at Scale [paper] [code]
  2. [2025 arXiv] Exploring Cross-Modal Flows for Few-Shot Learning [paper]
  3. [2025 arXiv] AnyUp: Universal Feature Upsampling [paper]
  4. [2025 arXiv] CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection [paper]
  5. [2025 arXiv] Data or Language Supervision: What Makes CLIP Better than DINO? [paper]
  6. [2025 arXiv] VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross Modal Mutual Information Maximization [paper] [code]
  7. [2023 arXiv] Black Box Few-Shot Adaptation for Vision-Language models[paper][code]
  8. [2025 ICLR] Towards Calibrated Deep Clustering Network[paper][code]
  9. [2025 NeurIPS] Test-Time Adaptive Object Detection with Foundation Model [paper] [code]
  10. [2025 arXiv] OVRD: Open-Vocabulary Relation DINO with Text-Guided Salient Query Selection [paper] [code]
  11. [2025 CVPR] Towards Vision-Language Correspondence without Parallel Data [paper] [code]
  12. [2024 ECCV] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection [paper] [code]
  13. [2025 NeurIPS] Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching[paper][code]
  14. [2025 NeurIPS] SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning[paper]
  15. [2025 NeurIPS] OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts[paper]
  16. [2025 arXiv] Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling[paper][code]

3D Point Cloud Processing

OOD Detection

  1. [2025-ICCV] Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation [paper]

Segmentation&&Recognition

  1. [2024-CVPR] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation [paper] [code]
  2. [2024-CVPR] LangSplat: 3D Language Gaussian Splatting [paper] [code]
  3. [2024-NeurIPS] OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding [paper] [code]
  4. [2025-CVPR] Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model [paper] [code]
  5. [2025-ICLR] Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [paper] [code]
  6. [2025-ICCV] Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds [paper] [code]
  7. [2025-CVPR] LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds [paper] [code]
  8. [2025-ICCV] UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis [paper] [code]
  9. [2025-ICCV] All in One: Visual-Description-Guided Unified Point Cloud Segmentation [paper] [code]
  10. [2025-ICCV] LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes [paper] [code]
  11. [2025-ICCV] WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images [paper]
  12. [2025-arXiv] SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features [paper]
  13. [2025-ICCV] SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining [paper] [code]
  14. [2025-ICCV] COS3D: Collaborative Open-Vocabulary 3D Segmentation [paper] [code]
  15. [2025-CVPR] Cross-Modal 3D Representation with Multi-View Images and Point Clouds [paper]
  16. [2025-CVPR] OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting [paper]
  17. [2025-CVPR] Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration [paper] [code]
  18. [2025-CVPR] Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [paper]
  19. [2025-arXiv] Is clustering enough for LiDAR instance segmentation? A state-of-the-art training-free baseline [paper] [code]
  20. [2025-NeurIPS] L2RSI: Cross-view LiDAR-based Place Recognition for Large-scale Urban Scenes via Remote Sensing Imagery [paper] [code]
  21. [2025-NeurIPS] GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization [paper]
  22. [2026-ICLR] FixingGS: Enhancing 3D Gaussian Splatting via Training-Free Score Distillation [paper]
  23. [2026-ICLR] Query-Aware Hub Prototype Learning for Few-Shot 3D Point Cloud Segmentation [paper]

Classification&&Test-time Adaptation

  1. [2025-ICCV] Interpretable point cloud classification using multiple instance learning [paper]
  2. [2025-ICCV] Purge-Gate: Backpropagation-Free Test-Time Adaptation For Point Clouds Classification Via Token [paper]
  3. [2025-ICML] SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds [paper] [code]
  4. [2025-CVPR] Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis [paper] [code]
  5. [2025-CVPR] Purge-Gate: Efficient Backpropagation-Free Test-Time Adaptation for Point Clouds via Token Purging [paper] [code]

Information Retrieval

  1. [2025-ICCV] Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval [paper] [code]

Localization

  1. [2025-ICCV] Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization [code]

3D Reconstruction

  1. [2024-ICRA] LidarDM: Generative LiDAR Simulation in a Generated World [paper] [code]
  2. [2025-arXiv] MapAnything: Universal Feed-Forward Metric 3D Reconstruction [paper] [code]
  3. [2025-arXiv] Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey [paper]
  4. [2025-arXiv] IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [paper]
  5. [2025-arXiv] Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS [paper] [code]
  6. [2025-arXiv] Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction [paper]

3D Alignment

  1. [2025-arXiv] Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces [paper]
  2. [2025-arXiv] Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding [paper]
  3. [2025-NeurIPS] SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions [paper] [code]

Multi-Label Image Classification

  1. [2022 IJCV] Learning to Prompt for Vision-Language Models[paper][code]
  2. [2023 ICCV] PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification[paper] [code]
  3. [2023 ICCV] CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification[paper][code]
  4. [2024 ICML] Language-driven Cross-modal Classifier for Zero-shot Multi-label Image Recognition[paper][code]
  5. [2024 AAAI] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training[paper][code]
  6. [2025 CVPR] SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models[paper][code]
  7. [2025 CVPR] Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification[paper][code]
  8. [2025 CVPR] Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport[paper][code]
  9. [2025 CVPR] Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning[paper][code]
  10. [2025 ICCV] MambaML: Exploring State Space Models for Multi-Label Image Classification[paper]
  11. [2025 ICCV] Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification[paper][code]
  12. [2025 ICCV] More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning[paper][code]
  13. [2025 ICCV] Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity [paper][code]

Classification

  1. [2022 ECCV] Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification [paper][code]
  2. [2024 ICLR] A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation[paper][code]
  3. [2025 IEEE] Modeling Cross-Modal Semantic Transformations from Coarse to Fine in CLIP [paper]
  4. [2025 ICML] From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection[paper][code]
  5. [2026 ICLR] Graph-Refined Representation Learning for Few-Shot Classification via CLIP Adaptation [paper]

Training-Free Open-Vocabulary Semantic Segmentation

  1. [2024 CVPR] CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free [paper][code]
  2. [2024 CVPR] Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation [paper] [code]
  3. [2024 ECCV] Diffusion Models for Open-Vocabulary Segmentation [paper] [code]
  4. [2024 ECCV] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference [paper] [code]
  5. [2024 ECCV] SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference [paper] [code]
  6. [2024 ECCV] Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation [paper] [code]
  7. [2024 ECCV] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation [paper] [code]
  8. [2024 ECCV] Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation [paper] [code]
  9. [2024 arXiv] CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation [paper] [code]
  10. [2024 CVPR] Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models [paper] [code]
  11. [2024 CVPR] Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation [paper]
  12. [2024 ECCV] In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation [paper] [code]
  13. [2025 CVPR] LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation [paper] [code]
  14. [2025 CVPR] ResCLIP: Residual Attention for Training-free Dense Vision-language Inference [paper] [code]
  15. [2025 CVPR] Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation [paper] [code]
  16. [2025 CVPR] Cheb-GR: Rethinking k-nearest neighbor search in Re-ranking for Person Re-identification [paper] [code]
  17. [2025 CVPR] ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements [paper] [code]
  18. [2025 CVPR] Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval [paper] [code]
  19. [2025 ICCV] Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation [paper] [code]
  20. [2025 ICCV] E-SAM: Training-Free Segment Every Entity Model [paper]
  21. [2025 ICCV] ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation [paper] [code]
  22. [2025 ICCV] CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation [paper] [code]
  23. [2025 ICCV] CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting [paper] [code]
  24. [2025 ICCV] Auto-Vocabulary Semantic Segmentation [paper]
  25. [2025 ICCV] Training-Free Class Purification for Open-Vocabulary Semantic Segmentation [paper] [code]
  26. [2025 ICCV] DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
  27. [2025 ICCV] Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation[paper][code]
  28. [2025 ICCV] Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation [paper][code]
  29. [2025 ICCV] Test-Time Retrieval-Augmented Adaptation for Vision-Language Models [paper] [code]
  30. [2025 ICCV] Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation
  31. [2025 AAAI] Training-free Open-Vocabulary Semantic Segmentation via Diverse Prototype Construction and Sub-region Matching [paper]
  32. [2025 arXiv] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [paper] [code]
  33. [2025 arXiv] Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation [paper] [code]
  34. [2025 arXiv] FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation [paper] [code]
  35. [2025 arXiv] TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models [paper] [code]
  36. [2025 arXiv] A Survey on Training-free Open-Vocabulary Semantic Segmentation [paper]
  37. [2025 arXiv] No time to train! Training-Free Reference-Based Instance Segmentation [paper] [code]
  38. [2024 arXiv] There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks [paper] [code]
  39. [2025 arXiv] Rethinking the Global Knowledge of CLIP in Training-Free Open-Vocabulary Semantic Segmentation [paper]
  40. [2025 arXiv] Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization [paper] [code]
  41. [2024 arXiv] FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [paper] [code]
  42. [2024 arXiv] TAG: Guidance-free Open-Vocabulary Semantic Segmentation [paper] [code]
  43. [2025 arXiv] What Holds Back Open-Vocabulary Segmentation? [paper]
  44. [2025 NeurIPS] Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers [paper] [code]
  45. [2025 arXiv] SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP [paper] [code]
  46. [2025 arXiv] Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP [paper] [code]
  47. [2026 ICLR] Improving Visual Discriminability of CLIP for Training-Free Open-Vocabulary Semantic Segmentation [paper]
  48. [2026 ICLR] Beyond Open-World: COSRA, a Training-Free Self-Refining Approach to Open-Ended Object Detection [paper]
  49. [2025 NeurIPS] OPMapper: Enhancing Open-Vocabulary Semantic Segmentation with Multi-Guidance Information[paper]
  50. [2025 CVPR] Effective SAM Combination for Open-Vocabulary Semantic Segmentation[paper]
  51. [2026 AAAI] Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective[paper][code]

Remote Sensing

  1. [2025 arXiv] DynamicEarth: How Far are We from Open-Vocabulary Change Detection? [paper] [code]
  2. [2025 ICCV] SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation [paper] [code]
  3. [2025 CVPR] SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images [paper] [code]
  4. [2025 arXiv] RSKT-Seg: Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing [paper] [code]
  5. [2025 arXiv] AlignCLIP: Self-Guided Alignment for Remote Sensing Open-Vocabulary Semantic Segmentation [paper] [code]
  6. [2025 arXiv] RSVG-ZeroOV: Exploring a Training-Free Framework for Zero-Shot Open-Vocabulary Visual Grounding in Remote Sensing Images [paper]
  7. [2025 arXiv] SegEarth-OV-2: Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [paper] [code]
  8. [2025 AAAI] GSNet: Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation [paper] [code]
  9. [2025 CVPRW] AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images [paper]

  1. [2025 TGRS] A Unified Framework With Multimodal Fine-Tuning for Remote Sensing Semantic Segmentation [paper] [code]
  2. [2025 ICASSP] Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification [paper] [code]
  3. [2025 ICCV] Dynamic Dictionary Learning for Remote Sensing Image Segmentation [paper] [code]
  4. [2025 ICCV] GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [paper] [code]
  5. [2025 ICCV] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning [paper] [code]
  6. [2025 AAAI] ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation [paper] [code]
  7. [2024 NeurIPS] Segment Any Change [paper] [code]
  8. [2025 CVPR] XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? [paper] [code]
  9. [2025 CVPR] Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation [paper] [code]
  10. [2025 arXiv] InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition [paper] [code]
  11. [2025 arXiv] DescribeEarth: Describe Anything for Remote Sensing Images [paper] [code]
  12. [2025 NeurIPS] GTPBD: A Fine-Grained Global Terraced Parcel and Boundary Dataset [paper] [code]
  13. [2025 arXiv] RS3DBench: A Comprehensive Benchmark for 3D Spatial Perception in Remote Sensing [paper] [code]
  14. [2025 arXiv] DGL-RSIS: Decoupling Global Spatial Context and Local Class Semantics for Training-Free Remote Sensing Image Segmentation [paper] [code]
  15. [2025 TGRS] A Unified SAM-Guided Self-Prompt Learning Framework for Infrared Small Target Detection [paper] [code]
  16. [2025 TGRS] Semantic Prototyping With CLIP for Few-Shot Object Detection in Remote Sensing Images [paper]
  17. [2025 arXiv] ATRNet-STAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild [paper] [code]
  18. [2025 ISPRS] AdaptVFMs-RSCD: Advancing Remote Sensing Change Detection from binary to semantic with SAM and CLIP [paper] [data]
  19. [2025 arXiv] PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection [paper] [code]
  20. [2025 arXiv] Few-Shot Adaptation Benchmark for Remote Sensing Vision-Language Models [paper] [code]
  21. [2025 RSE] Strategic sampling for training a semantic segmentation model in operational mapping: Case studies on cropland parcel extraction [paper] [data] [code]
  22. [2025 CVPR] SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling [paper] [code]
  23. [2025 arXiv] SAR-KnowLIP: Towards Multimodal Foundation Models for Remote Sensing [paper] [code]
  24. [2025 arXiv] LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation [paper]
  25. [2025 CVM] Remote sensing tuning: A survey [paper] [code]
  26. [2025 NatureMI] A semantic-enhanced multi-modal remote sensing foundation model for Earth observation [paper]
  27. [2025 NeurIPS] Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind [paper] [code]
  28. [2025 TPAMI] RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning [paper]
  29. [2025 arXiv] FoBa: A Foreground-Background co-Guided Method and New Benchmark for Remote Sensing Semantic Change Detection [paper] [code]
  30. [2025 TGRS] Multimodal Visual-Language Prompt Network for Remote Sensing Few-Shot Segmentation [paper] [code]

Training Open-Vocabulary Semantic Segmentation

  1. [2022 CVPR] GroupViT: Semantic Segmentation Emerges from Text Supervision [paper] [code]
  2. [2023 CVPR] Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning [paper]
  3. [2023 CVPR] Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs [paper] [code]
  4. [2023 ICCV] Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only [paper]
  5. [2023 ICML] SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation [paper] [code]
  6. [2023 NeurIPS] Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [paper] [code]
  7. [2024 CVPR] SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation [paper] [code]
  8. [2024 CVPR] Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor [paper]
  9. [2024 CVPR] USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation [paper]
  10. [2024 CVPR] CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation [paper] [code]
  11. [2024 CVPR] SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding [paper]
  12. [2024 ECCV] CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation [paper] [code]
  13. [2024 ICLR] CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction [paper] [code]
  14. [2024 NeurIPS] Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels [paper] [code]
  15. [2024 NeurIPS] Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation [paper] [code]
  16. [2024 arXiv] DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment [paper]
  17. [2025 CVPR] Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation [paper][code]
  18. [2025 CVPR] Your ViT is Secretly an Image Segmentation Model [paper] [code]
  19. [2025 CVPR] Exploring Simple Open-Vocabulary Semantic Segmentation [paper][code]
  20. [2025 CVPR] Dual Semantic Guidance for Open Vocabulary Semantic Segmentation [paper]
  21. [2025 CVPR] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception [paper] [code]
  22. [2025 ICCV] Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation [paper] [code]
  23. [2025 ICLR] Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion [paper] [code]
  24. [2025 ICCV] Text-guided Visual Prompt DINO for Generic Segmentation [paper][code]
  25. [2024 arXiv] Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision [paper] [code]
  26. [2024 arXiv] High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation [paper] [code]
  27. [2025 CVPR] Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception [paper] [code]
  28. [2025 CVPR] Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation [paper] [code]
  29. [2025 arXiv] SegMASt3R: Geometry Grounded Segment Matching [paper] [code]
  30. [2025 arXiv] Unified Open-World Segmentation with Multi-Modal Prompts [paper] [code]
  31. [2025 NeurIPS] OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation[paper][code]

Zero-Shot Open-Vocabulary Semantic Segmentation

  1. [2023 ICML] Grounding Everything: Emerging Localization Properties in Vision-Language Transformers [paper] [code]
  2. [2024 CVPR] On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? [paper] [code]
  3. [2024 CVPR] Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation [paper] [code]
  4. [2024 CVPR] Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion [paper] [code]
  5. [2024 ECCV] OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation [paper] [code]
  6. [2024 ICCV] Zero-guidance Segmentation Using Zero Segment Labels [paper] [code]
  7. [2024 NeurIPS] DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut [paper] [code]
  8. [2025 ICLR] Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model [paper] [code]
  9. [2026 ICLR] CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP [paper]
  10. [2026 ICLR] TIDES: Training-Free Instance Detection from Semantics [paper]

Few-Shot Open-Vocabulary Semantic Segmentation

  1. [2024 NeurIPS] Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts [paper]
  2. [2024 NeurIPS] A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation [paper] [code]
  3. [2024 NeurIPS] Renovating Names in Open-Vocabulary Segmentation Benchmarks [paper] [code]
  4. [2025 CVPR] Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation [paper]
  5. [2025 ICCV] Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation [paper] [code]
  6. [2025 MICCAI] Realistic Adaptation of Medical Vision-Language Models [paper] [code]
  7. [2025 NeurIPS] SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation[paper]

Supervised Semantic Segmentation

  1. [2021 ICCV] Vision Transformers for Dense Prediction [paper] [code]
  2. [2021 ICCV] Segmenter: Transformer for Semantic Segmentation [paper] [code]
  3. [2022 ICLR] Language-driven Semantic Segmentation [paper] [code]
  4. [2025 CVPR] Your ViT is Secretly an Image Segmentation Model [paper] [code]

Weakly Supervised Semantic Segmentation

  1. [2022 CVPR] Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers [paper] [code]
  2. [2022 CVPR] MCTFormer: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation [paper] [code]
  3. [2023 CVPR] Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization [paper] [code]
  4. [2023 ICCV] Spatial-Aware Token for Weakly Supervised Object Localization [paper] [code]
  5. [2023 CVPR] Boundary-enhanced Co-training for Weakly Supervised Semantic Segmentation [paper] [code]
  6. [2023 CVPR] ToCo: Token Contrast for Weakly-Supervised Semantic Segmentation [paper] [code]
  7. [2023 CVPR] CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation [paper] [code]
  8. [2023 arXiv] MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation [paper] [code]
  9. [2024 CVPR] Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation [paper] [code]
  10. [2024 CVPR] CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [paper] [code]
  11. [2024 CVPR] DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation [paper] [code]
  12. [2024 CVPR] Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation [paper] [code]
  13. [2024 CVPR] Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation [paper]
  14. [2024 CVPR] Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation [paper] [code]
  15. [2024 CVPR] Class Tokens Infusion for Weakly Supervised Semantic Segmentation [paper] [code]
  16. [2024 CVPR] SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation [paper] [code]
  17. [2024 CVPR] PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation [paper] [code]
  18. [2024 ECCV] DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation [paper]
  19. [2024 ECCV] CoSa: Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation [paper] [code]
  20. [2024 AAAI] Progressive Feature Self-Reinforcement for Weakly Supervised Semantic Segmentation [paper] [code]
  21. [2024 arXiv] A Realistic Protocol for Evaluation of Weakly Supervised Object Localization [paper] [code]
  22. [2024 IEEE] SSC: Spatial Structure Constraints for Weakly Supervised Semantic Segmentation [paper] [code]
  23. [2025 CVPR] POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation [paper] [code]
  24. [2025 CVPR] PROMPT-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis [paper] [code]
  25. [2025 CVPR] Exploring CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation [paper] [code]
  26. [2025 CVPR] GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [paper] [code]
  27. [2025 CVPR] Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation [paper]
  28. [2025 CVPR] Prompt Categories Cluster for Weakly Supervised Semantic Segmentation [paper]
  29. [2025 ICCV] Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
  30. [2025 ICCV] Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation
  31. [2025 ICCV] Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows
  32. [2025 ICCV] OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
  33. [2025 AAAI] MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation [paper] [code]
  34. [2025 arXiv] TeD-Loc: Text Distillation for Weakly Supervised Object Localization [paper] [code]
  35. [2025 arXiv] Image Augmentation Agent for Weakly Supervised Semantic Segmentation [paper]

Semi-Supervised Semantic Segmentation

  1. [2025 ICCV] ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction [paper] [code]

Unsupervised Semantic Segmentation

  1. [2021 ICCV] Emerging Properties in Self-Supervised Vision Transformers [paper] [code] [note]
  2. [2022 CVPR] Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [paper] [code]
  3. [2022 CVPR] FreeSOLO: Learning to Segment Objects without Annotations [paper] [code]
  4. [2022 ECCV] Extract Free Dense Labels from CLIP [paper] [code] [note]
  5. [2023 CVPR] ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation [paper] [code]
  6. [2024 CVPR] Guided Slot Attention for Unsupervised Video Object Segmentation [paper] [code]
  7. [2024 CVPR] ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation [paper] [code]
  8. [2024 CVPR] CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers [paper] [code]
  9. [2024 CVPR] EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation [paper] [code]
  10. [2024 ECCV] Unsupervised Dense Prediction using Differentiable Normalized Cuts [paper]
  11. [2024 NeurIPS] PaintSeg: Training-free Segmentation via Painting [paper]
  12. [2025 ICCV] DIP: Unsupervised Dense In-Context Post-training of Visual Representations [paper] [code]