MyPaper
3DGS Segmentation
- [2025 ICCV] CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting [paper]
- [2026 CVPR] C3G: Learning Compact 3D Representations with 2K Gaussians [paper] [code]
- [2026 arXiv] GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens [paper]
- [2026 CVPR] EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding [paper] [code]
- [2026 ICLR] Learning Unified Representation of 3D Gaussian Splatting [paper] [code]
- [2024 NeurIPS] Large Spatial Model: End-to-end Unposed Images to Semantic 3D [paper] [code]
- [2025 NeurIPS] SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment [paper] [code]
- [2025 ACM MM] SLGaussian: Fast Language Gaussian Splatting in Sparse Views [paper] [code]
- [2025 CVPR] Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration [paper] [code]
- [2025 ICCV] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion [paper] [code]
- [2025 ICCV] SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining [paper] [code]
Related Areas
- [2025 CVPR] From Pixels to Words – Towards Native Vision-Language Primitives at Scale [paper] [code]
- [2025 CVPR] Exploring Cross-Modal Flows for Few-Shot Learning [paper]
- [2025 CVPR] AnyUp: Universal Feature Upsampling [paper]
- [2025 CVPR] CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection [paper]
- [2025 CVPR] Data or Language Supervision: What Makes CLIP Better than DINO? [paper]
- [2025 CVPR] VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross Modal Mutual Information Maximization [paper] [code]
- [2023 CVPR] Black Box Few-Shot Adaptation for Vision-Language Models [paper] [code]
- [2025 ICLR] Towards Calibrated Deep Clustering Network [paper] [code]
- [2025 NeurIPS] Test-Time Adaptive Object Detection with Foundation Model [paper] [code]
- [2025 CVPR] OVRD: Open-Vocabulary Relation DINO with Text-Guided Salient Query Selection [paper] [code]
- [2025 CVPR] Towards Vision-Language Correspondence without Parallel Data [paper] [code]
- [2024 ECCV] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection [paper] [code]
- [2025 NeurIPS] Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching [paper] [code]
- [2025 NeurIPS] SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning [paper]
- [2025 NeurIPS] OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts [paper]
- [2025 CVPR] Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling [paper] [code]
Multi-Label Image Classification
- [2022 IJCV] Learning to Prompt for Vision-Language Models [paper] [code]
- [2023 ICCV] PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification [paper] [code]
- [2023 ICCV] CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification [paper] [code]
- [2024 ICML] Language-driven Cross-modal Classifier for Zero-shot Multi-label Image Recognition [paper] [code]
- [2024 AAAI] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training [paper] [code]
- [2025 CVPR] SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models [paper] [code]
- [2025 CVPR] Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification [paper] [code]
- [2025 CVPR] Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport [paper] [code]
- [2025 CVPR] Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning [paper] [code]
- [2025 ICCV] MambaML: Exploring State Space Models for Multi-Label Image Classification [paper]
- [2025 ICCV] Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification [paper] [code]
- [2025 ICCV] More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning [paper] [code]
- [2025 ICCV] Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity [paper] [code]
Training-Free Open-Vocabulary Semantic Segmentation
- [2024 CVPR] CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free [paper] [code]
- [2024 CVPR] Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation [paper] [code]
- [2024 ECCV] Diffusion Models for Open-Vocabulary Segmentation [paper] [code]
- [2024 ECCV] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference [paper] [code]
- [2024 ECCV] SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference [paper] [code]
- [2024 ECCV] Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation [paper] [code]
- [2024 ECCV] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation [paper] [code]
- [2024 ECCV] Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation [paper] [code]
- [2024 CVPR] CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation [paper] [code]
- [2024 CVPR] Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models [paper] [code]
- [2024 CVPR] Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation [paper]
- [2024 ECCV] In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 CVPR] LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation [paper] [code]
- [2025 CVPR] ResCLIP: Residual Attention for Training-free Dense Vision-language Inference [paper] [code]
- [2025 CVPR] Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 CVPR] Cheb-GR: Rethinking k-nearest neighbor search in Re-ranking for Person Re-identification [paper] [code]
- [2025 CVPR] ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements [paper] [code]
- [2025 CVPR] Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval [paper] [code]
- [2025 ICCV] Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation [paper] [code]
- [2025 ICCV] E-SAM: Training-Free Segment Every Entity Model [paper]
- [2025 ICCV] ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation [paper] [code]
- [2025 ICCV] CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 ICCV] CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting [paper] [code]
- [2025 ICCV] Auto-Vocabulary Semantic Segmentation [paper]
- [2025 ICCV] Training-Free Class Purification for Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 ICCV] DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
- [2025 ICCV] Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 ICCV] Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation [paper] [code]
- [2025 ICCV] Test-Time Retrieval-Augmented Adaptation for Vision-Language Models [paper] [code]
- [2025 ICCV] Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation
- [2025 AAAI] Training-free Open-Vocabulary Semantic Segmentation via Diverse Prototype Construction and Sub-region Matching [paper]
- [2025 CVPR] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [paper] [code]
- [2025 CVPR] Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 CVPR] FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation [paper] [code]
- [2025 CVPR] TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models [paper] [code]
- [2025 CVPR] A Survey on Training-free Open-Vocabulary Semantic Segmentation [paper]
- [2025 CVPR] No time to train! Training-Free Reference-Based Instance Segmentation [paper] [code]
- [2024 CVPR] There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks [paper] [code]
- [2025 CVPR] Rethinking the Global Knowledge of CLIP in Training-Free Open-Vocabulary Semantic Segmentation [paper]
- [2025 CVPR] Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization [paper] [code]
- [2024 CVPR] FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [paper] [code]
- [2024 CVPR] TAG: Guidance-free Open-Vocabulary Semantic Segmentation [paper] [code]
- [2025 CVPR] What Holds Back Open-Vocabulary Segmentation? [paper]
- [2025 NeurIPS] Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers [paper] [code]
- [2025 CVPR] SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP [paper] [code]
- [2025 CVPR] Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP [paper] [code]
- [2026 ICLR] Improving Visual Discriminability of CLIP for Training-Free Open-Vocabulary Semantic Segmentation [paper]
- [2026 ICLR] Beyond Open-World: COSRA, a Training-Free Self-Refining Approach to Open-Ended Object Detection [paper]
- [2025 NeurIPS] OPMapper: Enhancing Open-Vocabulary Semantic Segmentation with Multi-Guidance Information [paper]
- [2025 CVPR] Effective SAM Combination for Open-Vocabulary Semantic Segmentation [paper]
- [2026 AAAI] Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective [paper] [code]