Huang Qidong

shikiw

https://shikiw.github.io/

AI & ML interests

multi-modal LLMs

Organizations

None yet

Recent Activity

upvoted a collection 13 days ago

Qwen3-VL

Collection

37 items • Updated 7 days ago • 558

authored a paper about 1 month ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 148

upvoted a paper about 1 month ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 148

authored 5 papers about 1 month ago

Diversity-Aware Meta Visual Prompting

Paper • 2303.08138 • Published Mar 14, 2023

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

Paper • 2308.10315 • Published Aug 20, 2023

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Paper • 2502.08590 • Published Feb 12, 2025 • 42

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Paper • 2502.11903 • Published Feb 17, 2025

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

Paper • 2509.22647 • Published Sep 26, 2025 • 32

liked a model about 1 month ago

Qwen/Qwen3-VL-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 15, 2025 • 2.45M • 628

liked 2 models 2 months ago

Qwen/Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Oct 15, 2025 • 607k • 292

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 5.02k • 24

liked 2 models 3 months ago

Qwen/Qwen3-VL-235B-A22B-Instruct

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 221k • 349

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 40.3k • 357

authored a paper 7 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24, 2025 • 26

liked a dataset 7 months ago

long-xing1/ScaleCap-450k

Viewer • Updated Jun 25, 2025 • 455k • 125 • 5

upvoted a paper 7 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24, 2025 • 26

upvoted 2 papers 9 months ago

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10, 2025 • 46

MM-IFEngine: Towards Multimodal Instruction Following

Paper • 2504.07957 • Published Apr 10, 2025 • 35

upvoted a paper 10 months ago

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3, 2025 • 85

upvoted a paper 11 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74
