Assistant Professor, Department of Robotics and Mechatronics Engineering, DGIST, South Korea.

South Korea
Joined April 2019
Giseop Kim retweeted
Representation, representation, representation. #SpatialAI See the SLAM Handbook Chapter 18 for my views! github.com/SLAM-Handbook-con…
The hot topic at #ICCV2025 was World Models. They come in different flavors — (interactive) video models, neural simulators, reconstruction models, etc. — but the overarching goal is clear: generative AI that predicts and simulates how the real world works.
Giseop Kim retweeted
The hot topic at #ICCV2025 was World Models. They come in different flavors — (interactive) video models, neural simulators, reconstruction models, etc. — but the overarching goal is clear: generative AI that predicts and simulates how the real world works.
I'm attending IROS 2025 😁 A slide from Prof. Marco Hutter.
Giseop Kim retweeted
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes, data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp. as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- More information compression (see the paper) => shorter context windows, more efficiency.
- A significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- The input can now be processed with bidirectional attention easily and by default, not autoregressive attention, which is a lot more powerful.
- Delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are an ugly, separate, non-end-to-end stage. They "import" all the ugliness of Unicode and byte encodings, inherit a lot of historical baggage, and add security/jailbreak risk (e.g. continuation bytes). They make two characters that look identical to the eye appear as two completely different tokens inside the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made into vision -> text tasks. Not vice versa. So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or whether you'd want to.

Now I have to also fight the urge to side-quest an image-input-only version of nanochat...
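A back-of-the-envelope way to poke at the pixels-vs-text-tokens question above: render the same text to an image and compare the ordinary LLM token count with the ViT patch count. This is a rough sketch, not anything from the paper; the cl100k_base tokenizer, the 16-pixel patch size, and the naive PIL rendering are all assumptions, and the resulting ratio depends heavily on how densely the text is laid out.

```python
# Rough sketch: compare "text tokens" vs. "vision patches" for the same content.
# Assumptions (mine, not the paper's): cl100k_base tokenizer, 16x16 ViT patches,
# naive PIL rendering onto a fixed-size canvas.
import textwrap

import tiktoken
from PIL import Image, ImageDraw, ImageFont

text = "The quick brown fox jumps over the lazy dog. " * 40

# Text path: count ordinary LLM input tokens.
n_text_tokens = len(tiktoken.get_encoding("cl100k_base").encode(text))

# Pixel path: render the same text to an image and count ViT patches instead.
img = Image.new("RGB", (1024, 512), "white")
wrapped = "\n".join(textwrap.wrap(text, width=120))
ImageDraw.Draw(img).multiline_text((8, 8), wrapped, fill="black",
                                   font=ImageFont.load_default())
patch = 16
n_vision_tokens = (img.width // patch) * (img.height // patch)

print(f"text tokens:    {n_text_tokens}")
print(f"vision patches: {n_vision_tokens}")  # ratio depends on layout and encoder
```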
🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2,500 tokens/s on an A100-40G) — powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping ~97% OCR accuracy at <10× compression.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/DeepS…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning
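Below is a minimal sketch of what running such a model through vLLM's offline multimodal interface looks like. The model id, prompt template, and trust_remote_code flag are placeholders (check the DeepSeek-OCR repo for the real ones); only vLLM's generic prompt-plus-multi_modal_data pattern is assumed.

```python
# Minimal sketch of offline multimodal inference with vLLM.
# The model id and prompt format below are placeholders, not verified against
# the official DeepSeek-OCR integration.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-OCR", trust_remote_code=True)  # hypothetical id
image = Image.open("page.png")

outputs = llm.generate(
    {
        "prompt": "<image>\nTranscribe this document.",  # placeholder prompt template
        "multi_modal_data": {"image": image},            # vLLM multimodal input dict
    },
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)
```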
Giseop Kim retweeted
A reminder that accurate motion estimation with sparse visual SLAM has been in the domain of industry for many years now, and what you often see presented in academic papers as the "state of the art" is fairly meaningless. (From @pesarlin.bsky.social)
Replying to @chrisoffner3d
Industry SLAM systems are far ahead of academic open source systems.
"Towards the Next Generation of 3D Reconstruction", @Parskatt's PhD thesis. tl;dr: would be useful for teaching image matching, with nice explanations. (Too) fancy and stylish notation. Cool acknowledgements section and cover image. liu.diva-portal.org/smash/re…
Giseop Kim retweeted
We also release some LaTeX sty and bib files used in the handbook. If you are writing an ICRA paper on SLAM, these should be useful. Visit our GitHub repo for details: github.com/SLAM-Handbook-con…
We have completed the SLAM Handbook "From Localization and Mapping to Spatial Intelligence" and released it online: asrl.utias.utoronto.ca/~tdb/… . The handbook will be published by Cambridge University Press. [1/n]
Giseop Kim retweeted
We have completed the SLAM Handbook "From Localization and Mapping to Spatial Intelligence" and released it online: asrl.utias.utoronto.ca/~tdb/… . The handbook will be published by Cambridge University Press. [1/n]
Giseop Kim retweeted
Here are the slides from the luncheon seminar I gave at RSJ2025 (aimed at students and young researchers). Drawing on my own experience, I summarized the motivation for submitting papers to international venues and how to carry the research and writing through to acceptance. Let's all submit papers to ICRA & IROS! speakerdeck.com/koide3/rsj20…
Giseop Kim retweeted
Better result (by GPT-5) for the prompt "Show me this room from the perspective of the webcam on the monitor that we see standing on the desk."
Giseop Kim retweeted
A large-scale SfM example of the Kohoku JCT using a COLMAP camera rig + DJI OSMO360. Loop closure is working. Video length: 1 hour. Equirectangular frames: 2,206. 12-direction crops: 26,472 images. SfM processing time: 36 hours (partially GPU-accelerated). It takes a while, but it's surprising that SfM at this scale succeeds ☺ #港北 #colmap #osmo360
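A quick arithmetic check of the numbers above (nothing assumed beyond the tweet itself):

```python
# Sanity-check the reported image counts and throughput.
equirect_frames = 2206
cutouts_per_frame = 12
total_images = equirect_frames * cutouts_per_frame   # = 26,472 perspective images
sfm_hours = 36
print(total_images)                                              # 26472
print(round(sfm_hours * 3600 / total_images, 1), "s per image")  # ~4.9 s on average
```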
Giseop Kim retweeted
SigLIP (VLMs) and DINO are two competing paradigms for image encoders. My intuition is that joint vision-language modeling works great for semantic problems but may be too coarse for geometry problems like SfM or SLAM. Most animals navigate 3D space perfectly without language.
Quite the chonker! 😮
Say hello to DINOv3 🦖🦖🦖 A major release that raises the bar of self-supervised vision foundation models. With stunning high-resolution dense features, it’s a game-changer for vision tasks! We scaled model size and training data, but here's what makes it special 👇
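To make the geometry-vs-semantics point above concrete, here is a minimal sketch of two-view matching with dense self-supervised patch features via mutual nearest neighbours. It uses the public facebookresearch/dinov2 torch.hub entry point (not DINOv3, which is what the announcement is about); the image paths and input resolution are placeholders.

```python
# Minimal sketch: mutual-nearest-neighbour matching of dense DINOv2 patch features.
# Assumptions: dinov2_vits14 from torch.hub, 224x224 inputs, placeholder image paths.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
prep = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                  T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

def patch_feats(path):
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        f = model.forward_features(x)["x_norm_patchtokens"][0]  # (num_patches, C)
    return torch.nn.functional.normalize(f, dim=-1)

fa, fb = patch_feats("view_a.jpg"), patch_feats("view_b.jpg")
sim = fa @ fb.T                      # cosine similarity between all patch pairs
ab, ba = sim.argmax(dim=1), sim.argmax(dim=0)
mutual = torch.nonzero(ba[ab] == torch.arange(len(ab))).squeeze(1)
print(f"{len(mutual)} mutual nearest-neighbour patch matches out of {len(ab)}")
```

Swapping in a text-aligned encoder such as SigLIP at the same spot is one way to probe whether its features really are too coarse for this kind of geometric matching.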
Giseop Kim retweeted
Did ICRA 2026 go back to the 8-page format, discarding 6+n? 2026.ieee-icra.org/contribut… @ieee_ras_icra
arxiv.org/pdf/2505.18364 ImLPR is the first method to apply a vision foundation model like DINOv2 to LiDAR place recognition by converting point clouds into Range Image Views (RIVs). It outperforms SOTA methods and shows that the RIV is more effective than the BEV (bird's-eye view) for adapting LiDAR to vision models.
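For context, a Range Image View is a spherical projection of the point cloud into a dense 2D image that a vision backbone can consume directly. A minimal sketch of that projection follows; the vertical field of view and image size are placeholder values, not ImLPR's actual settings.

```python
# Minimal sketch of projecting a LiDAR scan to a Range Image View (RIV).
# The FoV and resolution below are placeholders, not the paper's configuration.
import numpy as np

def point_cloud_to_riv(points, h=64, w=1024, fov_up_deg=15.0, fov_down_deg=-15.0):
    """points: (N, 3) array of x, y, z coordinates in the sensor frame."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = ((-yaw / np.pi + 1.0) / 2.0 * w).astype(int) % w      # image column
    v = np.clip((fov_up - pitch) / (fov_up - fov_down) * h, 0, h - 1).astype(int)

    riv = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)              # write far points first so near points win
    riv[v[order], u[order]] = r[order]
    return riv
```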
Giseop Kim retweeted
VGGT has been re-licensed to allow commercial usage. Enjoy the gift 😉😇 👉 huggingface.co/facebook/VGGT…
Giseop Kim retweeted
Our new work TRAN-D is accepted to ICCV 2025! TRAN-D reconstructs transparent object geometry in more dynamic scenes. jeongyun0609.github.io/TRAN-… 📉39% lower MAE than baselines, with fewer views. ⚡️Scene updates in seconds with physics sim, no rescan needed! More in the thread 👇
Giseop Kim retweeted
New impact factors and rankings: RA-L moved from Q2 to Q1, and IJRR from Q1 to Q2. jcr.clarivate.com/jcr/