Item Details

Released

Conference Paper

B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

MPS-Authors

Arya,  Shreyash
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;


Rao,  Sukrut
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;


Boehle,  Moritz
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;


Schiele, Bernt
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;

Fulltext (public)

arXiv:2411.00715.pdf
(Preprint), 26MB

Supplementary Material (public)
There is no public supplementary material available.
Citation

Arya, S., Rao, S., Boehle, M., & Schiele, B. (2024). B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, & C. Zhang (Eds.), Advances in Neural Information Processing Systems 37 (pp. 62756-62786). Curran Associates, Inc.


Cite as: https://hdl.handle.net/21.11116/0000-0010-0FBE-9
Abstract
B-cos Networks have been shown to be effective for obtaining highly human
interpretable explanations of model decisions by architecturally enforcing
stronger alignment between inputs and weights. B-cos variants of convolutional
networks (CNNs) and vision transformers (ViTs), which primarily replace linear
layers with B-cos transformations, perform competitively with their respective
standard variants while also yielding explanations that are faithful by design.
However, it has so far been necessary to train these models from scratch, which
is increasingly infeasible in the era of large, pre-trained foundation models.
In this work, inspired by the architectural similarities in standard DNNs and
B-cos networks, we propose 'B-cosification', a novel approach to transform
existing pre-trained models to become inherently interpretable. We conduct a
thorough study of design choices for this conversion, both for convolutional
neural networks and vision transformers. We find that
B-cosification can yield models that are on par with B-cos models trained from
scratch in terms of interpretability, while often outperforming them in terms
of classification performance at a fraction of the training cost. Subsequently,
we apply B-cosification to a pretrained CLIP model, and show that, even with
limited data and compute cost, we obtain a B-cosified version that is highly
interpretable and competitive in zero-shot performance across a variety of
datasets. We release our code and pre-trained model weights at
https://github.com/shrebox/B-cosification.
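The abstract notes that B-cos variants primarily replace linear layers with B-cos transformations, which architecturally enforce alignment between inputs and weights. As a rough illustration of that idea, a B-cos unit scales its linear response by the absolute cosine similarity between input and weight, raised to the power B-1, so the output is large only when the input aligns with the weight direction. The sketch below is a minimal NumPy illustration under these assumptions, not the authors' implementation; the function name, the `eps` stabilizer, and the default B=2 are illustrative choices.

```python
import numpy as np

def bcos_linear(x, W, B=2.0, eps=1e-6):
    """Illustrative B-cos transform: scale each unit's linear output
    by |cos(x, w)|**(B-1), where w is the unit's weight row."""
    # Standard linear response per output unit
    lin = W @ x
    # Cosine similarity between the input and each weight row
    cos = lin / (np.linalg.norm(W, axis=1) * np.linalg.norm(x) + eps)
    # Down-weight outputs where input and weight are poorly aligned;
    # with B = 1 the scaling factor is 1 and the layer is plain linear
    return lin * np.abs(cos) ** (B - 1)
```

Note that in this sketch, setting B=1 recovers an ordinary linear layer, which hints at why converting an existing pre-trained model ("B-cosification") is plausible: the standard network is the B=1 special case of the same computation.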