Detecting Web-Based Attacks with SHAP and Tree Ensemble Machine Learning Methods

Ndichu, Samuel; Kim, Sangwook; Ozawa, Seiichi; Ban, Tao; Takahashi, Takeshi; Inoue, Daisuke

https://hdl.handle.net/20.500.14094/90009126


Available files

File	Format	Size	Views	Description
90009126 (fulltext)	pdf	1.01 MB	9

Metadata


Metadata ID	90009126
Access rights	open access
Publication type	Version of Record
Title	Detecting Web-Based Attacks with SHAP and Tree Ensemble Machine Learning Methods
Authors	Ndichu, Samuel ; Kim, Sangwook ; Ozawa, Seiichi ; Ban, Tao ; Takahashi, Takeshi ; Inoue, Daisuke
Author name	Ndichu, Samuel
Author name	Kim, Sangwook (キム, サンウック)
	Affiliation: Graduate School of Engineering
	Author ID: A2320
	Researcher ID: 1000000826878
	KUID: https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=4914df63fe5dcca6520e17560c007669
Author name	Ozawa, Seiichi (小澤, 誠一; オザワ, セイイチ)
	Affiliation: Center for Mathematical and Data Sciences
	Author ID: A1729
	Researcher ID: 1000070214129
	KUID: https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=5d6ba4d6ae71eb49520e17560c007669
Author name	Ban, Tao
Author name	Takahashi, Takeshi
Author name	Inoue, Daisuke
Journal title	Applied Sciences
Volume (issue)	12(1)
Pages	60
Publisher	MDPI
Publication date	2022-01
Release date	2022-04-08
Abstract	Attacks using Uniform Resource Locators (URLs) and their JavaScript (JS) code content to perpetrate malicious activities on the Internet are rampant and continuously evolving. Methods such as blocklisting, client honeypots, domain reputation inspection, and heuristic and signature-based systems are used to detect these malicious activities. Recently, machine learning approaches have been proposed; however, challenges still exist. First, blocklist systems are easily evaded by new URLs and JS code content, obfuscation, fast-flux, cloaking, and URL shortening. Second, heuristic and signature-based systems do not generalize well to zero-day attacks. Third, the Domain Name System allows cybercriminals to easily migrate their malicious servers to hide their Internet protocol addresses behind domain names. Finally, crafting fully representative features is challenging, even for domain experts. This study proposes a feature selection and classification approach for malicious JS code content using Shapley additive explanations and tree ensemble methods. The JS code features are obtained from the Abstract Syntax Tree form of the JS code, sample JS attack codes, and association rule mining. The malicious and benign JS code datasets obtained from Hynek Petrak and the Majestic Million Service were used for performance evaluation. We compared the performance of the proposed method to those of other feature selection methods in the task of malicious JS code content detection. With a recall of 0.9989, our experimental results show that the proposed approach is a better prediction model.
Keywords	web-based attacks
	feature selection
	Shapley additive explanations
	tree ensemble methods
	machine learning
Category	Graduate School of Engineering
	Center for Mathematical and Data Sciences
	Academic journal articles
Rights	© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
Rights	This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Resource type	journal article
Language	English
eISSN	2076-3417
Related information	DOI https://doi.org/10.3390/app12010060
