LETTERS TO THE EDITOR

Misconceptions in the field guide to big data for neurosurgeons

TO THE EDITOR: As an interdisciplinary team in applied artificial intelligence (AI) in healthcare, we share Raju and colleagues'1 view on the need for a clear overview of this topic (Raju B, Jumah F, Ashraf O, et al. Big data, machine learning, and artificial intelligence: a field guide for neurosurgeons. J Neurosurg. Published online October 2, 2020. doi:10.3171/2020.5.JNS201288). Such assessments can profoundly influence how researchers approach experimental design and data interpretation. Therefore, it is important to avoid misconceptions or omissions.
Data science does not, in fact, encompass AI and robotics as subspecialties (Fig. 1 in Raju et al.), as these fields have existed independently since the early 1960s and have much wider applications. In contrast to the definition proposed by Raju et al.,1 data mining is the process of deriving value from large structured or unstructured data, which typically utilizes some machine learning methods. Raju and colleagues'1 explanation of unsupervised learning is oversimplified. A simple example would, in fact, be a cluster analysis in which the algorithm categorizes data into groups based on specified similarity and/or dissimilarity measures without prior labels. An artificial neural network (ANN) is actually a well-defined model characterized by a set of mathematical functions that broadly determine the network of neurons, propagation function, and bias.2 The concepts of ANN and deep learning (DL) are misrepresented, since an ANN can have multiple hidden layers. Readers should be aware of these limitations in the description of terminology relevant to understanding the health data science literature.
Machine learning frameworks provide a structured approach to many machine learning applications. Those interested should refer to contemporary frameworks such as Scikit-Learn,3 TensorFlow,4 and PyTorch.5 First, a major challenge lies in the pre-processing of data into an analytic data set. Structured and unstructured health data hold different information and differ in quality. Just as conventional research requires a robust study design, rigor in data capture and processing must be an integral part of big data analytics. Second, cloud-based platforms are useful not only for storing data but also for performing analyses. High-performance computing is usually required for complex machine learning models, including DL. Since healthcare systems cannot each have such facilities, cloud-based platforms offer a scalable solution. Third, testing and evaluating the machine learning system on a distinct data set, preferably from a different institute or setting, can provide external validity. This requires appropriate and comparable measures such as the area under the receiver operating characteristic curve (AUROC), precision, and recall, to name a few. Using another data set also facilitates transfer learning, which enables the improvement of an existing model or application for another task. Lastly, there is a need for explainable AI, especially for the more complex approaches. This is particularly relevant to health research because it relates to the legal and ethical issues of fairness, accountability, and transparency in machine-based decisions.
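To make the evaluation measures named above concrete, the following is a minimal sketch in Python of how AUROC, precision, and recall might be computed on an external data set. It is an illustration only: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions introduced here, not data or methods from either paper. In practice, a framework such as Scikit-Learn provides equivalent functions (`roc_auc_score`, `precision_score`, `recall_score`).

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical external cohort (illustrative values, not real patient data):
# observed outcomes and the risk scores predicted by an already-trained model.
external_labels = [1, 0, 1, 1, 0, 0, 1, 0]
external_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]

# Assumed decision threshold, fixed before the external data are examined.
threshold = 0.5
external_preds = [1 if s >= threshold else 0 for s in external_scores]

precision, recall = precision_recall(external_labels, external_preds)
print(f"AUROC={auroc(external_labels, external_scores):.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
```

AUROC is computed from the continuous scores and so does not depend on the threshold, whereas precision and recall do. Reporting the same measures, with the same threshold, on both the development and the external data set is what makes the comparison meaningful.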
There is a clear role for applied health data science in neurosurgery. We recommend Artificial Intelligence: A Modern Approach 2 for those interested in building a foundation in AI and encourage close interdisciplinary collaboration to succeed in deriving value from big data for our patients.

Response
We thank Dr. Poon and his team for their thoughtful and knowledgeable comments. We agree that our attempt to oversimplify the concepts of big data, machine learning, and AI can be confusing. As data science and AI are intricately related and dependent, many sources categorize AI as part of data science. However, we agree with Poon and colleagues' comments that they exist independently and are to be considered separate fields. We are also aware that the ANN includes multiple layers. To foster understanding of the concept by neurosurgeons, we oversimplified the diagram, defining only a single layer rather than the multiple layers of DL. The additional challenges related to data integration, high-performance computing, and external validity are mentioned superficially within our paper. Though we agree and accept these additional details and corrections proposed by Poon et al., we believe that our paper was intended "for neurosurgeons, by neurosurgeons" and will aid in understanding the concepts and applications of big data.

Rutgers-Robert Wood Johnson Medical School and
University Hospital, New Brunswick, NJ