THE BI-MONTHLY JOURNAL OF THE BWW SOCIETY

Abstract. This paper presents a genetic algorithm (GA) approach to support cancer diagnosis and treatment. In both processes, the physician usually compares numerical medical data against specific internal parameters (thresholds) in order to reach an optimal decision. The goal of this paper is to explore a GA-based approach that helps the physician find the optimum threshold values for both the diagnosis and the treatment of cancer.

1. Introduction

The occurrence of particular types of cancer varies remarkably according to a wide range of factors, including age, sex, calendar time, geography, etc. The oncologist studies how the disease depends on a constellation of risk factors acting on the population and uses this information to determine the best measures for prevention and treatment [1]. The diagnosis of different types of cancer is difficult, especially in the early stages, and most patients are diagnosed in advanced stages. For instance, a complex analysis process involving alpha-fetoprotein (AFP), the use of imaging modalities (e.g. power Doppler, harmonic imaging, pulse inversion, etc.) combined with microbubble contrast agents, and a better understanding of the importance of the main serum enzyme values (e.g. ALT, AST, BRT, GGT, etc.) has significantly improved the rate of detection of early (small) hepatocellular carcinoma (HCC) [2]. Irrespective of the detection factor, which is frequently expressed as numerical data, the most important step consists in accurately evaluating the specific internal threshold values corresponding to a specific disease.

Once the diagnosis problem has been solved, attention turns to designing a treatment procedure. The design of the optimal treatment formula strongly depends on the patient's specific features and, consequently, requires methods for associating quantitative and qualitative patient medical data with a certain treatment procedure. Irrespective of the particular patient characteristic influencing the therapy type, each of these parameters has threshold values that imply a decision concerning the appropriate therapy [3].

We aim to introduce here a genetic algorithm-based approach for obtaining optimal (or near optimal) threshold values that support the cancer diagnosis and treatment process.

2. Materials and methods

Genetic algorithms

Genetic algorithms (GAs) were developed by Holland (1975) [4] and extended by Goldberg (1989) [5] to solve difficult optimization problems by intelligent exploitation of a random search. They are stochastic algorithms built on the metaphor of natural evolution. Since classical genetic algorithms, which operate on binary strings, require the original problem to be modified, we use here an evolution program [6], [7], which leaves the problem unchanged and instead modifies the chromosome representation and applies appropriate genetic operators.

We give here only a brief reminder, necessary to describe the genetic algorithm context. Generally, a genetic algorithm may be considered to be composed of three essential components:

§ A set of potential solutions, called individuals or chromosomes, that evolve over a number of iterations (generations). This set of solutions is also called the population;

§ An evaluation mechanism (fitness function) that allows assessing the quality or fitness of each individual of the population;

§ An evolution procedure that is based on some "genetic" operators such as selection, crossover and mutation.

Concretely, a genetic algorithm (as well as any evolution program) solving a particular problem consists of [8]:

§ A genetic representation for potential solutions to the problem;

§ A way to create an initial population of potential solutions;

§ An evaluation function that plays the role of the environment rating solutions in terms of their "fitness";

§ Genetic operators that alter the composition of "parents", thus producing "children";

§ Values for various parameters that the genetic algorithm uses (population size, probabilities of applying genetic operators, etc.).
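
The five ingredients listed above can be sketched as a generic loop. This is only an illustrative skeleton, not the program used in this paper: the class name EvolutionSketch, the method evolve and all parameter names are hypothetical, and binary tournament selection is used purely for brevity in place of the selection scheme described later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

/** Generic evolution-program skeleton; every name here is illustrative. */
final class EvolutionSketch {

    /** Runs a fixed number of generations and returns the best individual seen. */
    static <T> T evolve(Supplier<T> randomIndividual,   // creates the initial population
                        ToDoubleFunction<T> fitness,    // evaluation function
                        BinaryOperator<T> crossover,    // genetic operator on two parents
                        UnaryOperator<T> mutation,      // genetic operator on one child
                        int populationSize,             // parameter (must be positive)
                        int generations,                // parameter (stop condition)
                        Random rng) {
        List<T> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) {
            population.add(randomIndividual.get());
        }
        T best = bestOf(population, fitness);
        for (int g = 0; g < generations; g++) {
            List<T> next = new ArrayList<>();
            while (next.size() < populationSize) {
                // Binary tournament selection, an assumption made for brevity.
                T mother = tournament(population, fitness, rng);
                T father = tournament(population, fitness, rng);
                next.add(mutation.apply(crossover.apply(mother, father)));
            }
            population = next;
            T candidate = bestOf(population, fitness);
            if (fitness.applyAsDouble(candidate) > fitness.applyAsDouble(best)) {
                best = candidate;
            }
        }
        return best;
    }

    /** Picks two random individuals and keeps the fitter one. */
    private static <T> T tournament(List<T> pop, ToDoubleFunction<T> fitness, Random rng) {
        T a = pop.get(rng.nextInt(pop.size()));
        T b = pop.get(rng.nextInt(pop.size()));
        return fitness.applyAsDouble(a) >= fitness.applyAsDouble(b) ? a : b;
    }

    /** Returns the individual with the highest fitness in a non-empty population. */
    private static <T> T bestOf(List<T> pop, ToDoubleFunction<T> fitness) {
        return pop.stream().max(Comparator.comparingDouble(fitness)).get();
    }
}
```

Passing the representation, evaluation function and operators as arguments mirrors the list above: the loop itself does not change when the problem changes, only the pieces plugged into it.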

In our GA approach, let us first consider S individuals, each either in a healthy state or suffering from a certain type of cancer. To support good decision making, we present a genetic algorithm approach that classifies an individual into one of k classes: (k - 1) classes corresponding to types of cancer and one class corresponding to the healthy state. The algorithm is quite simple: it compares, the same way the physician does, the values of n parameters (V_ij), i = 1, 2,…, n, j = 1, 2,…, S, corresponding to n risk factors of a certain individual, against some internal parameters (thresholds), and thus the individual is assigned to one and only one of the k classes. Clearly, the quality of the threshold parameters plays a decisive role in a good classification.

As concerns the therapy procedure, let us consider, as above, S individuals suffering from a certain type of cancer. Generally, for any type of disease, a certain number k of treatment procedures might be considered. To support good decision making about the appropriate therapy, taking into account the specific characteristics of each individual (that is, specific data with the corresponding thresholds), we use the same genetic algorithm approach as above, which assigns each patient's treatment formula to one of the k classes, depending on the specific patient features.

Irrespective of the situation (i.e. diagnosis or treatment), it is easy to see that, for k different classes, the number of thresholds is (k - 1) for each of the n parameters (V_ij) and, consequently, there are n(k - 1) thresholds in total, denoted by X_i and seen as chromosomes in our GA approach. Selection is carried out by the Monte Carlo procedure, the classical one-point crossover is used to generate new chromosomes, and mutation uses the simple translation technique (one step, chosen randomly).
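
A minimal sketch of these three operators follows, under stated assumptions: a chromosome is stored as a double array of n(k - 1) thresholds, the Monte Carlo procedure is read as fitness-proportional (roulette-wheel) selection over non-negative fitness values, and the translation mutation is read as shifting one randomly chosen threshold by a fixed step up or down. The class name ThresholdOperators, the method names and the step parameter are hypothetical.

```java
import java.util.Random;

/** Operator sketch for threshold chromosomes; names and step size are assumptions. */
final class ThresholdOperators {

    /** Chromosome length for n parameters and k classes: n(k - 1) thresholds. */
    static int chromosomeLength(int n, int k) {
        return n * (k - 1);
    }

    /** Roulette-wheel selection: index i is drawn with probability fitness[i] / sum. */
    static int rouletteSelect(double[] fitness, Random rng) {
        double total = 0.0;
        for (double f : fitness) {
            total += f;
        }
        if (total <= 0.0) {
            return rng.nextInt(fitness.length); // degenerate case: choose uniformly
        }
        double spin = rng.nextDouble() * total;
        double running = 0.0;
        for (int i = 0; i < fitness.length; i++) {
            running += fitness[i];
            if (spin < running) {
                return i;
            }
        }
        return fitness.length - 1; // guards against rounding at the upper end
    }

    /** One-point crossover of equal-length parents: child = a[0..cut) + b[cut..length). */
    static double[] onePointCrossover(double[] a, double[] b, int cut) {
        double[] child = new double[a.length];
        System.arraycopy(a, 0, child, 0, cut);
        System.arraycopy(b, cut, child, cut, b.length - cut);
        return child;
    }

    /** Translation mutation: shifts one randomly chosen threshold by +step or -step. */
    static double[] translate(double[] chromosome, double step, Random rng) {
        double[] mutated = chromosome.clone();
        int gene = rng.nextInt(mutated.length);
        mutated[gene] += rng.nextBoolean() ? step : -step;
        return mutated;
    }
}
```

For example, with n = 5 parameters and k = 2 classes, chromosomeLength(5, 2) gives a chromosome of 5 thresholds.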

To evaluate the fitness of a chromosome, we run a simple classification algorithm, given by:

The classification algorithm

IF, for j = 1,…, S and i = 1,…, n, V_ij ≤ X_i THEN Class = C₁

ELSE IF, for j = 1,…, S and i = 1,…, n, V_ij ≤ X_{n+i} THEN Class = C₂

……………………………………………………………….

ELSE IF, for j = 1,…, S and i = 1,…, n, V_ij ≤ X_{n(k-2)+i} THEN Class = C_{k-1}

ELSE Class = C_k.
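
The cascade above can be sketched in Java for a single individual. Two points are assumptions rather than statements of this paper: the comparison is taken to be V_ij ≤ X_i, and a class is assigned only when the comparison holds for every parameter i. The class name ThresholdClassifier and the array layout (thresholds[c * n + i] standing for the threshold of parameter i in the c-th rule, with zero-based indices) are hypothetical.

```java
/** Threshold cascade for one individual; the "all parameters" reading is an assumption. */
final class ThresholdClassifier {

    /**
     * Returns the 1-based class index (1..k) for one individual.
     * values[i] is the value of parameter i; thresholds has length n * (k - 1).
     */
    static int classify(double[] values, double[] thresholds, int k) {
        int n = values.length;
        if (thresholds.length != n * (k - 1)) {
            throw new IllegalArgumentException("expected n(k - 1) thresholds");
        }
        for (int c = 0; c < k - 1; c++) {
            boolean allWithin = true;
            for (int i = 0; i < n; i++) {
                if (values[i] > thresholds[c * n + i]) {
                    allWithin = false; // this rule fails, try the next class
                    break;
                }
            }
            if (allWithin) {
                return c + 1; // first rule that holds decides the class
            }
        }
        return k; // no rule held: the final ELSE branch
    }
}
```

With n = 2 parameters, k = 3 classes and thresholds {10, 20, 30, 40}, an individual with values {25, 15} fails the first rule (25 > 10) and satisfies the second (25 ≤ 30 and 15 ≤ 40), so it is placed in the second class.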

The cost function is the number of individuals that are classified correctly, that is, those for which the class determined by the classification algorithm is the same as the known class given by the physician. The aim is, obviously, to maximize this function. The algorithm stops when the current generation number reaches the number of generations set at the beginning of the run.
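
As a rough illustration, the count of correct classifications can be written as follows. The class name FitnessSketch and its method names are hypothetical, and the classifier is passed in as a function so that the sketch does not depend on any particular form of the threshold rule.

```java
import java.util.function.ToIntFunction;

/** Fitness as the number of correctly classified individuals; names are illustrative. */
final class FitnessSketch {

    /** Counts individuals whose predicted class equals the physician's known class. */
    static int fitness(double[][] patients, int[] knownClass,
                       ToIntFunction<double[]> classifier) {
        if (patients.length != knownClass.length) {
            throw new IllegalArgumentException("one known class is needed per patient");
        }
        int correct = 0;
        for (int j = 0; j < patients.length; j++) {
            if (classifier.applyAsInt(patients[j]) == knownClass[j]) {
                correct++;
            }
        }
        return correct;
    }

    /** Expresses a count of correct classifications as a percentage of the total. */
    static double accuracyPercent(int correct, int total) {
        return 100.0 * correct / total;
    }
}
```

Each chromosome defines its own classifier (a set of thresholds), so the fitness of a chromosome is obtained by calling fitness with the classifier built from that chromosome.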

Java implementation

An important feature of the Java implementation is that all patient data collected by physicians can be added, modified or deleted at any time, with no change whatsoever to the program source. This is possible because the data are processed through JDBC (Java Database Connectivity). Physicians can also modify the structure of the data table, adding new parameters that may prove important for the diagnosis, and the program still remains functional.
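
One way such schema independence can be obtained with JDBC is to read the column count from the result set metadata instead of hard-coding it. The sketch below is not the source of our program: the class name PatientTableReader, the method names and the table name passed in are hypothetical, and it assumes that every column of the table holds a numeric value.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Reads all rows of a patient table without knowing its columns in advance. */
final class PatientTableReader {

    /** Builds the query; the table name is validated because it cannot be a bind parameter. */
    static String selectAll(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "SELECT * FROM " + table;
    }

    /** Returns one double[] per row, with one entry per column found in the table. */
    static List<double[]> readRows(Connection connection, String table) throws SQLException {
        List<double[]> rows = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet result = statement.executeQuery(selectAll(table))) {
            ResultSetMetaData meta = result.getMetaData();
            int columns = meta.getColumnCount(); // adapts when parameters are added
            while (result.next()) {
                double[] row = new double[columns];
                for (int c = 1; c <= columns; c++) { // JDBC columns are 1-based
                    row[c - 1] = result.getDouble(c);
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Because the number of parameters n is taken from the table at run time, adding a column changes the chromosome length n(k - 1) without any change to the code.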

3. Results and discussion

In order to check the effectiveness of this approach, we tested it on both the diagnosis process and the treatment evaluation, using a small learning data set.

First, we considered S = 15 subjects (7 in a healthy state and 8 with HCC). For the diagnosis process we considered 5 parameters: 4 serum enzymes (ALT - alanine aminotransferase, AST - aspartate aminotransferase, BRT - total bilirubin and GGT - gamma glutamyl transpeptidase) plus the subject's age. As concerns the classification, we considered two classes: C₁ = {individuals without hepatic cancer} and C₂ = {individuals with hepatic cancer}. In this case, on average 72% of the individuals received the right diagnosis.

Second, we tested our GA model on S = 17 women with breast cancer. In this case, we considered the following two standard treatment procedures, listed in order of increasing complexity:

§ chemotherapy (CT);

§ chemotherapy (CT) + hormone therapy (HT).

We obtained on average 67% accuracy for the treatment formula design. Let us mention that a population of Y = 30 chromosomes was used in our study.

4. Conclusion

This experiment seems to be satisfactory enough from a practical point of view. It shows that the algorithm is able to find a (near) optimal solution with good enough accuracy. Since the number of subjects was small, the thresholds obtained using the GA approach are strongly tied to this particular database. Obviously, with many more subjects in the study and a larger number of medical characteristics, the classification problem will become more complicated. On the other hand, we have to consider alternative crossovers, mutations and corresponding probabilities in order to improve the classification accuracy. Clearly, much work still needs to be done before this method can be brought into practice.

References

[1] M. Abeloff, J. Armitage, A. Lichter, J. Niederhuber (Eds.), Clinical Oncology, 2nd Edition. Churchill Livingstone, 2000.

[2] A. Saftoiu, T. Ciurea, F. Gorunescu, Hepatic arterial blood flow in large hepatocellular carcinoma with or without portal vein thrombosis: assessment by transcutaneous Duplex Doppler sonography. European Journal of Gastroenterology & Hepatology, Vol. 14(2), p. 167-176, 2002.

[3] F. Gorunescu, M. Gorunescu, F. Badulescu, A. Badulescu, Data mining techniques in uterus cancer. In: Proceedings of the 2nd Romanian Congress of Surgical Oncology, October 2002, Cluj, Romania (abstract), Romanian Journal of Surgical Oncology, 3, 1, p. 29, 2002.

[4] J.H. Holland, Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, 1975.

[5] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.

[6] D.B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ, 1995.

[7] D. Dumitrescu, Genetic algorithms and evolution strategies - Applications in Artificial Intelligence and connex domains. Microinformatica, Cluj-Napoca, 2000.

[8] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Second, Extended Edition. Springer-Verlag, 1994.

BWW Society member Dr. Gorunescu was born on February 5th, 1953 in Craiova, Romania. In 1976 he graduated from the Faculty of Mathematics and Informatics of the University of Craiova, and in 1979 he received his Ph.D. in Mathematics from the University of Bucharest.

Dr. Gorunescu is Professor of Mathematics, Statistics and Informatics at the University of Medicine and Pharmacy of Craiova. Professor Gorunescu presently serves as Deputy Dean of the Faculty of Pharmacy, Department of Mathematics, Biostatistics and Informatics at Romania's University of Medicine and Pharmacy of Craiova.

Dr. Gorunescu has published six books and more than 90 scientific papers, and serves as a reviewer for Zentralblatt für Mathematik. He is a Member of the French Society of Statistics, and has received academic scholarships from both l'Université Libre de Bruxelles, Belgium and the University of Ulster, in the United Kingdom.
