Diabetes Mellitus Forecast

Smith et al. [1] used an early neural network model to forecast the onset of diabetes mellitus. The data were downloaded from ftp://ftp.ncc.up.pt/pub/statlog/. From the 768 samples, 170 were selected at random for each of the two possible results of the diabetes test (positive and negative), giving 340 training samples. The remaining 428 were used as validation samples.
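The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name balanced_split, the seed, and the placeholder labels are assumptions, and the 500/268 class counts are the commonly reported ones for this data set rather than figures stated in the text.

```python
import random

def balanced_split(labels, per_class=170, seed=0):
    """Pick per_class random samples of each label for training.

    Returns (train_idx, valid_idx); every sample not chosen for
    training goes to the validation set.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train = []
    for idxs in by_class.values():
        train.extend(rng.sample(idxs, per_class))
    chosen = set(train)
    valid = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), valid

# Placeholder labels only (assumed class counts, 768 samples in total).
labels = ["negative"] * 500 + ["positive"] * 268
train_idx, valid_idx = balanced_split(labels)
print(len(train_idx), len(valid_idx))
```

With 768 samples and 170 per class, the split leaves 340 samples for training and 428 for validation, matching the counts given above.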

The data set consists of eight input variables:

a: Number of times pregnant;
b: Plasma glucose concentration at 2 hours in an oral glucose tolerance test;
c: Diastolic blood pressure;
d: Triceps skin fold thickness;
e: 2-Hour serum insulin;
f: Body mass index;
g: Diabetes pedigree function; and
h: Age.

Each feature selection process started with eight input nodes, one per variable, and two output nodes, one per possible test result. Each training session ran for 500 cycles. The process was run five times. The table below shows the order in which the inputs were deleted, along with the average MCSR* and average CASR** at each iteration of the five processes:


Inputs used

Iteration  Process 1  Process 2  Process 3  Process 4  Process 5
1          abcdefgh   abcdefgh   abcdefgh   abcdefgh   abcdefgh
2          abcdefg    abcdefg    abcdefg    abcdefg    abcdefg
3          abcdfg     abcdfg     abcdfg     abcdfg     abcdfg
4          abdfg      abdfg      abdfg      abdfg      abdfg
5          abfg       abfg       abfg       abfg       abfg
6          bfg        abf        abf        abf        abf
7          bf         bf         bf         bf         bf
8          b          b          b          b          b

Average MCSR (%)

Iteration  Process 1  Process 2  Process 3  Process 4  Process 5
1          67.143     66.939     67.143     66.735     67.143
2          66.122     65.714     65.510     65.918     65.918
3          69.388     70.000     69.592     69.388     69.388
4          67.143     67.551     68.163     67.347     67.551
5          73.265     73.469     73.469     73.469     73.469
6          67.347     68.163     68.163     68.163     68.571
7          60.204     60.204     60.204     60.204     60.612
8          60.816     60.816     60.816     61.429     61.429

Average CASR (%)

Iteration  Process 1  Process 2  Process 3  Process 4  Process 5
1          75.047     75.234     74.907     74.860     75.561
2          74.159     74.252     74.720     74.206     74.299
3          74.486     74.579     74.486     74.486     74.579
4          74.019     74.159     74.439     74.206     74.346
5          73.738     73.645     73.692     73.598     73.598
6          75.280     74.065     73.972     74.159     73.879
7          74.299     74.299     74.299     74.299     74.252
8          72.570     72.570     72.570     72.477     72.477

As inputs were deleted one after another, the decline in the average MCSR and the average CASR reversed when four and three inputs, respectively, were kept. In particular, the highest average MCSR (about 73.3% to 73.5%) was reached when four inputs were used. In other words, to achieve the highest success rate for both "positive" and "negative" cases, only four inputs (a, b, f and g) should be used. Even when only one input, b, is used, the success rates stay well above 50% (average MCSR about 61%, average CASR about 72.5%).
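The observation about the MCSR peak can be checked with a few lines of arithmetic. The sketch below copies the average MCSR values from the table and looks for the iteration with the highest value. Averaging across the five processes is a summary step added here for illustration only, and the variable names are assumptions.

```python
# Average MCSR (%) from the table: one row per iteration,
# one column per process (1 to 5).
avg_mcsr = {
    1: [67.143, 66.939, 67.143, 66.735, 67.143],
    2: [66.122, 65.714, 65.510, 65.918, 65.918],
    3: [69.388, 70.000, 69.592, 69.388, 69.388],
    4: [67.143, 67.551, 68.163, 67.347, 67.551],
    5: [73.265, 73.469, 73.469, 73.469, 73.469],
    6: [67.347, 68.163, 68.163, 68.163, 68.571],
    7: [60.204, 60.204, 60.204, 60.204, 60.612],
    8: [60.816, 60.816, 60.816, 61.429, 61.429],
}

def mean(values):
    return sum(values) / len(values)

# Mean over the five processes for each iteration (illustrative summary).
mean_by_iteration = {it: mean(row) for it, row in avg_mcsr.items()}

# Iteration with the highest mean MCSR, and the number of inputs kept there.
best_iteration = max(mean_by_iteration, key=mean_by_iteration.get)
inputs_kept = 8 - (best_iteration - 1)  # one input is deleted per iteration

print(best_iteration, inputs_kept, round(mean_by_iteration[best_iteration], 3))
```

The peak falls at iteration 5, where four inputs remain, which agrees with the conclusion drawn above.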

[1] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler and R. S. Johannes, "Using the ADAP learning algorithm to forecast the onset of diabetes mellitus," Proceedings of the 12th Symposium on Computer Applications in Medical Care (R. A. Greenes, Ed.), IEEE Computer Society Press, pp. 261-265, 1988.

__________

*MCSR: The Minimum Class Success Rate is the lowest success rate among all the target classes. The average MCSR is the MCSR averaged over the five training sessions within each process.

**CASR: The Class Average Success Rate is the success rate averaged over all the target classes. The average CASR is the CASR averaged over the five training sessions within each process.
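The two measures defined above can be computed as in the following minimal sketch. The function names and the toy labels are assumptions made for illustration and do not come from the study.

```python
def class_success_rates(y_true, y_pred):
    """Percentage of correctly classified samples for each target class."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    return {c: 100.0 * correct.get(c, 0) / n for c, n in totals.items()}

def mcsr(y_true, y_pred):
    """Minimum Class Success Rate: the lowest per-class success rate."""
    return min(class_success_rates(y_true, y_pred).values())

def casr(y_true, y_pred):
    """Class Average Success Rate: the mean of the per-class success rates."""
    rates = class_success_rates(y_true, y_pred)
    return sum(rates.values()) / len(rates)

# Toy labels (made up): 3 of 4 negatives and 2 of 4 positives are correct.
y_true = ["neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "pos", "neg", "neg"]

print(mcsr(y_true, y_pred))  # 50.0
print(casr(y_true, y_pred))  # 62.5
```

Because the MCSR tracks the worst-performing class, it penalizes a model that does well on one result and poorly on the other, while the CASR weights both classes equally regardless of how many samples each has.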
