Simulation study for assessment of FITCON in the presence of bienniality in MLT data of mango (Mangifera indica)

Ram Kumar Choudhary1, Atmakuri Ramakrishna Rao2, Shiv Kumar Choudhary3, Shanti Bhushan3, Ashutosh Prasad Mauryar4, Chandra Shekhar Choudhary5

1Dr. Rajendra Prasad Central Agricultural University, Pusa, Samastipur, Bihar-848 125, India

2ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi – 110 012, India

3Bihar Agricultural University, Sabour, Bhagalpur, Bihar-843 210, India

4National Informatics Centre (NIC), CGO Complex, Lodi Road, New Delhi -110008, India

5Dr. Rajendra Prasad Central Agricultural University, Pusa, Samastipur, Bihar-848 125, India

Corresponding Author Email: shiv_iasri@rediffmail.com

DOI: http://dx.doi.org/10.53709/CHE.2019.v0s1is2.0010

Abstract

Mango is an important perennial fruit crop of India that exhibits bienniality in fruiting. In the analysis of Multi-Location Trials (MLTs) data, a few entries in the genotype × environment table are often missing. Such missing values lead to incomplete data in genotype × environment analysis. Because of this incompleteness, MLT data become unbalanced, which makes the analysis of Genotype × Environment Interaction (GEI) challenging. The challenge is intensified when the crop exhibits bienniality, as mango does. Fitting constant (FITCON) is used for imputing missing observations in incomplete MLT data. In the present study, the performance of the FITCON method was assessed at different rates of missing observations in the presence of bienniality, on the basis of the ranking of genotypes by different stability parameters as well as simultaneous selection indices in mango. MLT data on mango fruit yield were collected from the All India Co-ordinated Research Project on Tropical and Sub-tropical Fruits, CISH, Lucknow, for 16 genotypes for fourteen years, i.e. from 1990 to 2005. Box plots of the distribution of the correlation coefficients for yield, different measures of stability and indices for simultaneous selection for yield and stability (SISGYS) were used to assess the performance of FITCON on simulated data under four different empirical situations. It was found that FITCON terminated after a few iterations when imputation was done for more than 10% of missing observations after eliminating the bienniality. Thus, FITCON is recommended for imputation of up to 10% of missing observations of MLT data in mango.

Keywords

Bienniality, Box plot, FITCON, GEI, Mango, MLT, SISGYS

Introduction

Mango (Mangifera indica L.) is one of the important perennial fruit crops grown in many tropical and subtropical countries [1-4]. Mango is also called the ‘King of fruits’ in India owing to its sweetness, richness of taste, huge variability, large production volume and variety of uses such as pickles, juice and chutney. India is the largest producer of mango in the world, with an annual production of 21.82 million tons from an area of 2.25 million hectares (National Horticulture Board, 2017-18), contributing about 56% of total world production. A large number of mango varieties exist in India. Mango exhibits a biennial rhythm in fruiting [5-7]. Because of this behaviour, growers face economic loss in the ‘off’ year, with poor or no yield, and have to sell the heavy crop at a low price in the ‘on’ year because of oversupply in the market.

When a combined analysis of Multi-Location Trials (MLTs) data is carried out, quite a few entries in the genotype × environment table are often found to be missing. Such missing values lead to incomplete data in genotype × environment analysis. Incomplete data arise primarily because some genotypes have not been tested in all the environments owing to constraints such as insufficient seed, failure of planting, non-germination, and pest and disease attack. Also, in some MLTs the entries under test may be discontinued and new entries added while the performance of genotypes is being tested. Because of this incompleteness, MLTs data become unbalanced, which makes the analysis of Genotype × Environment Interaction (GEI) challenging. The problem of analysing unbalanced MLTs data is intensified when the crop exhibits bienniality. Fitting constant (FITCON) analysis and the ‘modified regression’ procedure, given by [8, 9] respectively, are the two commonly used procedures for imputing missing observations in incomplete MLTs data. [10-14] applied the modified regression analysis in the form of “augmented FITCON” for estimating the yield sensitivity of wheat varieties in recommended list trials involving incomplete variety-centre data. The performance of FITCON needs to be assessed for different rates of missing observations in the presence of bienniality. Therefore, in this paper the performance of FITCON has been assessed for incomplete MLTs data in the presence of bienniality. The assessment is based on the distribution of the correlation coefficient between the rankings of genotypes in the complete simulated data and in the imputed simulated data, where the rankings are obtained from different stability parameters as well as from indices for simultaneous selection for yield and stability (SISGYS) in mango.

 Material and Methods

If the yield of some of the genotypes is not available, the orthogonality of the original design disappears and bias is introduced in the varietal means. Comparisons based on the biased means favour the varieties that happen to be exposed to better-than-average environmental conditions. To avoid this, the varietal means have to be adjusted for the environments in which particular varieties were not present. This is done by the least squares procedure described below, based on the model

            $Y_{ij} = \mu_i + e_j + \varepsilon_{ij}$                                                                                                                             (1)

In model (1), $\mu_i$ is the mean of the ith variety (i = 1, 2, …, t), $e_j$ the effect of the jth environment (j = 1, 2, …, s) and $\varepsilon_{ij}$ the random error, distributed with mean zero and constant variance. By regressing the existing $Y_{ij}$'s on the estimated $e_j$'s, the estimates of $b_i$ (i = 1, 2, 3, …, t), the linear sensitivities of the individual genotypes, were obtained. For estimating the parameters $\mu_i$ and $e_j$, the residual sum of squares

            $S = \sum_{i=1}^{t} \sum_{j=1}^{s} \delta_{ij} (Y_{ij} - \mu_i - e_j)^2$

was minimized with respect to $\mu_i$ and $e_j$, where the weight $\delta_{ij}$, introduced to accommodate the incomplete data set-up, is such that

$\delta_{ij}$         = 1 if $Y_{ij}$ is present in the data

            = 0 if $Y_{ij}$ is missing.

Differentiating the residual sum of squares with respect to $\mu_i$ and $e_j$, the normal equations become

            $\sum_{j=1}^{s} \delta_{ij} (Y_{ij} - \mu_i - e_j) = 0, \quad i = 1, 2, \dots, t$                                                                                      (2)

            $\sum_{i=1}^{t} \delta_{ij} (Y_{ij} - \mu_i - e_j) = 0, \quad j = 1, 2, \dots, s$                                                                                       (3)

These normal equations were solved subject to the constraint

            $\sum_{j=1}^{s} e_j = 0$                                                                                                                           (4)

From equations (2), (3) and (4) the estimates of the $\mu_i$'s and $e_j$'s can be obtained as

            $\hat{\mu}_i = \bar{Y}_{i.} + \frac{1}{n_i} \sum_{j^*} \hat{e}_{j^*}$                                                                                                     (5)

            $\hat{e}_j = \bar{Y}_{.j} - \frac{1}{n_j} \sum_{i=1}^{t} \delta_{ij} \hat{\mu}_i$,                                                                                                            (6)

with $\delta_{ij}$ = 1 if $Y_{ij}$ is present and 0 if absent,

where $\bar{Y}_{i.}$ and $\bar{Y}_{.j}$ are the means based on the existing $n_i$ and $n_j$ observations for the ith variety and jth environment respectively, and the adjustments for these estimators depend on each other's final estimates. Here the $j^*$'s denote the environments where the ith variety is absent. The iteration procedure starts by considering the trial value $\bar{Y}_{i.}$ for $\hat{\mu}_i$ in Eq. (6), giving rise to a set of $e_j$ values. Substituting these values in (5), revised estimates of $\mu_i$ are obtained. These are then substituted in (6) to get the revised estimates of $e_j$. This procedure is continued until the values of $e_j$ converge. The iterative process usually converges in a small number of iterations. Using the final set of $e_j$ values ($\hat{e}_j$), the estimate of $\mu_i$ can be obtained from the equation

            $\hat{\mu}_i = \bar{Y}_{i.} + \frac{1}{n_i} \sum_{j^*} \hat{e}_{j^*}$                                                                         (7)
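The alternating scheme described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used in the study: the function name `fitcon`, the tolerance, the iteration cap and the use of `None` for a missing cell are all assumptions made here. Environment effects are re-centred in every pass so that they sum to zero, which is one simple way of imposing the constraint numbered (4). A missing cell is finally filled with the sum of its estimated variety mean and environment effect.

```python
def fitcon(table, tol=1e-6, max_iter=200):
    """Fit Y_ij = mu_i + e_j to an incomplete two-way table (None = missing).

    Returns (mu, e, imputed). Hypothetical helper written for illustration.
    """
    t, s = len(table), len(table[0])
    present = [[table[i][j] is not None for j in range(s)] for i in range(t)]
    n_i = [sum(present[i]) for i in range(t)]
    n_j = [sum(present[i][j] for i in range(t)) for j in range(s)]
    if min(n_i) == 0 or min(n_j) == 0:
        raise ValueError("every variety and environment needs at least one value")

    # Means over the observed cells only.
    row_mean = [sum(table[i][j] for j in range(s) if present[i][j]) / n_i[i]
                for i in range(t)]
    col_mean = [sum(table[i][j] for i in range(t) if present[i][j]) / n_j[j]
                for j in range(s)]

    mu = row_mean[:]      # trial values: observed variety means
    e = [0.0] * s
    for _ in range(max_iter):
        # Environment update: observed column mean minus the average of
        # the current variety means over the varieties present there.
        e_new = [col_mean[j]
                 - sum(mu[i] for i in range(t) if present[i][j]) / n_j[j]
                 for j in range(s)]
        shift = sum(e_new) / s            # impose sum(e_j) = 0
        e_new = [v - shift for v in e_new]
        # Variety update: observed row mean plus the summed effects of the
        # environments where the variety is absent, divided by n_i.
        mu_new = [row_mean[i]
                  + sum(e_new[j] for j in range(s) if not present[i][j]) / n_i[i]
                  for i in range(t)]
        change = max(abs(a - b) for a, b in zip(e_new + mu_new, e + mu))
        mu, e = mu_new, e_new
        if change < tol:
            break

    imputed = [[table[i][j] if present[i][j] else mu[i] + e[j]
                for j in range(s)] for i in range(t)]
    return mu, e, imputed


# Small additive example with one deleted cell (true value 11).
demo = [[9.0, 10.0, None],
        [19.0, 20.0, 21.0],
        [29.0, 30.0, 31.0]]
mu_hat, e_hat, filled = fitcon(demo)
print(round(filled[0][2], 3))
```

Because the demonstration table is exactly additive, the fitted value for the deleted cell should agree with the value that was removed, which gives a quick sanity check on any implementation.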

Initially, a complete data set with bienniality (WB) was taken as the reference data, and the genotypes were ranked on the basis of yield performance, stability measures and SIS indices (situation-1). Missing observations were then created at random and imputed by the FITCON method, and the genotypes were ranked again on the imputed data for selection purposes (situation-2). In situation-3, bienniality was removed (WOB) using the method given by [15] and the genotypes were ranked. The data with missing observations from situation-2 were also subjected to removal of bienniality, which gave the missing data WOB. FITCON was applied to the missing data WOB to impute the missing observations (situation-4). In each situation the genotypes were ranked on the basis of 1. yield value, 2. yield rank, 3. ASTABi value [16], 4. ASTABi rank, 5. Shukla’s stability value [17], 6. Shukla’s stability rank, 7. Index 1 value [18], 8. Index 1 rank, 9. Index 2 value [19] and 10. Index 2 rank. The rank correlations between the genotype rankings under situation-1 and situation-2, and under situation-3 and situation-4, were worked out on simulated data. A total of 1000 data sets were generated using SQL stored procedures. R code was written for imputing the 1000 simulated data sets using FITCON. SAS code was then written for plotting box plots of the correlation coefficients of the genotype rankings obtained by yield, the different stability measures and SISGYS between the different situations.
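As an illustration of the comparison between two situations, the rank correlation of a measure computed on the reference data and on the imputed data can be obtained by ranking both sets of values and correlating the ranks. The sketch below uses only the Python standard library. The function names and the four example values are hypothetical and are not taken from the mango data.

```python
import math


def ranks(values):
    """Ascending ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            out[k] = average_rank
        pos = end + 1
    return out


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical yields of four genotypes: reference data vs imputed data.
reference = [41.2, 38.5, 45.0, 40.1]
imputed = [40.0, 38.9, 44.1, 40.6]
print(rank_correlation(reference, imputed))
```

Repeating this calculation for each of the simulated data sets gives the distribution of correlations that a box plot then summarises.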

It is worth mentioning that the performance of the measures and the effect of bienniality were studied at different levels (percentages) of missing observations in the data. Hence, FITCON was assessed at four rates of missing observations (5%, 10%, 15% and 20%), simulated through random deletions from the complete data set, in the presence of bienniality.

In the process of eliminating bienniality from the data by taking the moving average of two consecutive years/environments, any moving-average value that involves a missing observation is treated as missing. As a result, the number of missing observations approximately doubles compared with the incomplete MLTs data, except when the missing observation is the first or last in a row of the two-way table, or when consecutive observations in a row are missing.
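The effect of the two-year moving average on the number of missing cells can be seen in a short Python sketch. It is illustrative only: the helper names, the use of `None` for a missing cell, the seed and the 16 × 8 example table are assumptions made here, with the table size and the 10% rate chosen to match the dimensions used in the R listing.

```python
import random


def delete_at_random(table, rate, rng):
    """Copy the table and set round(rate * cells) randomly chosen cells to None."""
    t, s = len(table), len(table[0])
    cells = [(i, j) for i in range(t) for j in range(s)]
    out = [row[:] for row in table]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        out[i][j] = None
    return out


def two_year_moving_average(table):
    """Average consecutive columns; a pair with a missing value stays missing."""
    return [[None if a is None or b is None else (a + b) / 2
             for a, b in zip(row, row[1:])]
            for row in table]


def count_missing(table):
    return sum(value is None for row in table for value in row)


# One interior gap becomes two gaps after averaging ...
print(two_year_moving_average([[4.0, None, 6.0, 8.0, 10.0]]))
# ... while a gap in the first (or last) position stays a single gap.
print(two_year_moving_average([[None, 4.0, 6.0]]))

# 16 genotypes x 8 environments with 10% of the cells deleted at random.
complete = [[float(i + j) for j in range(8)] for i in range(16)]
incomplete = delete_at_random(complete, 0.10, random.Random(1))
print(count_missing(incomplete), count_missing(two_year_moving_average(incomplete)))
```

The moving-average table also has one column fewer than the original, so the proportion of missing cells rises by slightly more than the count alone suggests.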

SQL code for simulation

Stored Procedure for creation of random values

CREATE  PROCEDURE [dbo].[rkc_ijkl_random]                   

  AS          

declare @loop_i INT       

declare @loop_j int

declare @loop_k int

declare @loop_l int

declare @val_ijk numeric(8,2)       

declare @val numeric(8,2)       

declare @rdm1 numeric(8,2)     

declare @count1 int

declare @loop int

declare @final1 numeric(8,2)     

SET @loop_i = 1       

SET @loop_j = 1       

SET @loop_k = 1      

set @count1 = 1      

set @loop = 1     

WHILE @loop <= 1000     

BEGIN     

set @count1 = 1      

while @count1 <= 2048     

begin

set @rdm1= (CAST(SQRT(-2*LOG(RAND()))*COS(2*PI()*RAND(CHECKSUM(NEWID())))as decimal(5,2)))  

……………..

…………………..

…………………

end
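The expression assigned to @rdm1 in the procedure above has the form of the Box-Muller transform, which turns two uniform random numbers into a standard normal deviate. A minimal Python sketch of the same transform is given below for readers who want to check its behaviour. The function name, the seed and the sample size are assumptions made for this illustration, and the rounding to two decimals done by the CAST in the SQL is not reproduced.

```python
import math
import random


def box_muller(rng):
    """Return one standard normal deviate from two uniform deviates."""
    u1 = 1.0 - rng.random()   # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(2019)
sample = [box_muller(rng) for _ in range(20000)]
mean = sum(sample) / len(sample)
variance = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
print(round(mean, 3), round(variance, 3))
```

With a large sample the mean should be close to 0 and the variance close to 1, which is a simple check that the generated deviates behave like standard normal values.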

Stored Procedure for generation of values with the help of random values

CREATE  PROCEDURE [dbo].[rkc_ijkl_random_final_val]                    

AS      

declare @loop_i INT   

declare @loop_j int

declare @loop_k int

declare @loop_l int

declare @val_ijk numeric(8,2)   

declare @val numeric(8,2)   

declare @rdm1 numeric(8,2) 

declare @count1 int

declare @set1 int

declare @final1 numeric(8,2) 

declare @temp1 numeric(8,2) 

declare @temp2 numeric(8,2) 

set @set1=1 

while @set1 <= 1000 

………..

………..

………..

set @temp2 = (select (mu+gi+ej+yk_j+geij+gyikj+rikj+eijkl) from rkc_data_final222 where set1=@set1 and count1=@count1) 

set @val = @temp1 + @temp2 

insert into final_data2304_1000(ii,jj,kk,ll,final,count1,set1) values(@loop_i,@loop_j,@loop_k,@loop_l,@val,@count1,@set1) 

………….

………..

……….

set @loop_i=@loop_i+1

end

set @set1=@set1+1 

end

R code for imputation by FITCON

setwd("D:/ramfitcon")

                data<-as.matrix(read.table("miss_sim128_100.txt", header=T))

                s<-seq(from=128,to=64000,by=128)

                s1<-c(1,s[-500]+1)

                s2<-s

                for(j in 1:500)

                {

                set1<-data[s1[j]:s2[j],1]

                y<-1:128;

                z<-sample(y,13)

                set1[z]<-NA

                x=matrix(set1,nrow=16,ncol=8,byrow=TRUE);

                nr <- nrow(x)

                nc <- ncol(x)

                rm <- rowMeans(x, na.rm=T)

                cm <- colMeans(x, na.rm=T)

                Ex <- cm

                Mx <- rm

#est <- matrix(0,nrow=nr, ncol=4)

                E2 <- 0

                M2 <- 0

                E0 <- Ex                               

                for(i in 1:100){

                NAR <- function(s){z <- sum(E0[which(is.na(s)==T)])/sum(!is.na(s));z}

                M1 <- Mx+apply(x,1,NAR)

                NAC <- function(s){z <- sum(M1[which(!is.na(s)==T)])/sum(!is.na(s));z}

                E1 <- Ex-apply(x,2,NAC)

                de <- sum(abs(E1-E2))

                dm <- sum(abs(M1-M2))

                if(de <= nc*0.001 && dm <= nr*0.001)

                {

                print(i)

                est1 <- cbind(E2,E1,M2,M1)

                break

                }

                else{

                E0 <- E1

                E2 <- E1

                M2 <- M1

                }

                            }

                E <- E2

                M <- M2

                for( i in 1:nc){ z <- which(is.na(x[,i])==T); x[z,i] <- E[i]+M[z]}

                x.imp<-x

                row.names(x.imp)= c(1:16)

                colnames(x.imp) = c(1:8)

                 imp<-data.frame(i=rep(colnames(t(x.imp)),each=nrow(t(x.imp))),         

                j=rep(row.names(t(x.imp)), ncol(t (x.imp))),yld=as.vector(t(x.imp)))

                yij<-list(imp)

                se<-list(set1)

                J<-list(j)

                out<-c(J, se, yij)

                write.table(out, file="impfitcon15.csv", append=TRUE)

                }

/* SAS code for ranking and box plot of simulated correlations */

data ram1;

proc import datafile="<path>"

out=s32 dbms=xlsx;  

run;

proc iml;

use s32;

read all into s;

aa=1;

aa1=1;

m=1000; /* number of iterations required*/

…………….

……………..

……………..

y=shape(yld,loc,var);

y=y`;

y=shape(yld,var,loc);

………….

…………….

…………..

a1=a[+,];

w=diag(x*x`);

msge=a1/((var-1)*(loc-1));

do i=1 to var;

sig2=((w[i,i]*var)/((loc-1)*(var-2)))-(msge/(var-2));

sig4=sig4//sig2;

sig5=1/sig2;

sig6=sig6//sig5;

end;

………………

……………

…………..

do i=1 to var;

e=1/sqrt((ssq(d[i,])));

f=f//e;

end;

e3=(1/f);

f4=rank(e3);

e1=f[+,]/var;

do

…………….

……………

……………

yldrank=yldrank||rank1;

indexval=indexval||index;

rankindex=rankindex||rank;

stabval=stabval||stability;

stabrank=stabrank||rank2;

bajval=bajval||baj1;

bajrank=bajrank||rank3;

sig4val=sig4val||sig4;

sig4rank=sig4rank||rank4;

FREE geno;

free f3;

free index;

free stability;

free f1;

free f;

free baj1;

free sig4;

free sig6;

end;

/*—————————————————-*/

yldvalcorr=corr(yldval);

yldcorr=corr(yldrank);

corrindval=corr(indexval);

corrindex=corr(rankindex);

……………..

………..

…………

plot xx2*xx1 /

boxstyle = schematic

nohlabel;

insetgroup min max nhigh nlow nout /

header = 'Extremes rank correlation';

/*   legend1 label = ('Cancellations');*/

label xx2='xx2  rank correlation';

label xx1 = 'xx1 indices and ranks';

run;
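The sig2 line in the SAS listing above has the form of Shukla's stability variance computed from the genotype × environment interaction residuals. For readers who do not use SAS/IML, a minimal Python sketch of that calculation for a complete table is given below. The function name and the example yields are hypothetical, and the sketch assumes a complete table with at least three genotypes and two environments.

```python
def shukla_stability_variance(table):
    """Return (sigma2, ms_ge) for a complete genotype x environment table.

    sigma2[i] = t * W_i / ((s - 1) * (t - 2)) - MS_GE / (t - 2), where W_i is
    the sum of squared interaction residuals of genotype i and MS_GE is the
    interaction mean square. Hypothetical helper written for illustration.
    """
    t, s = len(table), len(table[0])
    if t < 3 or s < 2:
        raise ValueError("need at least 3 genotypes and 2 environments")
    grand = sum(sum(row) for row in table) / (t * s)
    row_mean = [sum(row) / s for row in table]
    col_mean = [sum(table[i][j] for i in range(t)) / t for j in range(s)]
    # W_i: squared interaction residuals summed over environments.
    w = [sum((table[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
             for j in range(s))
         for i in range(t)]
    ms_ge = sum(w) / ((t - 1) * (s - 1))
    sigma2 = [t * w_i / ((s - 1) * (t - 2)) - ms_ge / (t - 2) for w_i in w]
    return sigma2, ms_ge


# Hypothetical yields: 4 genotypes in 3 environments.
yields = [[10.0, 12.0, 14.0],
          [11.0, 13.0, 15.0],
          [12.0, 14.0, 22.0],
          [9.0, 11.0, 13.0]]
sigma2, ms_ge = shukla_stability_variance(yields)
print([round(v, 3) for v in sigma2], round(ms_ge, 3))
```

A smaller stability variance indicates a more stable genotype, so ranking the values in ascending order gives a stability ranking. The values average to the interaction mean square, which is a convenient check on the arithmetic.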

============================================================

Results and Discussion

Fitting constant (FITCON) was used to impute missing observations at four rates, i.e. 5%, 10%, 15% and 20%, on simulated data by running the R code. The influence of missing observations on yield, the different stability measures and the SISGYS indices (1. yield value, 2. yield rank, 3. ASTABi value, 4. ASTABi rank, 5. Shukla’s stability value, 6. Shukla’s stability rank, 7. Index-1 value, 8. Index-1 rank, 9. Index-2 value and 10. Index-2 rank) was assessed on the basis of the correlation between situation-1 and situation-2 under the FITCON method. Box plots (Gabriel, 1971) for all the rates of missing observations for yield, stability measures and SISGYS indices are given in Fig. 1 and Fig. 2 for the Sangareddy centre. Fig. 1 depicts the distribution of the correlation between the original simulated data with bienniality (situation-1) and the simulated data with bienniality imputed by FITCON (situation-2) for the Sangareddy centre, for measures 1 to 10 on the x-axis, at the four rates of missing observations. The mean correlation over the 1000 data sets decreases slightly from the order of 0.9 as the rate of missing observations increases from 5% to 20%. The correlation is slightly lower for the SISGYS indices than for yield and the stability measures. Therefore, imputation by FITCON can safely be used for up to 20% missing observations in data with bienniality.

The distribution of the correlation between the 1000 data sets without bienniality (WOB) and the corresponding data imputed by FITCON without bienniality (WOB), obtained for the same ten measures, is depicted in Fig. 2. The correlations are slightly lower than in the case with bienniality for almost all the measures 1 to 10 at 5% to 10% missing observations. It was also found that the iteration terminated when imputation by FITCON was attempted for more than 10% missing observations. This is because eliminating bienniality by taking the moving average of two consecutive observations approximately doubles the number of missing observations. As in the case with bienniality, the case without bienniality also shows lower correlations for the SISGYS indices than for yield and the stability measures. The dispersion of the correlations is greater for the SISGYS indices than for yield and the stability measures.

Finally, it can be concluded that FITCON can safely be used for imputation of up to 20% missing observations in data with bienniality, but only up to 10% missing observations in data without bienniality.

Fig. 1 Box plot showing the distribution of correlations between situation 1 and situation 2 (original data WB and data WB imputed by FITCON) for different measures at Sangareddy (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla’s stability value, 6: Shukla’s stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

Fig. 2 Box plot showing the distribution of correlations between situation 3 and situation 4 (original data WOB and data WOB imputed by FITCON) for different measures at Sangareddy (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla’s stability value, 6: Shukla’s stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

ACKNOWLEDGEMENT

We are highly obliged to the ICAR-Indian Agricultural Statistics Research Institute, New Delhi for providing facilities for carrying out the research work. We are also thankful to RPCAU, Pusa, Bihar for providing financial support and to the All India Coordinated Research Project on Sub-Tropical Fruits (AICRP-STF), CISH, Lucknow for providing data.

REFERENCES

  • Bajpai, P. K. and Prabhakaran, V. T. (2000). A new procedure of simultaneous selection for high yielding and stable crop genotypes. Indian Journal of Genetics, 60(2), 141-146.
  • Choudhary, R. K., Rao, A. R., Wahi, S. D. and Misra, A. K. (2016). Detection of biennial rhythm and estimation of repeatability in mango (Mangifera indica L.). Indian J. Genet., 76(1), 88-97.
  • Digby, P. G. N. (1979). Modified joint regression analysis for incomplete variety × environment data. Journal of Agricultural Science, 93, 81-86.
  • Gabriel, K. R. (1971). The biplot-graphical display of matrices with applications to principal component analysis. Biometrika, 58, 453-467.
  • Mukherjee, S. K. and Litz, R. E. (2009). Introduction: Botany and Importance. In: Litz, R. E. (ed.) The Mango: Botany, Production and Uses, 2nd edn. CAB International, Wallingford, 1-18.
  • Patterson, H. D. and Silvey, V. (1980). Statutory and recommended list trials of crop varieties in U.K. Jour. Roy. Stat. Soc., A, 143, 219-252.
  • Patterson, H. D. (1978). Routine least squares estimation of variety means in incomplete tables. Jour. Nat. Inst. Ag. Bot., 14, 401-4013.
  • Rao, A. R., Choudhary, S. K., Wahi, S. D. and Prabhakaran, V. T. (2010). An index for simultaneous selection of genotypes for high yield and stability under incomplete genotype × environment data. Indian J. Genet., 70(1), 80-84.
  • Rao, A. R. and Prabhakaran, V. T. (2005). Use of AMMI in simultaneous selection of genotypes for yield and stability. Jour. Ind. Soc. Ag. Statistics, 59(1), 76-82.
  • Rao, A. R., Prabhakaran, V. T. and Singh, A. K. (2004). Development of statistical procedures for selecting genotypes simultaneously for yield and stability. IASRI Publication, ICAR-IASRI, New Delhi.
  • Shukla, G. K. (1972). Some statistical aspects of partitioning genotype-environmental components of variability. Heredity, 29, 237-245.
  • Wahi, S. D. and Malhotra, P. K. (1993). Estimation of repeatability of fruit yield in presence of biennial rhythm. IASRI Publication, New Delhi.
  • Kempthorne, O. (1978). A biometrics invited paper: Logical, epistemological and statistical aspects of nature-nurture data interpretation. Biometrics, 1-23.
  • Simon, R. and Altman, D. G. (1994). Statistical aspects of prognostic factor studies in oncology. British Journal of Cancer, 69(6), 979.
  • Veech, J. A. and Crist, T. O. (2010). Toward a unified view of diversity partitioning. Ecology, 91(7), 1988-1992.
  • Økland, R. H. and Eilertsen, O. (1994). Canonical correspondence analysis with variation partitioning: some comments and an application. Journal of Vegetation Science, 5(1), 117-126.
  • Strobl, C., Malley, J. and Tutz, G. (2009). An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323.
  • Ferreira, F. C. (2008). Comments about some species abundance patterns: classic, neutral, and niche partitioning models. Brazilian Journal of Biology, 68, 1003-1012.