Generalized Method of Moments and Generalized Estimating Functions Based on Probability Generating Function for Count Models

Andrew Luong^{}

école d’actuariat, Université Laval, Ste Foy, Québec, Canada.

**DOI: **10.4236/ojs.2020.103031


Generalized method of moments (GMM) estimation based on the probability generating function is considered. Estimation and model testing are unified using this approach, which also leads to distribution-free chi-square tests. The estimation methods developed are related to estimation methods based on generalized estimating equations, but with the advantage of providing statistics for model testing. The proposed methods overcome numerical problems often encountered when the probability mass function has no closed form, which prevents the use of maximum likelihood (ML) procedures; in general, ML procedures do not lead to distribution-free model testing statistics.

Keywords

Mixture Distributions, Consistent Chi-Square Tests, Infinitely Divisible Distributions, Distribution Free Test Statistics, Model Testing

Share and Cite:

Luong, A. (2020) Generalized Method of Moments and Generalized Estimating Functions Based on Probability Generating Function for Count Models. *Open Journal of Statistics*, **10**, 516-539. doi: 10.4236/ojs.2020.103031.

1. Introduction

Count data are encountered in many fields of application, including actuarial science, and fitting discrete count models is of interest. Classical methods such as maximum likelihood (ML) procedures often require the probability mass function of the model to have a closed form, and furthermore the inference techniques do not lead to distribution-free statistics when the Pearson statistic is used. In fact, if a model does not fit the data, better models can be created using compounding, stopped-sum or mixing procedures, and the new models might provide a better fit as they can take into account modeling features which were omitted earlier.

For discussions of these procedures see the books by Johnson et al. [1] and Klugman et al. [2]. These richer models often do not have closed-form probability mass functions, but their probability generating functions often remain simple and have closed-form expressions.

For example, if count data display long-tailed behavior so that the Poisson model with probability generating function ${P}_{\theta}\left(s\right)={\text{e}}^{\theta \left(s-1\right)},\theta >0$ does not provide a good fit, the discrete positive stable (DPS) distribution can be used as an alternative to the Poisson distribution. The DPS distribution does not have a closed or simple form for its probability mass function, but its probability generating function is simple and given by

${P}_{\delta}\left(s\right)={\text{e}}^{\theta {\left(s-1\right)}^{\alpha}},\delta ={\left(\theta ,\alpha \right)}^{\prime},\alpha \in \left(0,1\right],\theta >0$

see Christoph and Schreiber [3] for this distribution. In their paper, expression (6) gives a series representation of the probability mass function of the DPS distribution,

$p\left(x=k;\delta \right)={\left(-1\right)}^{k}{\displaystyle {\sum}_{j=0}^{\infty}\left(\begin{array}{c}j\alpha \\ k\end{array}\right)\frac{{\left(-\theta \right)}^{j}}{j!}},k=0,1,\cdots $

and expression (8) gives the recursive formula to compute $p\left(x=k;\delta \right)$ using the previous terms

$p\left(x=0;\delta \right),\cdots ,p\left(x=k-1;\delta \right)$

with

$\left(k+1\right)p\left(x=k+1;\delta \right)=\theta {\displaystyle {\sum}_{m=0}^{k}p\left(x=k-m;\delta \right)\left(m+1\right){\left(-1\right)}^{m}\left(\begin{array}{c}\alpha \\ m+1\end{array}\right)},k=0,1,\cdots $

The probability mass function appears complicated, and for model validation a statistic for model testing is needed. These issues make maximum likelihood (ML) procedures difficult to implement.
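The recursion in expression (8) is nevertheless straightforward to evaluate numerically. The sketch below (Python; the helper `gen_binom` and the starting value $p\left(x=0;\delta \right)={\text{e}}^{-\theta}$, which corresponds to the standard parameterization of the DPS probability generating function, are our illustrative assumptions) computes the first DPS probabilities; for $\alpha =1$ the recursion reduces to the Poisson probabilities, which provides a check.

```python
import math

def gen_binom(a, j):
    """Generalized binomial coefficient C(a, j) for real a: a(a-1)...(a-j+1)/j!."""
    out = 1.0
    for i in range(j):
        out *= (a - i) / (i + 1)
    return out

def dps_pmf(kmax, theta, alpha):
    """Probabilities p(0), ..., p(kmax) of the discrete positive stable law via
    the recursion (k+1)p(k+1) = theta * sum_m p(k-m)(m+1)(-1)^m C(alpha, m+1)."""
    p = [math.exp(-theta)]  # assumed starting value p(0) = exp(-theta)
    for k in range(kmax):
        s = sum(p[k - m] * (m + 1) * (-1) ** m * gen_binom(alpha, m + 1)
                for m in range(k + 1))
        p.append(theta * s / (k + 1))
    return p
```

With `alpha = 1` the weights $(-1)^m\binom{\alpha}{m+1}$ vanish for $m>0$ and the recursion collapses to the familiar Poisson relation $p\left(k+1\right)=\theta p\left(k\right)/\left(k+1\right)$.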

GMM procedures based on the probability generating function appear to be a natural way to introduce alternatives to ML procedures, bypassing explicit use of the probability mass function and focusing uniquely on the probability generating function. In this vein, the procedures proposed in this paper make use of GMM and generalized estimating equation theory, and they are less simulation-intensive than the inference techniques given in the paper by Luong et al. [4].

We shall use general GMM methodology but adapt it to situations where moment conditions are based on the probability generating function, so that estimation and model testing can be carried out in a unified way for discrete count models. The choice of moments for the developed GMM procedures makes use of estimating function theory, which allows the number of points of the probability generating function used to tend to infinity as the sample size $n\to \infty $. Furthermore, we also relate GMM estimation to the approach using generalized estimating equations (GEE) based on a set of elementary or basic unbiased estimating functions; unlike GEE procedures, GMM procedures also provide distribution-free chi-square statistics for model testing, while the theory of estimating functions remains useful as it provides insight into the choice of sample moments for GMM estimation. In other words, the proposed methods blend classical GMM procedures with inference techniques based on estimating equations, which in general allows flexibility, efficiency and model testing, yet remains relatively simple to implement and might be of interest for practitioners. Consequently, the new methods differ from the GMM procedures proposed in the literature in the following ways:

1) GMM procedures as proposed by Doray et al. [5] only make use of a finite number of points of the probability generating function. Our methods aim at achieving higher efficiency yet remain simple to implement; this is done by linking to the theory of estimating functions, which accommodates a number of points of the probability generating function that, instead of being fixed, goes to infinity as $n\to \infty $.

2) The new GMM procedures remain simpler to implement than GMM procedures using a continuum of moment conditions as proposed by Carrasco and Florens [6], or than methods obtained by adapting the GMM procedures using a continuum of moment conditions for the characteristic function, proposed by Carrasco and Kotchoni [7], to the probability generating function. Practitioners might find the sophisticated methods based on a continuum of moment conditions difficult to implement.

The paper is organized as follows. In Section 2, we review available results from general GMM theory; although these results are not new once the moment conditions are defined, they make the paper more self-contained, as they will be adapted subsequently with moment conditions extracted from the probability generating function when count models are considered. In Section 3, GMM estimation and related GEE estimation for count models are considered. The chi-square statistics are also given in Section 3.2.2. In Section 3.2.3, we consider GMM procedures based on optimum orthogonal estimating functions. In Section 4 we illustrate the implementation of the GMM methodology, and preliminary results show that the methods are simple to implement and have the potential to be very efficient. The new methods display flexibility as they can accommodate changes to the sample moments for better efficiency if needed, and this can be done within the framework of the inference methods developed.

2. Generalized Method of Moments (GMM) Methodology

The inference techniques based on probability generating functions developed in this paper make use of results of Generalized Method of Moments (GMM) theory which are well established once the moment conditions are specified; see Martin et al. [8] (p 352-384) and also Hamilton [9]. In this section, we shall briefly review GMM methodology for estimation and moment restriction testing to make the paper easier to follow, and connect it to the problem of how to select moment conditions based on probability generating functions for applying GMM methods to discrete distributions.

The estimating equations of GMM methods will also be linked to the theory of estimating equations and generalized estimating equations (GEE) as developed by Godambe and Thompson [10], Morton [11], Liang and Zeger [12].

2.1. Generalized Estimating Equations (GEE) and GMM Estimation

For data, we shall assume that we have n independent observations ${y}_{1},\cdots ,{y}_{n}$; these observations need not be identically distributed, but each ${y}_{i}$ follows a distribution which depends on the same vector of parameters $\theta ={\left({\theta}_{1},\cdots ,{\theta}_{p}\right)}^{\prime}$, $\theta \in \Omega $, where $\Omega $ is a compact subset of ${R}^{p}$. The true vector of parameters is denoted by ${\theta}_{0}$.

For the time being, assume that we have identified n unbiased basic estimating functions or elementary estimating functions denoted by ${h}_{i}={h}_{i}\left({y}_{i};\theta \right),i=1,\cdots ,n$ with the property

${E}_{\theta}\left({h}_{i}\left({y}_{i};\theta \right)\right)=0$ for $i=1,\cdots ,n$. (1)

The optimum estimating functions based on linear combinations of $\left\{{h}_{i}\left({y}_{i};\theta \right),i=1,\cdots ,n\right\}$ for estimating ${\theta}_{0}$ are given by

${g}^{\left(r\right)}\left(\theta \right)={\displaystyle {\sum}_{i=1}^{n}{h}_{i}\left({y}_{i};\theta \right)\frac{{E}_{\theta}\left(\frac{\partial {h}_{i}}{\partial {\theta}_{r}}\right)}{{E}_{\theta}\left({\left({h}_{i}\right)}^{2}\right)}},r=1,\cdots ,p$ (2)

and ${E}_{\theta}\left({\left({h}_{i}\right)}^{2}\right)$ is the variance of ${h}_{i}\left({y}_{i};\theta \right)$.

The vector of estimators ${\stackrel{^}{\theta}}_{op}$ based on the optimum estimating equations are given as solutions of the system of equations ${g}^{\left(r\right)}\left(\theta \right)=0,r=1,\cdots ,p$. This result is given by Godambe and Thompson [10] (page 4) and Morton [11] (page 229-230).

In applications, often we restrict our attention to ${h}_{i}\left({y}_{i};\theta \right)$ with some common functional form so that we also use the notations ${h}_{i}\left({y}_{i};\theta \right)=h\left({y}_{i};\theta \right),i=1,\cdots ,n$ and more precisely $h\left({y}_{i};\theta \right)=h\left({y}_{i};{s}_{i},\theta \right)$, ${s}_{i}$ is a constant.

With this notation which is commonly used in the literature, notice that the random variables given by $h\left({y}_{i};\theta \right),i=1,\cdots ,n$ need not be identically distributed.

Also, since estimating functions are defined up to a multiplicative constant which does not depend on $\theta $, the related estimating functions can be re-expressed equivalently as

${g}^{\left(r\right)}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i};\theta \right)}\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial {\theta}_{r}}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)},r=1,\cdots ,p$ (3)

and the vector of estimators based on the optimum equations are given as solutions of the system of equations ${g}^{\left(r\right)}\left(\theta \right)=0,r=1,\cdots ,p$, using expression (3). Using vector notations, the vector of optimum estimating functions based on

expression (3) can be expressed as $g\left(\theta \right)=\left(\begin{array}{c}{g}^{\left(1\right)}\left(\theta \right)\\ \vdots \\ {g}^{\left(p\right)}\left(\theta \right)\end{array}\right)$, $g\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i};\theta \right)}\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)}$, and the vector of estimators ${\stackrel{^}{\theta}}_{op}$ based on $g\left(\theta \right)$ is the solution of $g\left(\theta \right)=0$; from this observation it is clear that the factor $\frac{1}{n}$ can be omitted when defining estimating functions or equations.
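To make expression (3) concrete, consider a scalar-parameter sketch (Python; the Poisson model and the single evaluation point s are our illustrative assumptions, not choices made in the paper) with elementary functions $h\left({y}_{i};s,\theta \right)={s}^{{y}_{i}}-{\text{e}}^{\theta \left(s-1\right)}$. Since the optimal weight does not depend on i here, solving $g\left(\theta \right)=0$ amounts to matching the empirical and model probability generating functions at s.

```python
import math

def opt_ee(theta, ys, s):
    """Optimum estimating function (3) for the Poisson model with
    h(y; s, theta) = s**y - exp(theta*(s-1))."""
    pgf = math.exp(theta * (s - 1.0))
    num = -(s - 1.0) * pgf                              # E_theta(dh/dtheta)
    den = math.exp(theta * (s * s - 1.0)) - pgf ** 2    # E_theta(h^2) = Var(s**Y)
    return sum(s ** y - pgf for y in ys) / len(ys) * num / den

def solve_theta(ys, s, lo=1e-8, hi=50.0):
    """Solve g(theta) = 0 by bisection; the model pgf is monotone in theta."""
    emp = sum(s ** y for y in ys) / len(ys)             # empirical pgf at s
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.exp(mid * (s - 1.0)) > emp:             # decreasing in theta for s < 1
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For this one-point example the root has the closed form $\stackrel{^}{\theta}=\mathrm{log}{P}_{n}\left(s\right)/\left(s-1\right)$, which can serve as a check on the numerical solution.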

Now suppose that we have vector $m\left({y}_{i};\theta \right)=\left(\begin{array}{c}{m}_{1}\left({y}_{i};\theta \right)\\ \vdots \\ {m}_{k}\left({y}_{i};\theta \right)\end{array}\right)$ with the property

${E}_{\theta}\left(m\left({y}_{i};\theta \right)\right)=\left(\begin{array}{c}{E}_{\theta}\left({m}_{1}\left({y}_{i};\theta \right)\right)\\ \vdots \\ {E}_{\theta}\left({m}_{k}\left({y}_{i};\theta \right)\right)\end{array}\right)=0$ for $i=1,2,\cdots $,

the optimum estimating functions for estimating $\theta $ based on linear combinations of the elements of the set $\left\{m\left({y}_{i};\theta \right),i=1,\cdots ,n\right\}$, also called generalized optimum estimating functions (see Morton [11] (p 229-230), and also expression (6) of Liang and Zeger [12] (page 15)), are given by

${\sum}_{i=1}^{n}{{C}^{\prime}}_{i}\left(\theta \right){V}_{i}^{-1}\left(\theta \right)m\left({y}_{i};\theta \right)$ (4)

and the estimators are given by the vector ${\stackrel{^}{\theta}}_{op}$ obtained by solving

${\displaystyle {\sum}_{i=1}^{n}{{C}^{\prime}}_{i}\left(\theta \right){V}_{i}^{-1}\left(\theta \right)m\left({y}_{i};\theta \right)}=0$ (5)

where ${V}_{i}\left(\theta \right)$ is the covariance matrix of $m\left({y}_{i};\theta \right)$ under $\theta $, with inverse ${V}_{i}^{-1}\left(\theta \right)$; ${V}_{i}\left(\theta \right)$ is also referred to as a working matrix in the literature of estimating equation theory, and ${{C}^{\prime}}_{i}\left(\theta \right)=\left(\begin{array}{ccc}{E}_{\theta}\left(\frac{\partial {m}_{1}\left({y}_{i},\theta \right)}{\partial {\theta}_{1}}\right)& \cdots & {E}_{\theta}\left(\frac{\partial {m}_{k}\left({y}_{i},\theta \right)}{\partial {\theta}_{1}}\right)\\ \vdots & \ddots & \vdots \\ {E}_{\theta}\left(\frac{\partial {m}_{1}\left({y}_{i},\theta \right)}{\partial {\theta}_{p}}\right)& \cdots & {E}_{\theta}\left(\frac{\partial {m}_{k}\left({y}_{i},\theta \right)}{\partial {\theta}_{p}}\right)\end{array}\right)$, which is a p by k matrix.

Clearly expression (4) is more general than expression (3), and reduces to expression (3) when $m\left({y}_{i};\theta \right)$ is a scalar instead of a vector.

In their studies of estimating functions, Godambe and Thompson [10] emphasized the efficiency of estimating equations rather than the efficiency of the vector of optimum estimators ${\stackrel{^}{\theta}}_{op}$ obtained by solving them.

For applications, we often need the asymptotic covariance matrix of ${\stackrel{^}{\theta}}_{op}$. For this purpose, we use the set-up for the study of generalized estimating equations (GEE) as considered by Liang and Zeger [12] (p 15-16). Using a Taylor expansion and the results of their Theorem 2 (p 16), we can obtain the asymptotic covariance matrix of ${\stackrel{^}{\theta}}_{op}$:

$\sqrt{n}\left({\stackrel{^}{\theta}}_{op}-{\theta}_{0}\right)\stackrel{L}{\to}N\left(0,\Sigma \right)$

with

$\Sigma ={\mathrm{lim}}_{n\to \infty}n{\left({\displaystyle {\sum}_{i=1}^{n}{{C}^{\prime}}_{i}\left({\theta}_{0}\right){V}_{i}^{-1}\left({\theta}_{0}\right){C}_{i}\left({\theta}_{0}\right)}\right)}^{-1}$,

with convergence in probability denoted by $\stackrel{p}{\to}$ and convergence in distribution denoted by $\stackrel{L}{\to}$.

Therefore, the asymptotic covariance for ${\stackrel{^}{\theta}}_{op}$ is simply ${\mathrm{lim}}_{n\to \infty}{\left({\displaystyle {\sum}_{i=1}^{n}{{C}^{\prime}}_{i}\left({\theta}_{0}\right){V}_{i}^{-1}\left({\theta}_{0}\right){C}_{i}\left({\theta}_{0}\right)}\right)}^{-1}$, which can be estimated. A Fisher scoring algorithm, as given by expression (6) of Liang and Zeger [12] (p 16), can be used to obtain the estimators ${\stackrel{^}{\theta}}_{op}$ numerically. The algorithm gives the (j + 1)-th iteration based on the previous j-th iteration as

${\stackrel{^}{\theta}}_{op}^{\left(j+1\right)}={\stackrel{^}{\theta}}_{op}^{\left(j\right)}-{\left({\displaystyle {\sum}_{i=1}^{n}{{C}^{\prime}}_{i}\left({\stackrel{^}{\theta}}_{op}^{\left(j\right)}\right){V}_{i}^{-1}\left({\stackrel{^}{\theta}}_{op}^{\left(j\right)}\right){C}_{i}\left({\stackrel{^}{\theta}}_{op}^{\left(j\right)}\right)}\right)}^{-1}{\displaystyle {\sum}_{i=1}^{n}{{C}^{\prime}}_{i}\left({\stackrel{^}{\theta}}_{op}^{\left(j\right)}\right){V}_{i}^{-1}\left({\stackrel{^}{\theta}}_{op}^{\left(j\right)}\right)m\left({y}_{i};{\stackrel{^}{\theta}}_{op}^{\left(j\right)}\right)}$.
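A minimal sketch of one such scoring step follows (Python with NumPy; the generic function and the scalar test case $m\left({y}_{i};\theta \right)={y}_{i}-\theta $ with ${C}_{i}=-1$ and working variance ${\sigma}^{2}$ are our illustrative assumptions). For that scalar case a single step from any starting value lands on the sample mean.

```python
import numpy as np

def fisher_scoring_step(theta, ys, C_fn, V_fn, m_fn):
    """One GEE Fisher scoring iteration:
    theta_new = theta - (sum C_i' V_i^{-1} C_i)^{-1} * sum C_i' V_i^{-1} m_i."""
    p = len(theta)
    A = np.zeros((p, p))          # accumulates sum C_i' V_i^{-1} C_i
    b = np.zeros(p)               # accumulates sum C_i' V_i^{-1} m(y_i; theta)
    for y in ys:
        C = C_fn(y, theta)        # k x p matrix of expected derivatives
        Vinv = np.linalg.inv(V_fn(y, theta))
        A += C.T @ Vinv @ C
        b += C.T @ Vinv @ m_fn(y, theta)
    return theta - np.linalg.solve(A, b)
```

The working matrices ${V}_{i}$ only affect the weighting; a misspecified ${V}_{i}$ still yields a consistent solution of the estimating equation, at some loss of efficiency.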

Other numerical techniques to obtain ${\stackrel{^}{\theta}}_{op}$ can be used. For example, we can consider the system of equations given by expressions (4) and (5), written as ${g}^{\left(r\right)}\left(\theta \right)=0,r=1,\cdots ,p$; ${\stackrel{^}{\theta}}_{op}$ can then be obtained by minimizing ${\displaystyle {\sum}_{r=1}^{p}{\left({g}^{\left(r\right)}\left(\theta \right)\right)}^{2}}$ using standard minimization techniques.

Now we turn our attention to GMM estimation methodology, and we observe that the set of estimating equations using expression (2) can be reobtained using a GMM estimation set-up. GMM estimation is based on the use of k moment conditions specified by a vector function

$m\left({y}_{i};\theta \right)=\left(\begin{array}{c}{m}_{1}\left({y}_{i};\theta \right)\\ \vdots \\ {m}_{k}\left({y}_{i};\theta \right)\end{array}\right)$

with the property

${E}_{\theta}\left(m\left({y}_{i};\theta \right)\right)=\left(\begin{array}{c}{E}_{\theta}\left({m}_{1}\left({y}_{i};\theta \right)\right)\\ \vdots \\ {E}_{\theta}\left({m}_{k}\left({y}_{i};\theta \right)\right)\end{array}\right)=0$ for $i=1,2,\cdots $ (6)

The sample moments being the counterparts of

$\left(\begin{array}{c}{E}_{\theta}\left({m}_{1}\left({y}_{i};\theta \right)\right)\\ \vdots \\ {E}_{\theta}\left({m}_{k}\left({y}_{i};\theta \right)\right)\end{array}\right)$

are defined as ${g}^{\left(r\right)}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{m}_{r}\left({y}_{i};\theta \right)},r=1,\cdots ,k$ and define the vector of sample moments as

$g\left(\theta \right)=\left(\begin{array}{c}{g}^{\left(1\right)}\left(\theta \right)\\ \vdots \\ {g}^{\left(k\right)}\left(\theta \right)\end{array}\right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}m\left({y}_{i};\theta \right)}$. (7)

Now we need a positive definite symmetric matrix, or a matrix which is positive definite and symmetric with probability one, denoted by ${\stackrel{^}{S}}^{-1}$, to define a quadratic form using ${g}^{\left(r\right)}\left(\theta \right),r=1,\cdots ,k$; ${\stackrel{^}{S}}^{-1}$ will be defined subsequently. This allows the objective function

$Q\left(\theta \right)={g}^{\prime}\left(\theta \right){\stackrel{^}{S}}^{-1}g\left(\theta \right)$

to be formed for GMM estimation and the GMM estimators are given by the vector $\stackrel{^}{\theta}$ which minimizes $Q\left(\theta \right)$.

We shall define the matrix $S$ first; its estimate is $\stackrel{^}{S}$, from which we can obtain the inverse ${\stackrel{^}{S}}^{-1}$. In fact $S$ can be viewed as the limit as $n\to \infty $ of

the covariance matrix of the vector $\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}m\left({y}_{i};{\theta}_{0}\right)}=\sqrt{n}g\left({\theta}_{0}\right)$ and the covariance matrix of $\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}m\left({y}_{i};{\theta}_{0}\right)}$ can be seen as given by $\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{{\theta}_{0}}\left(\left[m\left({y}_{i};{\theta}_{0}\right)\right]{\left[m\left({y}_{i};{\theta}_{0}\right)\right]}^{\prime}\right)}$, then $S$ and its estimate $\stackrel{^}{S}$ can be defined respectively as

$S={\mathrm{lim}}_{n\to \infty}\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{{\theta}_{0}}\left(\left[m\left({y}_{i};{\theta}_{0}\right)\right]{\left[m\left({y}_{i};{\theta}_{0}\right)\right]}^{\prime}\right)}$

and with a preliminary consistent estimate ${\stackrel{^}{\theta}}^{\left(0\right)}$ for ${\theta}_{0}$ then we can define

$\stackrel{^}{S}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{{\stackrel{^}{\theta}}^{\left(0\right)}}\left(\left[m\left({y}_{i};{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]{\left[m\left({y}_{i};{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]}^{\prime}\right)}$ or

$\stackrel{^}{S}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left[m\left({y}_{i};{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]{\left[m\left({y}_{i};{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]}^{\prime}}$

and $\stackrel{^}{S}$ is positive definite with probability one and clearly symmetric; its inverse ${\stackrel{^}{S}}^{-1}$ exists with probability one. Although these two expressions for $\stackrel{^}{S}$ are asymptotically equivalent, for numerical implementation of the methods in finite samples the matrix

$\stackrel{^}{S}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{{\stackrel{^}{\theta}}^{\left(0\right)}}\left(\left[m\left({y}_{i};{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]{\left[m\left({y}_{i};{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]}^{\prime}\right)}$ has a better chance of being invertible.
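For concreteness, a two-step GMM sketch follows (Python with NumPy; the Poisson model, the probability-generating-function moments ${m}_{r}\left(y;\theta \right)={s}_{r}^{y}-{\text{e}}^{\theta \left({s}_{r}-1\right)}$, the two evaluation points and the crude grid minimization are our illustrative assumptions). The sample mean serves as the preliminary consistent estimate ${\stackrel{^}{\theta}}^{\left(0\right)}$ used to form $\stackrel{^}{S}$.

```python
import numpy as np

def m_vec(y, theta, pts):
    """Moment vector m(y; theta): pgf moments s**y - exp(theta*(s-1))."""
    return np.array([s ** y - np.exp(theta * (s - 1.0)) for s in pts])

def gmm_poisson(ys, pts, grid):
    """Two-step GMM: build S-hat at a preliminary estimate, then minimize Q."""
    theta0 = np.mean(ys)                       # preliminary consistent estimate
    M = np.array([m_vec(y, theta0, pts) for y in ys])
    S_hat = M.T @ M / len(ys)                  # S-hat = (1/n) sum m m'
    S_inv = np.linalg.inv(S_hat)

    def Q(theta):
        g = np.mean([m_vec(y, theta, pts) for y in ys], axis=0)
        return float(g @ S_inv @ g)            # Q(theta) = g' S^{-1} g

    return min(grid, key=Q)                    # crude 1-D minimization over a grid
```

In practice a proper numerical optimizer would replace the grid search; the grid keeps the sketch self-contained and exposes the shape of $Q\left(\theta \right)$.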

Under suitable differentiability assumptions imposed on the vector function $g\left(\theta \right)$, the GMM estimator $\stackrel{^}{\theta}$ is consistent and has an asymptotic multivariate normal distribution, i.e.,

$\stackrel{^}{\theta}\stackrel{p}{\to}{\theta}_{0}$

and

$\sqrt{n}\left(\stackrel{^}{\theta}-{\theta}_{0}\right)\stackrel{L}{\to}N\left(0,V\right).$

The asymptotic covariance of $\stackrel{^}{\theta}$ is simply $Acov\left(\stackrel{^}{\theta}\right)=\frac{1}{n}V$ and $V$ depends on ${\theta}_{0}$ so we also use the notation, $V=V\left({\theta}_{0}\right)$ and $V={\left({D}^{\prime}\left({\theta}_{0}\right){S}^{-1}D\left({\theta}_{0}\right)\right)}^{-1}$ with ${D}^{\prime}\left({\theta}_{0}\right)={\mathrm{lim}}_{n\to \infty}\left[\begin{array}{ccc}{E}_{{\theta}_{0}}\left(\frac{\partial {g}^{\left(1\right)}\left({\theta}_{0}\right)}{\partial {\theta}_{1}}\right)& \cdots & {E}_{{\theta}_{0}}\left(\frac{\partial {g}^{\left(k\right)}\left({\theta}_{0}\right)}{\partial {\theta}_{1}}\right)\\ \vdots & \ddots & \vdots \\ {E}_{{\theta}_{0}}\left(\frac{\partial {g}^{\left(1\right)}\left({\theta}_{0}\right)}{\partial {\theta}_{p}}\right)& \cdots & {E}_{{\theta}_{0}}\left(\frac{\partial {g}^{\left(k\right)}\left({\theta}_{0}\right)}{\partial {\theta}_{p}}\right)\end{array}\right]$ and ${D}^{\prime}\left({\theta}_{0}\right)$ is a p by k matrix, its transpose is $D\left({\theta}_{0}\right)$. Since $V=V\left({\theta}_{0}\right)$, an estimate of $V\left({\theta}_{0}\right)$ is

$\stackrel{^}{V}={\left({D}^{\prime}\left(\stackrel{^}{\theta}\right){\stackrel{^}{S}}^{-1}D\left(\stackrel{^}{\theta}\right)\right)}^{-1},D\left(\stackrel{^}{\theta}\right)=\frac{\partial g\left(\stackrel{^}{\theta}\right)}{\partial {\theta}^{\prime}}$.

Using $\stackrel{^}{V}$, the asymptotic covariance matrix of $\stackrel{^}{\theta}$ can be estimated.

We also notice that we can recover the optimum estimating equation estimators using the following GMM estimation set-up: let $k=p$, i.e., the number of sample moments equals the number of parameters to be estimated, and

${g}^{\left(r\right)}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{m}_{r}\left({y}_{i};\theta \right)},r=1,\cdots ,p$

with

${m}_{r}\left({y}_{i};\theta \right)=h\left({y}_{i};\theta \right)\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial {\theta}_{r}}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)}$.

Minimizing the corresponding GMM objective function yields the vector of GMM estimators which are given by the following system of equations since ${\stackrel{^}{S}}^{-1}$ is positive definite with probability one,

${g}^{\left(r\right)}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i};\theta \right)}\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial {\theta}_{r}}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)}=0,r=1,\cdots ,p$ (8)

which is the same system of equations for obtaining the optimum estimating equation estimators as discussed. Using vector notation, the vector of optimum estimating functions is simply $g\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i};\theta \right)}\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)}$, and the related estimators are obtained by solving $g\left(\theta \right)=0$.

The estimating equations of GMM procedures are based on partial derivatives of $Q\left(\theta \right)$ and can be seen as equivalent to

${\displaystyle {\sum}_{i=1}^{n}{D}^{\prime}\left(\theta \right){\stackrel{^}{S}}^{-1}m\left({y}_{i};\theta \right)}=0$. (9)

Observe that the vector of estimating functions is also formed from linear combinations of elements of $\left\{m\left({y}_{i};\theta \right),i=1,\cdots ,n\right\}$, which is similar to the vector of optimum estimating functions, but it might not be optimum as the matrices $D\left(\theta \right)$ and ${\stackrel{^}{S}}^{-1}$ no longer depend on i. When the $m\left({y}_{i};\theta \right),i=1,\cdots ,n$ are not only independent but also identically distributed, the two methods are equivalent. We also notice that the $\stackrel{^}{S}$ used for GMM estimation plays a role similar to the working matrix ${V}_{i}\left(\theta \right)$ in GEE estimation, but it is often simpler to obtain $\stackrel{^}{S}$ than ${V}_{i}\left(\theta \right)$; more derivations are often needed to obtain ${V}_{i}\left(\theta \right)$.

Based on expression (7) and the observation just made concerning expression (8), we shall define $m\left({y}_{i};\theta \right)$ for GMM slightly differently than the $m\left({y}_{i};\theta \right)$ used for generalized estimating functions (GEE), by letting, for GMM estimation,

${m}_{r}\left({y}_{i};\theta \right)=h\left({y}_{i};\theta \right)\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial {\theta}_{r}}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)}$, so that ${g}^{\left(r\right)}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{m}_{r}\left({y}_{i};\theta \right)},r=1,\cdots ,p$,

for the first p components of the vector $m\left({y}_{i};\theta \right)$, depending on the model being studied.

We might also want to include other ${m}_{r}\left({y}_{i};\theta \right)$ for $r>p$, depending on the model being studied, for the sake of efficiency; this leads us to define

$g\left(\theta \right)=\left(\begin{array}{c}{g}_{1}\left(\theta \right)\\ {g}_{2}\left(\theta \right)\end{array}\right)$

with ${g}_{1}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i};\theta \right)}\frac{{E}_{\theta}\left(\frac{\partial h\left({y}_{i};\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left(h\left({y}_{i};\theta \right)\right)}^{2}\right)}$, which is the vector of optimum estimating functions based on elements of the set $\left\{h\left({y}_{i};\theta \right),i=1,\cdots ,n\right\}$, and ${g}_{2}\left(\theta \right)$ with components depending on ${m}_{r}\left({y}_{i};\theta \right)$ for $r>p$, to be defined based on the model under investigation; and we define the GMM objective function as

$Q\left(\theta \right)={g}^{\prime}\left(\theta \right){\stackrel{^}{S}}^{-1}g\left(\theta \right)$,

see Section 3 for more details for the choice of $g\left(\theta \right)=\left(\begin{array}{c}{g}_{1}\left(\theta \right)\\ {g}_{2}\left(\theta \right)\end{array}\right)$ for GMM methods with models based on probability generating functions.

One advantage of the GMM approach over the generalized estimating equations (GEE) approach is that with GMM we have an objective function to be minimized, and it leads to the construction of chi-square tests for moment restrictions, whereas there is no equivalent test statistic if we use the generalized estimating equations approach. Furthermore, we shall see in Section 3 that when applied to discrete distributions with moment conditions extracted from the probability generating function, testing for moment restrictions can be viewed as testing goodness-of-fit for the count model being used. Consequently, estimation and model testing can be treated in a unified way using this approach.

As mentioned earlier, the GMM objective function evaluated at $\stackrel{^}{\theta}$ can be used to construct a test statistic which follows an asymptotic chi-square distribution for testing the null hypothesis which specifies the validity of the vector of moment conditions, i.e.,

${H}_{0}:{E}_{\theta}\left(m\left({y}_{i};\theta \right)\right)=\left(\begin{array}{c}{E}_{\theta}\left({m}_{1}\left({y}_{i};\theta \right)\right)\\ \vdots \\ {E}_{\theta}\left({m}_{k}\left({y}_{i};\theta \right)\right)\end{array}\right)=0$ for $i=1,\cdots $ (10)

but we need $k>p$, i.e., the number of sample moments must exceed the number of parameters to be estimated.

2.2. Testing the Validity of Moment Restrictions

We notice that since $g\left({\theta}_{0}\right)\stackrel{p}{\to}0$, and the vector of GMM estimators is consistent with $\stackrel{^}{\theta}\stackrel{p}{\to}{\theta}_{0}$ so that in general $g\left(\stackrel{^}{\theta}\right)\stackrel{p}{\to}0$, the following statistics can be constructed and will have asymptotic chi-square distributions. These statistics are also known as Hansen's statistics after Hansen's seminal work, see Hansen [13], and they can be used for testing the validity of moment restrictions.

For testing the simple hypothesis ${H}_{0}:{E}_{{\theta}_{0}}\left(m\left({y}_{i};{\theta}_{0}\right)\right)=\left(\begin{array}{c}{E}_{{\theta}_{0}}\left({m}_{1}\left({y}_{i};{\theta}_{0}\right)\right)\\ \vdots \\ {E}_{{\theta}_{0}}\left({m}_{k}\left({y}_{i};{\theta}_{0}\right)\right)\end{array}\right)=0$ for $i=1,2,\cdots $, with ${\theta}_{0}$ specified, Hansen's statistic is given as

$nQ\left({\theta}_{0}\right)=n{g}^{\prime}\left({\theta}_{0}\right){\stackrel{^}{S}}^{-1}g\left({\theta}_{0}\right)$

and the asymptotic distribution of the statistic is chi-square with k degrees of freedom, i.e., $nQ\left({\theta}_{0}\right)\stackrel{L}{\to}{\chi}_{k}^{2}$ under ${H}_{0}$.

For testing the composite hypothesis

${H}_{0}:{E}_{\theta}\left(m\left({y}_{i};\theta \right)\right)=\left(\begin{array}{c}{E}_{\theta}\left({m}_{1}\left({y}_{i};\theta \right)\right)\\ \vdots \\ {E}_{\theta}\left({m}_{k}\left({y}_{i};\theta \right)\right)\end{array}\right)=0$ for $i=1,2,\cdots $ ; $\theta \in \Omega $,

we need to obtain $\stackrel{^}{\theta}$ first by minimizing $Q\left(\theta \right)$; then Hansen's statistic is given as

$nQ\left(\stackrel{^}{\theta}\right)=n{g}^{\prime}\left(\stackrel{^}{\theta}\right){\stackrel{^}{S}}^{-1}g\left(\stackrel{^}{\theta}\right)$

and the asymptotic distribution of the statistic is chi-square with $k-p$ degrees of freedom, i.e., $nQ\left(\stackrel{^}{\theta}\right)\stackrel{L}{\to}{\chi}_{k-p}^{2}$ under ${H}_{0}$, assuming $k>p$.
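Computationally, the Hansen test reduces to a quadratic form and a table lookup; a sketch (Python with NumPy; the hardcoded 95% chi-square critical values are standard tabulated values):

```python
import numpy as np

CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # 95% chi-square quantiles

def hansen_test(g_hat, S_hat, n, p):
    """Hansen's statistic n * g' S^{-1} g, with k - p degrees of freedom
    when g_hat is evaluated at the minimizer of Q."""
    g = np.asarray(g_hat, dtype=float)
    J = float(n * g @ np.linalg.solve(np.asarray(S_hat, dtype=float), g))
    df = len(g) - p
    return J, df, J > CHI2_95[df]   # (statistic, df, reject at the 5% level?)
```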

These statistics will be used subsequently with moment conditions extracted from the model probability generating function in Section 3. We shall show in the next sections that these statistics are in general consistent test statistics for model testing with a discrete model specified by its probability generating function. These statistics are also distribution-free. The distribution-free property is not enjoyed by goodness-of-fit test statistics for model testing based on the empirical probability generating function, which is defined as

${P}_{n}\left(s\right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{s}^{{X}_{i}}}$ (11)

with ${X}_{1},\cdots ,{X}_{n}$ being independent and identically distributed random variables from a discrete model specified by the model probability generating function ${P}_{\theta}\left(s\right)={E}_{\theta}\left({s}^{X}\right)$; such statistics are given by Rueda and O'Reilly [14] and Marcheselli et al. [15], and their null distributions depend on the unknown parameters. In addition, the procedures proposed by Doray et al. [5] only make use of k fixed points ${s}_{1},\cdots ,{s}_{k}$ to generate moment conditions regardless of the sample size n.
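The empirical probability generating function in expression (11) is straightforward to compute; a minimal sketch in Python:

```python
def empirical_pgf(xs, s):
    """P_n(s) = (1/n) * sum of s**X_i, the empirical pgf at the point s."""
    return sum(s ** x for x in xs) / len(xs)
```

Note that ${P}_{n}\left(1\right)=1$ always, and for each fixed s, ${P}_{n}\left(s\right)$ is an unbiased estimator of ${P}_{\theta}\left(s\right)={E}_{\theta}\left({s}^{X}\right)$.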

The procedures proposed in this paper are different as the number of points selected from the probability generating function goes to infinity as $n\to \infty $.
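As a minimal numerical sketch (ours, not from the paper; it assumes NumPy and the function name `empirical_pgf` is illustrative), the empirical probability generating function of expression (11) can be computed by vectorized averaging; for Poisson data it should approach the model probability generating function ${\text{e}}^{\theta \left(s-1\right)}$:

```python
import numpy as np

def empirical_pgf(x, s):
    """Empirical pgf P_n(s) = (1/n) * sum_i s^{X_i}, evaluated at each point of s."""
    x = np.asarray(x)
    s = np.asarray(s, dtype=float)
    # Broadcast: one row per evaluation point s, one column per observation X_i.
    return np.mean(s[..., None] ** x, axis=-1)

# For Poisson(theta) data the empirical pgf should approach exp(theta*(s-1)).
rng = np.random.default_rng(0)
x = rng.poisson(2.0, size=5000)
s = np.array([0.25, 0.5, 0.75])
print(empirical_pgf(x, s))   # close to np.exp(2.0 * (s - 1.0))
```

Since every $s^{{X}_{i}}$ lies in $\left[0,1\right]$ for $s\in \left[0,1\right]$, the average converges quickly; this is the quantity being matched against ${P}_{\theta}\left(s\right)$ throughout Section 3.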

3. GEE and GMM Methods with Moment Conditions from Probability Generating Function

In this section, we shall give attention to count models. We assume that we have a random sample of n independent and identically distributed observations ${X}_{1},\cdots ,{X}_{n}$ which follow the same distribution as X, and X follows a nonnegative integer discrete distribution with probability mass function $p\left(x;\theta \right)$ with no closed form but with a model probability generating function ${P}_{\theta}\left(s\right)={E}_{\theta}\left({s}^{X}\right)$ which has a closed form and is relatively simple to handle; ${P}_{\theta}\left(s\right)$ is well defined on the domain $s\in \left[-1,1\right]$.

It is well known that in general, the probability mass function is uniquely characterized by its corresponding probability generating function. Subsequently, two versions of GMM objective functions will be introduced based on estimating function theory. The first version is based on using points $s\in \left[0,1\right]$ to form moment conditions, which are commonly used in the literature; it is given in Section 3.2.1 and Section 3.2.2. The second version is based on $s\in \left[-1,1\right]$ and is given in Section 3.2.3.

Optimum estimating functions can be used to obtain estimators, but we emphasize here the GMM approach, as distribution free tests for moment restrictions with asymptotic chi-square distributions can also be obtained, and these can be interpreted as goodness-of-fit tests for the parametric family used. However, optimum estimating function theory is very useful for identifying sample moments which yield efficient GMM procedures.

3.1. Generalized Estimating Functions (GEE)

First, we shall define the basic unbiased estimating functions $\left\{h\left({x}_{i};{s}_{i},\theta \right)\right\}$, i.e., functions with the property ${E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)=0$; then we shall form the optimum estimating functions based on linear combinations of these elementary estimating functions. Since the basic elementary estimating functions are unbiased, the optimum estimating functions will also be unbiased.

For each observation ${X}_{i}$, we shall associate the value

${s}_{i}=\frac{i-1/2}{n}$ for $i=1,\cdots ,n$.

As $n\to \infty $, the set $\left\{{s}_{i};i=1,\cdots ,n\right\}$ will become dense in $\left[0,1\right]$ and define the elementary estimating functions as

$h\left({x}_{i};\theta \right)=h\left({x}_{i};{s}_{i},\theta \right)={s}_{i}^{{X}_{i}}-{P}_{\theta}\left({s}_{i}\right);i=1,\cdots ,n$

and clearly ${E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)=0$.

Since $h\left({x}_{i};{s}_{i},\theta \right)$ is independent of $h\left({x}_{j};{s}_{j},\theta \right)$ for $i\ne j$, we have the property

${E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)h\left({x}_{j};{s}_{j},\theta \right)\right)=0$, for $i\ne j$. (12)

The elements of the set $\left\{h\left({x}_{i};\theta \right),i=1,\cdots ,n\right\}$ are said to be mutually orthogonal if the elements of the set have the property defined by expression (12), see Godambe and Thompson [10] (page 139). Therefore, using the optimality criteria of Godambe and Thompson [10] (page 139), the optimum estimating functions for estimating ${\theta}_{0}$, based on linear combinations of the basic estimating functions which are orthogonal, are given by

${g}^{\left(r\right)}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({x}_{i};{s}_{i},\theta \right)}\frac{{E}_{\theta}\left(\frac{\partial h\left({x}_{i};{s}_{i},\theta \right)}{\partial {\theta}_{r}}\right)}{{E}_{\theta}\left({\left(\left\{h\left({x}_{i};{s}_{i},\theta \right)\right\}\right)}^{2}\right)},$ (13)

and clearly ${E}_{\theta}\left({g}^{\left(r\right)}\left(\theta \right)\right)=0,r=1,\cdots ,p$; the optimum estimating functions are also unbiased.

We define the vector

${\beta}_{i}\left(\theta \right)=\frac{{E}_{\theta}\left(\frac{\partial h\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left(\left\{h\left({x}_{i};{s}_{i},\theta \right)\right\}\right)}^{2}\right)}$.

Since ${E}_{\theta}\left(\frac{\partial h\left({x}_{i};{s}_{i},\theta \right)}{\partial {\theta}_{r}}\right)=-\frac{\partial {P}_{\theta}\left({s}_{i}\right)}{\partial {\theta}_{r}}$ and letting ${v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)$ be the variance of $h\left({x}_{i};{s}_{i},\theta \right)$, so

${E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)h\left({x}_{i};{s}_{i},\theta \right)\right)={v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)$

and

${v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)={E}_{\theta}\left({s}_{i}^{{X}_{i}}{s}_{i}^{{X}_{i}}\right)-{\left({E}_{\theta}\left({s}_{i}^{{X}_{i}}\right)\right)}^{2}={P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left({s}_{i}\right)\right)}^{2}.$

This implies

${\beta}_{i}\left(\theta \right)=\frac{{E}_{\theta}\left(\frac{\partial h\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left(\left\{h\left({x}_{i};{s}_{i},\theta \right)\right\}\right)}^{2}\right)}=\frac{-\frac{\partial {P}_{\theta}\left({s}_{i}\right)}{\partial \theta}}{{P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left({s}_{i}\right)\right)}^{2}}$. (15)

Therefore, equivalently the vector of optimum estimating function is given by

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({x}_{i};{s}_{i},\theta \right){\beta}_{i}\left(\theta \right)}$. (16)
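The optimum estimating function of expression (16) can be sketched in a few lines for a scalar parameter (an illustrative sketch, ours; the user supplies the model pgf and its $\theta$-derivative, and the Poisson pgf below is only a concrete test case):

```python
import numpy as np

def beta_i(s, theta, pgf, dpgf_dtheta):
    """Optimal weight of expression (15): -dP_theta(s)/dtheta over Var(h) = P(s^2) - P(s)^2."""
    return -dpgf_dtheta(s, theta) / (pgf(s * s, theta) - pgf(s, theta) ** 2)

def g1(theta, x, pgf, dpgf_dtheta):
    """Optimum estimating function (1/n) sum_i h(x_i; s_i, theta) * beta_i(theta),
    with h = s_i^{X_i} - P_theta(s_i) and s_i = (i - 1/2)/n as in the text."""
    n = len(x)
    s = (np.arange(1, n + 1) - 0.5) / n
    h = s ** x - pgf(s, theta)
    return np.mean(h * beta_i(s, theta, pgf, dpgf_dtheta))

# Poisson pgf P_theta(s) = exp(theta*(s-1)) as a concrete check.
pgf = lambda s, th: np.exp(th * (s - 1.0))
dpgf = lambda s, th: (s - 1.0) * np.exp(th * (s - 1.0))

rng = np.random.default_rng(1)
x = rng.poisson(1.5, size=2000)
print(g1(1.5, x, pgf, dpgf))   # near zero at the true theta, since the function is unbiased
```

Evaluated away from the true parameter, $g1$ takes a nonzero sign, which is what makes solving ${g}^{\left(r\right)}\left(\theta \right)=0$ a usable estimating equation.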

For GEE estimation as given by expression (4) and expression (5), we need to specify the vector $m\left({x}_{i};\theta \right)$. Let us partition $m\left({x}_{i};\theta \right)$ into two components with

$m\left({x}_{i};\theta \right)=\left(\begin{array}{c}{m}_{1}\left({x}_{i};\theta \right)\\ {m}_{2}\left({x}_{i};\theta \right)\end{array}\right)$, ${m}_{1}\left({x}_{i};\theta \right)={s}_{i}^{{X}_{i}}-{P}_{\theta}\left({s}_{i}\right),i=1,\cdots ,n$. (17)

We select two points ${t}_{1}$ and ${t}_{2}$, for example by letting ${t}_{1}=0.50$ and ${t}_{2}=0.75$ and therefore we can form two sets of elementary basic unbiased estimating functions using these two points which are given by

$\left\{{t}_{1}^{{X}_{i}}-{P}_{\theta}\left({t}_{1}\right),i=1,\cdots ,n\right\}$ and $\left\{{t}_{2}^{{X}_{i}}-{P}_{\theta}\left({t}_{2}\right),i=1,\cdots ,n\right\}$. (18)

These two sets of elementary unbiased estimating functions are selected because, as we shall see, when used to form moment conditions for the GMM objective function they allow the construction of consistent chi-square tests.

Furthermore, from the probability generating function we can derive the expectation of X, denoted by $\mu \left(\theta \right)={E}_{\theta}\left(X\right)$, and another set of elementary unbiased estimating functions can be created, given by $\left\{{X}_{i}-\mu \left(\theta \right),i=1,\cdots ,n\right\}$. Since incorporating the sample mean into the estimating equations in general might help to improve the efficiency of the estimators, this set of estimating functions is also used for forming the vector of generalized estimating functions. Making use of these three sets of elementary unbiased estimating functions leads us to define

${m}_{2}\left({x}_{i};\theta \right)=\left(\begin{array}{c}{t}_{1}^{{X}_{i}}-{P}_{\theta}\left({t}_{1}\right)\\ {t}_{2}^{{X}_{i}}-{P}_{\theta}\left({t}_{2}\right)\\ {X}_{i}-\mu \left(\theta \right)\end{array}\right),i=1,\cdots ,n$.

provided that $\mu \left(\theta \right)$ exists for the model; note that $\mu \left(\theta \right)$ can be obtained from the derivative of the probability generating function, in fact $\mu \left(\theta \right)={{P}^{\prime}}_{\theta}\left(1\right)$ with ${{P}^{\prime}}_{\theta}\left(s\right)=\frac{\text{d}{P}_{\theta}\left(s\right)}{\text{d}s}$.

If $\mu \left(\theta \right)$ does not exist, the last component of ${m}_{2}\left({x}_{i};\theta \right)$ is replaced by ${X}_{i}{t}_{3}^{{X}_{i}-1}-{{P}^{\prime}}_{\theta}\left({t}_{3}\right)$ with ${t}_{3}$ close to 1, say ${t}_{3}=0.95$; see Section 4 for an illustration and for finding the working matrix ${V}_{i}\left(\theta \right)$. For the estimators to have a multivariate asymptotic normal distribution, we also need the existence of the common variance of ${X}_{i},i=1,\cdots ,n$ under the model.

Having specified the vector

$m\left({x}_{i};\theta \right)=\left(\begin{array}{c}{m}_{1}\left({x}_{i};\theta \right)\\ {m}_{2}\left({x}_{i};\theta \right)\end{array}\right)$,

GEE estimation can be performed using results and procedures of Section 2.1; the vector of GEE estimators ${\stackrel{^}{\theta}}_{op}$ is obtained by solving the system of equations given by expression (5). Observe that with the notations being introduced, ${m}_{1}\left({x}_{i};\theta \right)$ denotes a function which also depends on ${s}_{i}$, i.e., ${m}_{1}\left({x}_{i};\theta \right)={m}_{1}\left({x}_{i};{s}_{i},\theta \right)$, and clearly ${m}_{1}\left({x}_{i};\theta \right),i=1,\cdots ,n$ are not identically distributed, but ${m}_{2}\left({x}_{i};\theta \right),i=1,\cdots ,n$ are identically distributed vectors of random variables. Therefore, GEE estimators are no longer asymptotically equivalent to GMM estimators using the same vectors $m\left({x}_{i};\theta \right)$. With the notations being used, GEE estimators and GMM estimators are asymptotically equivalent only if $m\left({x}_{i};\theta \right),i=1,\cdots ,n$ have a common multivariate distribution.

3.2. GMM Methodology

Before defining the sample moment vector $m\left({y}_{i};\theta \right)$ for GMM methods, let us for the time being turn our attention to how to obtain a preliminary consistent estimate ${\stackrel{^}{\theta}}^{\left(0\right)}$ in general. Such a preliminary estimate ${\stackrel{^}{\theta}}^{\left(0\right)}$ is needed for numerical algorithms to implement GMM procedures and to define the matrix ${\stackrel{^}{S}}^{-1}$ which is used to define the GMM objective function. The nonlinear least-squares (NLS) estimators can be used to obtain a preliminary consistent estimate ${\stackrel{^}{\theta}}^{\left(0\right)}$, with ${\stackrel{^}{\theta}}^{\left(0\right)}$ being the vector which minimizes

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{\left(h\left({x}_{i};{s}_{i},\theta \right)\right)}^{2}}$.

Note that the estimating functions of the nonlinear least-squares methods are

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({x}_{i};{s}_{i},\theta \right)}\frac{\partial h\left({x}_{i};{s}_{i},\theta \right)}{\partial {\theta}_{r}},r=1,\cdots ,p$,

and they have some resemblance to the optimum ones as they are also based on linear combinations of $h\left({x}_{j};{s}_{j},\theta \right),j=1,\cdots ,n$ but they are not optimum.
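The preliminary NLS step can be sketched as follows (ours, not the paper's code; the Poisson pgf is again a stand-in model and `scipy.optimize.minimize_scalar` is used for the scalar-parameter case):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nls_objective(theta, x, pgf):
    """(1/n) * sum_i (s_i^{X_i} - P_theta(s_i))^2 with s_i = (i - 1/2)/n."""
    n = len(x)
    s = (np.arange(1, n + 1) - 0.5) / n
    return np.mean((s ** x - pgf(s, theta)) ** 2)

# Illustrative model: Poisson pgf P_theta(s) = exp(theta*(s-1)).
pgf = lambda s, th: np.exp(th * (s - 1.0))
rng = np.random.default_rng(2)
x = rng.poisson(2.0, size=3000)
res = minimize_scalar(lambda th: nls_objective(th, x, pgf),
                      bounds=(0.01, 20.0), method="bounded")
print(res.x)   # preliminary consistent estimate, close to the true theta = 2.0
```

The resulting ${\stackrel{^}{\theta}}^{\left(0\right)}$ is then plugged into $\stackrel{^}{S}$ before the GMM objective function is minimized.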

3.2.1. GMM Objective Function

Now we turn our attention to defining the vector

$m\left({x}_{i};\theta \right)=\left(\begin{array}{c}{m}_{1}\left({x}_{i};\theta \right)\\ {m}_{2}\left({x}_{i};\theta \right)\end{array}\right)$.

We have seen that GMM estimators are no longer equivalent to GEE estimators if we define $m\left({x}_{i};\theta \right)$ as for the GEE methods, so some modifications appear necessary. To ensure that GMM estimators have efficiencies comparable to the ones obtained by using optimum estimating functions based on $\left\{h\left({x}_{i};\theta \right),i=1,\cdots ,n\right\}$, we shall let

${m}_{1}\left({x}_{i};\theta \right)=h\left({x}_{i};{s}_{i},\theta \right){\beta}_{i}\left(\theta \right),i=1,\cdots ,n$

with the corresponding sample moment ${g}_{1}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({x}_{i};{s}_{i},\theta \right){\beta}_{i}\left(\theta \right)}$; ${g}_{1}\left(\theta \right)$ is the vector of optimum estimating functions based on

$\left\{h\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$

and keeping ${m}_{2}\left({x}_{i};\theta \right),i=1,\cdots ,n$ as for GEE estimation, so the corresponding sample moment vector for GMM estimation is

$g\left(\theta \right)=\left(\begin{array}{c}{g}_{1}\left(\theta \right)\\ {g}_{2}\left(\theta \right)\end{array}\right)$ (19)

with ${g}_{1}\left(\theta \right)$ being just defined and

${g}_{2}\left(\theta \right)=\frac{1}{n}\left(\begin{array}{c}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{1}^{{X}_{i}}-{P}_{\theta}\left({t}_{1}\right)\right)}\\ {\displaystyle {\sum}_{i=1}^{n}\left({t}_{2}^{{X}_{i}}-{P}_{\theta}\left({t}_{2}\right)\right)}\\ {\displaystyle {\sum}_{i=1}^{n}\left({X}_{i}-\mu \left(\theta \right)\right)}\end{array}\right)$

if $\mu \left(\theta \right)$ exists; otherwise let ${g}_{2}\left(\theta \right)=\frac{1}{n}\left(\begin{array}{c}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{1}^{{X}_{i}}-{P}_{\theta}\left({t}_{1}\right)\right)}\\ {\displaystyle {\sum}_{i=1}^{n}\left({t}_{2}^{{X}_{i}}-{P}_{\theta}\left({t}_{2}\right)\right)}\\ {\displaystyle {\sum}_{i=1}^{n}\left({X}_{i}{t}_{3}^{{X}_{i}-1}-{{P}^{\prime}}_{\theta}\left({t}_{3}\right)\right)}\end{array}\right)$, with ${t}_{3}$ chosen close to 1 but ${t}_{3}<1$. The GMM objective function can then be constructed and is given by

$Q\left(\theta \right)={g}^{\prime}\left(\theta \right){\stackrel{^}{S}}^{-1}g\left(\theta \right)$.
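The quadratic form itself is a one-liner; the small sketch below (ours) solves a linear system rather than forming ${\stackrel{^}{S}}^{-1}$ explicitly, which is numerically preferable when $\stackrel{^}{S}$ is poorly conditioned:

```python
import numpy as np

def gmm_objective(g, S_hat):
    """Q(theta) = g(theta)' S_hat^{-1} g(theta); n*Q(theta) is Hansen's test statistic.
    Solving S_hat z = g avoids computing the inverse of S_hat explicitly."""
    return float(g @ np.linalg.solve(S_hat, g))

# Tiny check: with S_hat = I, Q reduces to the squared Euclidean norm of g.
g = np.array([0.3, -0.4])
print(gmm_objective(g, np.eye(2)))   # approximately 0.25, the squared norm of g
```

In practice $g$ is the stacked vector of ${g}_{1}\left(\theta \right)$ and ${g}_{2}\left(\theta \right)$ above, and the same function value multiplied by n serves as the test statistic.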

3.2.2. Model Testing Using GMM Objective Function

Now we shall turn our attention to the problem of testing a model specified by its probability generating function. Let ${X}_{1},\cdots ,{X}_{n}$ be the random sample drawn from the nonnegative integer discrete distribution with probability generating function ${P}_{0}\left(t\right)$, and we want to test the following simple null hypothesis which specifies ${P}_{0}\left(t\right)={P}_{{\theta}_{0}}\left(t\right)$ with ${\theta}_{0}$ specified, i.e.

${H}_{0}:{P}_{0}\left(t\right)={P}_{{\theta}_{0}}\left(t\right)$

and clearly if ${H}_{0}:{P}_{0}\left(t\right)={P}_{{\theta}_{0}}\left(t\right)$ is true we have ${E}_{{\theta}_{0}}\left(g\left({\theta}_{0}\right)\right)=0$.

The following chi-square statistic can be used:

$nQ\left({\theta}_{0}\right)\stackrel{L}{\to}{\chi}_{r}^{2}$ with $r=k$ under ${H}_{0}$.

For practical applications, the chi-square tests are in general consistent against the common departures of interest: as we shall see, if ${P}_{0}\left(t\right)\ne {P}_{{\theta}_{0}}\left(t\right)$, the test will in general allow us to reject ${H}_{0}:{P}_{0}\left(t\right)={P}_{{\theta}_{0}}\left(t\right)$ as $n\to \infty $. Indeed, we have this property via the chi-square statistic, because if ${P}_{0}\left(t\right)\ne {P}_{{\theta}_{0}}\left(t\right)$ the chi-square statistic will converge to infinity.

For the test not to have this property, we would need

${P}_{0}\left(t\right)\ne {P}_{{\theta}_{0}}\left(t\right)$ but $g\left({\theta}_{0}\right)\stackrel{p}{\to}0$.

If $g\left({\theta}_{0}\right)\stackrel{p}{\to}0$ then two of its components given by

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{1}^{{X}_{i}}-{P}_{{\theta}_{0}}\left({t}_{1}\right)\right)},\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{2}^{{X}_{i}}-{P}_{{\theta}_{0}}\left({t}_{2}\right)\right)}$ must simultaneously converge to 0 in probability, i.e.,

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{1}^{{X}_{i}}-{P}_{{\theta}_{0}}\left({t}_{1}\right)\right)}\stackrel{p}{\to}0$ and $\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{2}^{{X}_{i}}-{P}_{{\theta}_{0}}\left({t}_{2}\right)\right)}\stackrel{p}{\to}0$. (20)

We shall show that, in general, for the ${P}_{0}\left(t\right)$ encountered in applications this cannot happen.

Suppose that

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{1}^{{X}_{i}}-{P}_{{\theta}_{0}}\left({t}_{1}\right)\right)}\stackrel{p}{\to}0$, this implies ${P}_{0}\left({t}_{1}\right)={P}_{{\theta}_{0}}\left({t}_{1}\right)$

and similarly

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{2}^{{X}_{i}}-{P}_{{\theta}_{0}}\left({t}_{2}\right)\right)}\stackrel{p}{\to}0$, this implies ${P}_{0}\left({t}_{2}\right)={P}_{{\theta}_{0}}\left({t}_{2}\right)$.

Observe that in general for probability generating function $P\left(s\right)$ used for applications, the function $P\left(s\right)$ is convex for $0<s<1$ and $P\left(t\right)=1$ when $t=1$, i.e., $P\left(1\right)=1$, see Resnick [16] (p 22-23).

Furthermore, for ${P}_{0}\left(t\right)\ne {P}_{{\theta}_{0}}\left(t\right)$ encountered in applications, we also have in general ${P}_{0}\left(t\right)\ne {P}_{{\theta}_{0}}\left(t\right)$ for some $t\in \left(0,1\right)$. This also means that, in general, there is at most one point a with $0<a<1$ where ${P}_{0}\left(t\right)$ crosses ${P}_{{\theta}_{0}}\left(t\right)$, since ${P}_{0}\left(t\right)$ and ${P}_{{\theta}_{0}}\left(t\right)$ are both strictly convex functions and ${P}_{0}\left(1\right)={P}_{{\theta}_{0}}\left(1\right)=1$. Therefore, we cannot have the simultaneous convergence given by expression (20), and the chi-square test is consistent in general, as it can detect common departures from ${H}_{0}:{P}_{0}\left(t\right)={P}_{{\theta}_{0}}\left(t\right)$ as $n\to \infty $.

For testing the composite ${H}_{0}:{P}_{0}\left(t\right)\in \left\{{P}_{\theta}\left(t\right)\right\}$, we need to estimate ${\theta}_{0}$ by $\stackrel{^}{\theta}$ by minimizing $Q\left(\theta \right)={g}^{\prime}\left(\theta \right){\stackrel{^}{S}}^{-1}g\left(\theta \right)$ first and subsequently use $\stackrel{^}{\theta}$ to compute the following chi-square statistic $nQ\left(\stackrel{^}{\theta}\right)$ and $nQ\left(\stackrel{^}{\theta}\right)\stackrel{L}{\to}{\chi}_{r}^{2}$ with $r=3$.

These chi-square statistics are distribution free as there is no unknown parameter in the chi-square distributions of the statistics used. These goodness-of-fit tests are simpler to implement than the ones based on matching the sample probability generating function with its model counterpart using a continuum of moment conditions as given by Theorem 10 of Carrasco and Florens [6] (p 812-813). Note that if maximum likelihood estimators are used concomitantly with the common classical Pearson statistics, the statistics often have complicated distributions and are no longer distribution free, see Chernoff and Lehmann [17] and Luong and Thompson [18], and these classical Pearson’s test statistics are not consistent in general.

3.2.3. Further Extensions: The Use of Orthogonal Estimating Functions

Notice that beside the set of basic estimating functions

$\left\{h\left({x}_{i};{s}_{i},\theta \right)={s}_{i}^{{X}_{i}}-{P}_{\theta}\left({s}_{i}\right),i=1,\cdots ,n\right\}$

as defined earlier we also have another set of basic estimating functions given by $\left\{l\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$ with $l\left({x}_{i};{s}_{i},\theta \right)={\left(-{s}_{i}\right)}^{{X}_{i}}-{P}_{\theta}\left(-{s}_{i}\right)$.

Consequently, if in addition to the first set of estimating functions we also want to incorporate the second set of basic estimating functions for building $g\left(\theta \right)$, then we can use optimum orthogonal estimating functions. Instead of letting the first p components of the vector $g\left(\theta \right)$ be given by the vector

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({x}_{i};{s}_{i},\theta \right){\beta}_{i}\left(\theta \right)}$

which is the vector of optimum estimating functions based on the set of basic estimating functions $\left\{h\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$, we shall use a more general vector of optimum estimating functions which can incorporate a larger set of basic estimating functions, as described below.

Recall that $l\left({x}_{i};{s}_{i},\theta \right)={\left(-{s}_{i}\right)}^{{X}_{i}}-{P}_{\theta}\left(-{s}_{i}\right)$, and clearly the elements of $\left\{l\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$ are mutually orthogonal basic estimating functions; but combining the two sets of basic estimating functions to form the set

$\left\{h\left({x}_{i};{s}_{i},\theta \right),l\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$,

the basic estimating functions of the combined set are not mutually orthogonal because ${E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)l\left({x}_{i};{s}_{i},\theta \right)\right)$ is not equal to 0. Using the Gram-Schmidt orthogonalizing procedure, we can replace $l\left({x}_{i};{s}_{i},\theta \right)$ by

${l}^{0}\left({x}_{i};{s}_{i},\theta \right)=l\left({x}_{i};{s}_{i},\theta \right)-{\alpha}_{i}\left(\theta \right)h\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n$,

${\alpha}_{i}\left(\theta \right)=\frac{{E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)l\left({x}_{i};{s}_{i},\theta \right)\right)}{{E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)h\left({x}_{i};{s}_{i},\theta \right)\right)}$

which can also be represented as

${\alpha}_{i}\left(\theta \right)=\frac{{P}_{\theta}\left(-{s}_{i}^{2}\right)-{P}_{\theta}\left({s}_{i}\right){P}_{\theta}\left(-{s}_{i}\right)}{{P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left({s}_{i}\right)\right)}^{2}}$ (21)

Since

$\begin{array}{c}{E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)l\left({x}_{i};{s}_{i},\theta \right)\right)={E}_{\theta}\left({s}_{i}^{{X}_{i}}{\left(-{s}_{i}\right)}^{{X}_{i}}\right)-{E}_{\theta}\left({s}_{i}^{{X}_{i}}\right){E}_{\theta}\left({\left(-{s}_{i}\right)}^{{X}_{i}}\right)\\ ={P}_{\theta}\left(-{s}_{i}^{2}\right)-{P}_{\theta}\left({s}_{i}\right){P}_{\theta}\left(-{s}_{i}\right)\end{array}$

and ${E}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)h\left({x}_{i};{s}_{i},\theta \right)\right)$ is simply the variance ${v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)$ of $h\left({x}_{i};{s}_{i},\theta \right)$ since the basic estimating functions are unbiased,

${v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)={P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left({s}_{i}\right)\right)}^{2}$.

Now, it is easy to see that the set

$\left\{h\left({x}_{i};{s}_{i},\theta \right),{l}^{0}\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$

is a set of mutually orthogonal of basic or elementary estimating functions, see Definition 2.2 and Theorem 2.1 as given by Godambe and Thompson [10] (p 139-140).

Li and Turtle [19] (p 177) also use a similar orthogonalization procedure and Theorem 2.1 for creating optimum estimating functions for the ARCH model.
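The orthogonalization can be checked numerically; the sketch below (ours, with the Poisson pgf as a test case) verifies by Monte Carlo that after applying the coefficient of expression (21), ${l}^{0}$ is indeed orthogonal to h, i.e., ${E}_{\theta}\left(h\,{l}^{0}\right)\approx 0$:

```python
import numpy as np

def alpha_i(s, theta, pgf):
    """Gram-Schmidt coefficient of expression (21), making l^0 = l - alpha*h
    orthogonal to h: alpha = (P(-s^2) - P(s)P(-s)) / (P(s^2) - P(s)^2)."""
    return (pgf(-s * s, theta) - pgf(s, theta) * pgf(-s, theta)) / \
           (pgf(s * s, theta) - pgf(s, theta) ** 2)

# Monte-Carlo check of orthogonality for the Poisson pgf exp(theta*(s-1)).
pgf = lambda s, th: np.exp(th * (s - 1.0))
theta, s = 1.5, 0.6
rng = np.random.default_rng(4)
x = rng.poisson(theta, size=200000)
h = s ** x - pgf(s, theta)                     # first basic estimating function
l = (-s) ** x - pgf(-s, theta)                 # second basic estimating function
l0 = l - alpha_i(s, theta, pgf) * h            # orthogonalized version
print(np.mean(h * l0))   # near 0
```

This is the sample analogue of the population orthogonality guaranteed by the Gram-Schmidt construction.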

The first p components of the vector of sample moment functions $g\left(\theta \right)$ are simply the optimum estimating functions based on linear combinations of basic estimating functions of the set $\left\{h\left({x}_{i};{s}_{i},\theta \right),{l}^{0}\left({x}_{i};{s}_{i},\theta \right),i=1,\cdots ,n\right\}$ and again using Theorem 2.1 by Godambe and Thompson [10], the vector of optimum estimating functions is given by

$\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left(h\left({x}_{i};{s}_{i},\theta \right){\beta}_{i}\left(\theta \right)+{l}^{0}\left({x}_{i};{s}_{i},\theta \right){\gamma}_{i}\left(\theta \right)\right)}$ (22)

with

${\gamma}_{i}\left(\theta \right)=\frac{{E}_{\theta}\left(\frac{\partial {l}^{0}\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left({l}^{0}\left({x}_{i};{s}_{i},\theta \right)\right)}^{2}\right)}$

and ${\beta}_{i}\left(\theta \right)$ is as defined by expression (15).

Now we shall display the expression for ${\gamma}_{i}\left(\theta \right)$; first note that

${E}_{\theta}\left(\frac{\partial {l}^{0}\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)=-\frac{\partial {P}_{\theta}\left(-{s}_{i}\right)}{\partial \theta}+{\alpha}_{i}\left(\theta \right)\frac{\partial {P}_{\theta}\left({s}_{i}\right)}{\partial \theta}$

and ${E}_{\theta}\left({\left({l}^{0}\left({x}_{i};{s}_{i},\theta \right)\right)}^{2}\right)={v}_{\theta}\left({l}^{0}\left({x}_{i};{s}_{i},\theta \right)\right)$, so that

$\begin{array}{c}{v}_{\theta}\left({l}^{0}\left({x}_{i};{s}_{i},\theta \right)\right)={v}_{\theta}\left(l\left({x}_{i};{s}_{i},\theta \right)\right)+{\alpha}_{i}^{2}\left(\theta \right){v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right)\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-2{\alpha}_{i}\left(\theta \right)co{v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right),l\left({x}_{i};{s}_{i},\theta \right)\right)\end{array}$

with the variance

$\begin{array}{c}{v}_{\theta}\left(l\left({x}_{i};{s}_{i},\theta \right)\right)={E}_{\theta}\left({\left(-{s}_{i}\right)}^{{X}_{i}}{\left(-{s}_{i}\right)}^{{X}_{i}}\right)-{E}_{\theta}\left({\left(-{s}_{i}\right)}^{{X}_{i}}\right){E}_{\theta}\left({\left(-{s}_{i}\right)}^{{X}_{i}}\right)\\ ={P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left(-{s}_{i}\right)\right)}^{2}\end{array}$

${\alpha}_{i}\left(\theta \right)$ is as given by expression (21) and the covariance

$co{v}_{\theta}\left(h\left({x}_{i};{s}_{i},\theta \right),l\left({x}_{i};{s}_{i},\theta \right)\right)={P}_{\theta}\left(-{s}_{i}^{2}\right)-{P}_{\theta}\left({s}_{i}\right){P}_{\theta}\left(-{s}_{i}\right)$

The expression for ${\gamma}_{i}\left(\theta \right)$ can be displayed fully and it is given by

$\frac{-\frac{\partial {P}_{\theta}\left(-{s}_{i}\right)}{\partial \theta}+{\alpha}_{i}\left(\theta \right)\frac{\partial {P}_{\theta}\left({s}_{i}\right)}{\partial \theta}}{{P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left(-{s}_{i}\right)\right)}^{2}+{\alpha}_{i}^{2}\left(\theta \right)\left({P}_{\theta}\left({s}_{i}^{2}\right)-{\left({P}_{\theta}\left({s}_{i}\right)\right)}^{2}\right)-2{\alpha}_{i}\left(\theta \right)\left({P}_{\theta}\left(-{s}_{i}^{2}\right)-{P}_{\theta}\left({s}_{i}\right){P}_{\theta}\left(-{s}_{i}\right)\right)}$.

With this vector of optimum estimating functions, the sample moment function $g\left(\theta \right)$ for forming the corresponding GMM objective function can be defined as given below.

Let

${g}_{1}\left(\theta \right)=\left(\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left(h\left({x}_{i};{s}_{i},\theta \right){\beta}_{i}\left(\theta \right)+{l}^{0}\left({x}_{i};{s}_{i},\theta \right){\gamma}_{i}\left(\theta \right)\right)}\right)$

and keeping ${g}_{2}\left(\theta \right)$ as the component vector of $g\left(\theta \right)$ specified by expression (19), so $g\left(\theta \right)=\left(\begin{array}{c}{g}_{1}\left(\theta \right)\\ {g}_{2}\left(\theta \right)\end{array}\right)$. This choice of $g\left(\theta \right)$, with ${g}_{1}\left(\theta \right)$ built from optimum orthogonal estimating functions constructed using the two sets of basic estimating functions, is to be preferred for improving the efficiency of estimation for some models when $g\left(\theta \right)$ as defined by expression (19) in Section 3.2.1 does not give satisfactory efficiency for GMM estimation. Model testing procedures using this GMM objective function are identical to the procedures for the GMM objective function used earlier.

We might also want to enlarge the vector ${g}_{2}\left(\theta \right)$ by adding more components, but more components also tend to create numerical difficulties because the matrix $\stackrel{^}{S}$ will be nearly singular, and the numerical inversion of such a matrix is often problematic.

Finally, we note that although the GMM methods developed are primarily for discrete distributions, they can also accommodate nonnegative continuous distributions defined using Laplace transforms, as discussed in Luong [20], since Laplace transforms are related to probability generating functions.

4. An Example and Numerical Illustrations

We shall use an example to illustrate the procedures. Let us consider a random sample of observations ${X}_{1},\cdots ,{X}_{n}$ drawn from the Poisson distribution with probability generating function ${P}_{\theta}\left(s\right)={\text{e}}^{\theta \left(s-1\right)}$, $\theta >0$. For this model $\theta $ is scalar. We would like to use GMM methods here because, although the maximum likelihood estimator for $\theta $ is available and given by ${\stackrel{^}{\theta}}_{ML}=\stackrel{\xaf}{X}$, using ${\stackrel{^}{\theta}}_{ML}$ does not lead to tractable distribution free goodness-of-fit test statistics with Pearson type statistics, as mentioned earlier.

For this model, the coefficient

${\beta}_{i}\left(\theta \right)=\frac{{E}_{\theta}\left(\frac{\partial h\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)}{{E}_{\theta}\left({\left(\left\{h\left({x}_{i};{s}_{i},\theta \right)\right\}\right)}^{2}\right)}=\frac{\left(1-{s}_{i}\right){\text{e}}^{\theta \left({s}_{i}-1\right)}}{{\text{e}}^{\theta \left({s}_{i}^{2}-1\right)}-{\text{e}}^{2\theta \left({s}_{i}-1\right)}}$,

$h\left({x}_{i};{s}_{i},\theta \right)={s}_{i}^{{X}_{i}}-{P}_{\theta}\left({s}_{i}\right),i=1,\cdots ,n$.

We consider the case with the sample moment vector given by

$g\left(\theta \right)=\left(\begin{array}{c}{g}_{1}\left(\theta \right)\\ {g}_{2}\left(\theta \right)\end{array}\right)$, ${g}_{1}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({s}_{i}^{{X}_{i}}-{\text{e}}^{\theta \left({s}_{i}-1\right)}\right){\beta}_{i}\left(\theta \right)}$, ${s}_{i}=\frac{i-1/2}{n}$

${g}_{2}\left(\theta \right)=\left(\begin{array}{c}\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{1}^{{X}_{i}}-{P}_{\theta}\left({t}_{1}\right)\right)}\\ \frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({t}_{2}^{{X}_{i}}-{P}_{\theta}\left({t}_{2}\right)\right)}\\ \frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({X}_{i}-\mu \left(\theta \right)\right)}\end{array}\right),\mu \left(\theta \right)=\theta ,{t}_{1}=0.5,{t}_{2}=0.75$

The vector

$m\left({x}_{i};{s}_{i},\theta \right)=\left(\begin{array}{c}{m}_{1}\left({x}_{i};{s}_{i},\theta \right)\\ \vdots \\ {m}_{4}\left({x}_{i};{s}_{i},\theta \right)\end{array}\right)$

will have four components with the components given respectively by

${m}_{1}\left({x}_{i};{s}_{i},\theta \right)=\left({s}_{i}^{{X}_{i}}-{\text{e}}^{\theta \left({s}_{i}-1\right)}\right){\beta}_{i}\left(\theta \right),$

${m}_{2}\left({x}_{i};{s}_{i},\theta \right)=\left({t}_{1}^{{X}_{i}}-{P}_{\theta}\left({t}_{1}\right)\right),$

${m}_{3}\left({x}_{i};{s}_{i},\theta \right)=\left({t}_{2}^{{X}_{i}}-{P}_{\theta}\left({t}_{2}\right)\right),$

${m}_{4}\left({x}_{i};{s}_{i},\theta \right)=\left({X}_{i}-\mu \left(\theta \right)\right).$

We can use ${\stackrel{^}{\theta}}^{\left(0\right)}={\stackrel{^}{\theta}}_{ML}$ as ${\stackrel{^}{\theta}}_{ML}$ is simple to obtain here and can be used as a preliminary consistent estimate. Now we can let

$\stackrel{^}{S}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left(m\left({x}_{i};{s}_{i},{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right){\left(m\left({x}_{i};{s}_{i},{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right)}^{\prime}}$ (23)

or

$\stackrel{^}{S}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{{\stackrel{^}{\theta}}^{\left(0\right)}}\left(\left[m\left({x}_{i};{s}_{i},{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]{\left[m\left({x}_{i};{s}_{i},{\stackrel{^}{\theta}}^{\left(0\right)}\right)\right]}^{\prime}\right)}$. (24)

The elements of $\stackrel{^}{S}$ as given by expression (24) can be computed using only the probability generating function of the model since we have

${E}_{\theta}\left(X{t}^{X}\right)=t{{P}^{\prime}}_{\theta}\left(t\right)$ and ${E}_{\theta}\left({t}_{1}^{X}{t}_{2}^{X}\right)={P}_{\theta}\left({t}_{1}{t}_{2}\right)$,

${E}_{\theta}\left(X{t}^{X}{s}^{X}\right)=ts{{P}^{\prime}}_{\theta}\left(ts\right)$,

the variance of X is

${v}_{\theta}\left(X\right)={{P}^{\u2033}}_{\theta}\left(1\right)+{{P}^{\prime}}_{\theta}\left(1\right)-{\left({{P}^{\prime}}_{\theta}\left(1\right)\right)}^{2}$, ${{P}^{\u2033}}_{\theta}\left(t\right)=\frac{{\text{d}}^{2}{P}_{\theta}\left(t\right)}{\text{d}{t}^{2}}$.

For the Poisson model,

${v}_{\theta}\left(X\right)=\theta $.

$\stackrel{^}{S}$ as given by expression (24) tends to be better conditioned and can be inverted with fewer numerical difficulties.

The GMM objective function is given by

$Q\left(\theta \right)={g}^{\prime}\left(\theta \right){\stackrel{^}{S}}^{-1}g\left(\theta \right)$,

and minimizing it yields the GMM estimator $\stackrel{^}{\theta}$. To obtain an estimated asymptotic variance for $\stackrel{^}{\theta}$, we can define

${\stackrel{^}{D}}^{\prime}=\left(\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{\theta}\left(\frac{\partial {m}_{1}\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)},\cdots ,\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{E}_{\theta}\left(\frac{\partial {m}_{4}\left({x}_{i};{s}_{i},\theta \right)}{\partial \theta}\right)}\right)$

evaluated at $\theta =\stackrel{^}{\theta}$.

The asymptotic variance for $\stackrel{^}{\theta}$ can be estimated as $\frac{1}{n}{\left({\stackrel{^}{D}}^{\prime}{\stackrel{^}{S}}^{-1}\stackrel{^}{D}\right)}^{-1}$, and the chi-square statistic for testing the composite hypothesis ${H}_{0}:{P}_{0}\left(s\right)\in \left\{{P}_{\theta}\left(s\right)\right\}$ is given by $nQ\left(\stackrel{^}{\theta}\right)$, with $nQ\left(\stackrel{^}{\theta}\right)\stackrel{L}{\to}{\chi}_{3}^{2}$.
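The estimation and testing steps can be sketched in code for the Poisson model. This is an illustrative sketch, not the paper's implementation: for simplicity only the three components corresponding to ${m}_{2}$, ${m}_{3}$, ${m}_{4}$ are used (so the test statistic here has $3-1=2$ degrees of freedom rather than 3), the PGF evaluation points ${t}_{1},{t}_{2}$ are arbitrary choices, and the minimization is a simple grid search:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, n = 2.0, 500
x = rng.poisson(theta_true, n)
t1, t2 = 0.5, 0.9          # illustrative PGF evaluation points

def pgf(theta, t):
    # Poisson PGF: P_theta(t) = exp(theta * (t - 1))
    return np.exp(theta * (t - 1.0))

def moments(theta):
    # n x 3 matrix of moment contributions (components m2, m3, m4)
    return np.column_stack([t1**x - pgf(theta, t1),
                            t2**x - pgf(theta, t2),
                            x - theta])

theta0 = x.mean()                 # ML estimate as preliminary estimate
m0 = moments(theta0)
S_hat = m0.T @ m0 / n             # expression (23)
S_inv = np.linalg.inv(S_hat)

def Q(theta):
    # GMM objective Q(theta) = g'(theta) S^{-1} g(theta)
    g = moments(theta).mean(axis=0)
    return g @ S_inv @ g

grid = np.linspace(0.5, 5.0, 2001)
theta_gmm = grid[np.argmin([Q(th) for th in grid])]
chi_sq = n * Q(theta_gmm)         # asymptotically chi-square, df = 2 here
```

With the full four-component moment vector of the paper the limiting distribution of $nQ\left(\stackrel{^}{\theta}\right)$ would be ${\chi}_{3}^{2}$ as stated above.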

To assess the feasibility of GMM methods for this example, limited simulation studies were conducted. GMM methods can be implemented without numerical difficulties for $\theta \le 10$.

For values of $\theta$ with $10<\theta \le 100$, if expression (23) is used for $\stackrel{^}{S}$, the matrix $\stackrel{^}{S}$ tends to be nearly singular and its elements need to be computed with higher accuracy in order to invert it. We found that software such as Maple or Mathematica, which can carry more digits of precision, handles these computations better than R.

Often, a spectral decomposition of $\stackrel{^}{S}$ allows ${\stackrel{^}{S}}^{-1}$ to be obtained numerically even when a direct inversion in R fails with the message that the matrix is nearly singular. This can be seen from the spectral representation of $\stackrel{^}{S}$,

$\stackrel{^}{S}={P}^{\prime}\Lambda P$

with ${P}^{\prime}$ an orthogonal matrix, ${P}^{\prime}P=I$ so that ${P}^{\prime}={P}^{-1}$, and $\Lambda$ a diagonal matrix whose diagonal elements are the eigenvalues of $\stackrel{^}{S}$. These eigenvalues need to be computed with high accuracy and must be numerically positive; by keeping more digits when computing them, ${\Lambda}^{-1}$ can in general be obtained and ${\stackrel{^}{S}}^{-1}$ computed as

${\stackrel{^}{S}}^{-1}={P}^{\prime}{\Lambda}^{-1}P$.
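This spectral-decomposition inversion can be sketched with NumPy's symmetric eigendecomposition; the tolerance value and the example matrix below are illustrative:

```python
import numpy as np

def spectral_inverse(S, tol=1e-12):
    # Invert a symmetric matrix via its spectral decomposition:
    # eigh returns S = V @ diag(eigvals) @ V.T with V orthogonal.
    # Eigenvalues below tol signal numerical singularity and are
    # reported rather than silently inverted.
    eigvals, V = np.linalg.eigh(S)
    if np.any(eigvals < tol):
        raise np.linalg.LinAlgError("matrix is numerically singular")
    return V @ np.diag(1.0 / eigvals) @ V.T

# a near-singular but still invertible example matrix
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])
S_inv = spectral_inverse(S)
```

Here the eigenvalues are roughly 1.999 and 0.001; the smaller one is what makes a naive inversion fragile, and computing it accurately is what makes the spectral route succeed.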

If expression (24) is used instead of expression (23) for $\stackrel{^}{S}$, together with software that carries more digits of precision, fewer numerical problems are encountered in inverting $\stackrel{^}{S}$. For models for which ${\stackrel{^}{S}}^{-1}$ is difficult to obtain, an empirical likelihood (EL) approach based on the same sample moments can be used; it attains the same efficiency as GMM methods, but the numerical computations for implementing EL methods are also more involved, see Luong [20] on the use of a penalty function for obtaining EL estimators.

We simulate $M=100$ samples of size $n=100$ from the Poisson distribution with $\theta =1,2,3,4,5,10,100$ and obtain, for each sample, the GMM estimate, the NLS estimate and the ML estimate. The NLS estimate is the nonlinear least-squares estimate mentioned at the beginning of Section 3.2.

For comparison of relative efficiencies of these methods we estimate the ratios

$\frac{\text{MSE}\left(\text{GMM}\right)}{\text{MSE}\left(\text{ML}\right)}$ and $\frac{\text{MSE}\left(\text{NLS}\right)}{\text{MSE}\left(\text{ML}\right)}$ where MSE(GMM), MSE(NLS), MSE(ML)

are respectively the estimates of the mean square errors of the GMM, NLS and ML estimators computed from the simulated samples. The efficiency of the GMM estimator is practically identical to that of the ML estimator, but the efficiency of the NLS estimator is much lower and worsens as $\theta$ increases. The results are displayed in Table A1.
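A simulation comparison of this kind can be sketched as follows. The second estimator below is a simple one-point PGF estimator, used here only as an illustrative less-efficient competitor; it is not the paper's NLS estimator, and the evaluation point $t$ is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, n, M, t = 2.0, 100, 500, 0.5
se_ml, se_pgf = [], []
for _ in range(M):
    x = rng.poisson(theta_true, n)
    th_ml = x.mean()                             # ML estimate for Poisson
    # one-point PGF estimator: solve mean(t^x) = exp(theta*(t-1)) for theta
    th_pgf = np.log(np.mean(t**x)) / (t - 1.0)
    se_ml.append((th_ml - theta_true) ** 2)
    se_pgf.append((th_pgf - theta_true) ** 2)
ratio = np.mean(se_pgf) / np.mean(se_ml)         # estimated relative MSE
```

The ratio exceeds one, reflecting the efficiency loss of an estimator based on a single PGF point relative to ML, the same pattern Table A1 reports for NLS versus ML.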

To examine whether the chi-square test has power to detect departures from the model, we use the negative binomial distribution with mean $\theta$

and variance $\theta +\frac{{\theta}^{2}}{\alpha}$ as the departure from the Poisson model, with

$\alpha =1,2,3,4,5,10,100$, and simulate $M=100$ samples of size $n=100$; the fitted model is Poisson with mean $\theta$. The estimated power of the tests at these alternatives is displayed in Table A2. The level used for the chi-square tests is 0.05, with the critical point being the 0.95th percentile of a chi-square distribution with 3 degrees of freedom, ${\chi}_{0.95}^{2}\left(3\right)=7.814$. The results obtained are encouraging and show that the chi-square tests have considerable power to detect departures. As $\alpha$ becomes large the estimated power decreases as expected, since as $\alpha \to \infty$ the negative binomial distribution tends to the Poisson distribution. Larger scale simulation studies with more parametric families are needed to confirm the efficiencies of the proposed methods.
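Drawing from this negative binomial alternative can be done with NumPy; the mapping from the $\left(\theta ,\alpha \right)$ parameterization above to NumPy's $\left(n,p\right)$ parameters is derived in the comments (the function name is illustrative):

```python
import numpy as np

def rnegbin(rng, theta, alpha, size):
    # NumPy's negative_binomial(n, p) has mean n(1-p)/p and variance
    # n(1-p)/p**2. Taking n = alpha and p = alpha/(alpha + theta) gives
    # mean theta and variance theta + theta**2/alpha, matching the
    # alternative used for the power study.
    p = alpha / (alpha + theta)
    return rng.negative_binomial(alpha, p, size)

rng = np.random.default_rng(2)
x = rnegbin(rng, theta=5.0, alpha=2.0, size=200_000)
# sample mean should be near 5 and sample variance near 5 + 25/2 = 17.5
```

As $\alpha$ grows the variance shrinks toward the mean, so the draws become indistinguishable from Poisson draws, which is why the estimated power decreases for large $\alpha$.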

5. Conclusion

The methods appear relatively simple to implement and have the potential to be efficient for some count models. They have the advantage of using only the probability generating function instead of the probability mass function, allowing inferences for a much larger class of parametric families without relying on extensive simulations. The proposed GMM methodology also combines traditional GMM methodology with generalized estimating function methodology, both of which are well-known alternatives to ML methodology. Generalized estimating function methodology lacks statistics for model testing, and this gap is filled by the proposed procedures.

Acknowledgements

The helpful and constructive comments of a referee, which led to an improvement in the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are gratefully acknowledged.

Appendix

Table A1. Estimate relative efficiency comparisons between GMM, NLS and ML estimators.

M = 100 simulated samples are used, each with sample size n = 100.

Table A2. Estimate power of the chi-square tests using the Poisson model with parameter θ.

M = 100 simulated samples of size n = 100 each are drawn from a negative binomial distribution with mean = 5 and variance = $5+\frac{{5}^{2}}{\alpha}$.

Conflicts of Interest

The author declares no conflicts of interest.

[1] Johnson, N.L., Kotz, S. and Kemp, A.W. (1992) Univariate Discrete Distributions. Second Edition, Wiley, New York.

[2] Klugman, S.A., Panjer, H.H. and Willmot, G.E. (2019) Loss Models: From Data to Decisions. Fifth Edition, Wiley, New York.

[3] Christoph, G. and Schreiber, K. (1998) Discrete Stable Random Variables. Statistics and Probability Letters, 37, 243-247. https://doi.org/10.1016/S0167-7152(97)00123-5

[4] Luong, A., Bilodeau, C. and Blier-Wong, C. (2018) Simulated Minimum Hellinger Distance Inference Methods for Count Data. Open Journal of Statistics, 8, 187-219. https://doi.org/10.4236/ojs.2018.81012

[5] Doray, L.G., Jiang, S.M. and Luong, A. (2009) Some Simple Method of Estimation for the Parameters of the Discrete Stable Distribution with the Probability Generating Function. Communications in Statistics—Simulation and Computation, 38, 2004-2017. https://doi.org/10.1080/03610910903202089

[6] Carrasco, M. and Florens, J.-P. (2000) Generalization of GMM to a Continuum of Moment Conditions. Econometric Theory, 16, 797-834. https://doi.org/10.1017/S0266466600166010

[7] Carrasco, M. and Kotchoni, R. (2017) Efficient Estimation Using Characteristic Function. Econometric Theory, 33, 479-526. https://doi.org/10.1017/S0266466616000025

[8] Martin, V., Hurn, S. and Harris, D. (2013) Econometric Modelling with Time Series: Specification, Estimation and Testing. Cambridge University Press, Cambridge.

[9] Hamilton, J.D. (1994) Time Series Analysis. Princeton University Press, Princeton.

[10] Godambe, V.P. and Thompson, M.E. (1989) An Extension of Quasi-Likelihood Estimation. Journal of Statistical Planning and Inference, 22, 137-152. https://doi.org/10.1016/0378-3758(89)90106-7

[11] Morton, R. (1981) Efficiency of Estimating Equations and the Use of Pivots. Biometrika, 68, 227-233. https://doi.org/10.1093/biomet/68.1.227

[12] Liang, K.Y. and Zeger, S.L. (1986) Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73, 13-22. https://doi.org/10.1093/biomet/73.1.13

[13] Hansen, L. (1982) Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50, 1029-1054. https://doi.org/10.2307/1912775

[14] Rueda, R. and O'Reilly, F. (1999) Tests of Fit for Discrete Distributions Based on the Probability Generating Function. Communications in Statistics—Simulation and Computation, 28, 259-274. https://doi.org/10.1080/03610919908813547

[15] Marcheselli, M., Baccini, A. and Barabesi, L. (2008) Parameter Estimation for the Discrete Stable Family. Communications in Statistics—Theory and Methods, 37, 815-830. https://doi.org/10.1080/03610920701570298

[16] Resnick, S. (1992) Adventures in Stochastic Processes. Birkhäuser, Boston.

[17] Chernoff, H. and Lehmann, E.L. (1954) The Use of Maximum Likelihood Estimates in Chi-Square Tests for Goodness of Fit. Annals of Mathematical Statistics, 25, 579-586. https://doi.org/10.1214/aoms/1177728726

[18] Luong, A. and Thompson, M.E. (1987) Minimum Distance Methods Based on Quadratic Distance for Transforms. Canadian Journal of Statistics, 15, 239-251. https://doi.org/10.2307/3314914

[19] Li, D.X. and Turtle, H.J. (2000) Semi-Parametric ARCH Models: An Estimating Function Approach. Journal of Business and Economic Statistics, 18, 174-186. https://doi.org/10.1080/07350015.2000.10524860

[20] Luong, A. (2017) Maximum Entropy Empirical Likelihood Methods Based on Laplace Transforms for Nonnegative Continuous Distributions with Actuarial Applications. Open Journal of Statistics, 7, 459-482. https://doi.org/10.4236/ojs.2017.73033


Copyright © 2021 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.