used in the LVI algorithm is described in Table 1. LVI
has two steps (Table 2).
The flowchart of the LVI algorithm is shown in
Figure 2 and its algorithmic steps are outlined in
Appendix A. The inputs to the LVI algorithm are the
measurement items and the outputs are the disjoint
item sets, each of which represents an LV. Each L1
contains a single measurement item. L2 is generated
using only the necessary condition C-1, because the
axiom of conditional independence is not directly
testable for L2 (§3.2.2). Therefore, the LVI algorithm
works best for LVs that are measured with more than
two measurement items, as is strongly recommended
by SEM researchers (e.g., Kline 1998, Bollen 1989).
Then, LVI generates Lk+1 from Lk by examining candidate
item sets based on C-1 and C-2 (see also Table 1).
Step 2 prunes all valid item sets by eliminating
all subsets and overlapping measurement items.
To ensure the smallest number of LVs, the LVI algorithm
begins with the largest item set (the one with
the most measurement items) among all Lk and then
eliminates each overlapping item from the item set
that is least affected by its removal. Finally, the LVI
algorithm outputs the disjoint item sets, each of which
represents an underlying LV, and the value of each LV
is computed according to formulation (4).18
18 Additional information is contained in an online appendix to this
paper that is available on the Information Systems Research website
(http://isr.pubs.informs.org/ecompanion.html).

3.3. Stage 2. Constructing a Causal Bayesian
Network for Structural Models
After the LVs are identified and their values are computed,
the next step is to build a BN to test the causal
relationships among the LVs. This corresponds to the
structural model testing part of SEM. The common
approach to learning a BN from data is to specify
a scoring function (typically a variation of the
likelihood function) for each candidate network structure
and then select the BN with the highest score
(Friedman et al. 2000). Because examining all possible
network structures is NP-hard, the search algorithms
(for the optimal structure) in the BN literature are
almost exclusively variations of greedy algorithms. To
reduce the number of searches, Spirtes et al. (2002)
proposed the generic PC algorithm to generate an initial
starting point and then used a greedy search algorithm
based on the scoring function to reduce search
complexity. We follow this common practice and discover
the most likely BN in two steps: (1) generate an
initial equivalence class of BNs using PC2 (our proposed
variation of the PC algorithm), and (2) select the
most likely causal BN using a new scoring function
designed specifically for ordinal and discrete (Likert-type)
data that are commonly found in IS research.
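The score-and-search idea behind step (2) can be illustrated with a generic greedy hill-climber. This is a sketch only: it is not PC2, it takes a caller-supplied scoring function rather than the paper's ordinal score, and it omits the acyclicity check a real BN learner would enforce.

```python
# A generic sketch of score-based greedy structure search: starting from
# an initial edge set, repeatedly apply the single edge toggle (add or
# remove) that most improves the score, until no toggle helps.
# Simplification: no acyclicity check is performed on candidate graphs.
from itertools import permutations

def greedy_search(nodes, score, initial_edges=frozenset()):
    """score(edges) -> float, higher is better; edges are (parent, child) pairs."""
    edges = set(initial_edges)
    best = score(frozenset(edges))
    improved = True
    while improved:
        improved = False
        for u, v in permutations(nodes, 2):
            candidate = set(edges)
            # Toggle one directed edge: add it if absent, remove it if present.
            if (u, v) in candidate:
                candidate.remove((u, v))
            else:
                candidate.add((u, v))
            s = score(frozenset(candidate))
            if s > best:
                edges, best, improved = candidate, s, True
    return edges, best
```

In the paper's two-step procedure, the output of PC2 would supply `initial_edges`, which is what reduces the number of searches relative to starting from an empty graph.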
3.3.1. Generating Equivalent Classes of Bayesian
Networks from Data. Given a set of data, is it possible
to create a unique causal Bayesian network? The
consensus is that one cannot distinguish, from data
alone, between BNs that specify the same conditional
independencies. It is possible that two or more
BN structures represent the exact same constraints of
conditional independence (every joint probability distribution
generated by one BN structure can also be
generated by the other). In this case, the BN structures
are said to be likelihood equivalent.
When learning an equivalence class of structures
from data, we can conclude that the true BN is possibly
any one of the networks in this class (Friedman
et al. 2000). An equivalence class of network structures
can be uniquely represented by a partially
directed graph, where a directed edge X → Y denotes
that all members of the equivalence class contain
the arc X → Y, whereas an undirected X–Y edge
denotes that some members of the class contain the arc
X → Y while others contain the arc Y → X. Learning the
causal relationships among LVs can be regarded as
the process of “directing” a graph.
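This partially directed representation can be made concrete with a small sketch: directed edges are fixed in every member of the equivalence class, while each undirected edge may be oriented either way. The helper below is an illustrative construction, not an algorithm from the paper, and it omits the cycle filtering a full treatment would require.

```python
# Enumerate the candidate member DAGs of an equivalence class given its
# partially directed graph: fixed arcs plus every orientation of the
# undirected edges. (Cycle filtering is omitted for brevity.)
from itertools import product

def member_dags(directed, undirected):
    """directed: set of (X, Y) arcs; undirected: set of frozenset({X, Y}) edges."""
    members = []
    for choice in product([0, 1], repeat=len(undirected)):
        arcs = set(directed)
        for flip, edge in zip(choice, sorted(map(sorted, undirected))):
            x, y = edge
            arcs.add((x, y) if flip == 0 else (y, x))
        members.append(frozenset(arcs))
    return members
```

For example, a class with fixed arc A → B and undirected edge B–C has exactly two members, one containing B → C and one containing C → B; "directing" the graph amounts to ruling out all but one of them.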
The BN literature (e.g., Glymour et al. 1987,
Heckerman et al. 1995) has developed methods to
generate equivalent structures that have the same
Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables
Information Systems Research 21(2), pp. 365–391, ©2010 INFORMS 375
Table 2. Detailed Steps of the LVI Algorithm
Step 1: Identify all sets of measurement items (item sets) that satisfy the axiom of conditional independence
LVI uses a maximum spanning approach. It starts with a randomly selected measurement item and incrementally adds items to the item set. It stops when
no item can be added to the item set without violating the axiom. Denote by Lk the item set with k measurement items that meet the conditional independence
axiom. The core step of the algorithm is to span from Lk to Lk+1, the item set containing k + 1 measurement items that still meet the axiom. This is done by
adding an item not already in Lk into Lk, and then testing the axiom for the new item set with these k + 1 items using the method i