For our experiments, we manually selected 7 frequent causal conjunctions and 3 causal verbs, listed in Table 2. In regular collocation, these cues form 16 marker pairs, which are instantiated in the extraction patterns. We set the event window N to 7, a value chosen empirically.
Because the Google Search Engine returns at most 1,000 results per query, we place a concrete verb in the causal event expression to obtain more focused relations. That is, the wildcard between Cause_Marker and Effect_Marker is instantiated with a verb, and the resulting patterns are used to identify the corresponding effect events. For this purpose, we built a lexicon of over 10,000 common verbs labeled for transitivity, yielding 8,387 transitive verbs and 4,732 intransitive verbs. We send the pattern instances to the Google Search Engine as queries and download the relevant texts returned from the Web.
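As an illustration only, the instantiation step can be sketched as a cross product of marker pairs and lexicon verbs. The marker pairs, the verbs, and the query layout below are hypothetical stand-ins, since the actual cues are those of Table 2 and the exact query syntax is not specified here; the function name `build_queries` is likewise an assumption.

```python
from itertools import product

# Hypothetical stand-ins: the real marker pairs are the 16 pairs built from Table 2.
MARKER_PAIRS = [("because", "so"), ("since", "therefore")]

# Hypothetical sample standing in for the verb lexicon.
VERBS = ["break", "delay"]


def build_queries(marker_pairs, verbs):
    """Fill the wildcard between Cause_Marker and Effect_Marker with each verb."""
    queries = []
    for (cause_marker, effect_marker), verb in product(marker_pairs, verbs):
        # Assumed layout: Cause_Marker, concrete verb, Effect_Marker.
        queries.append(f"{cause_marker} {verb} {effect_marker}")
    return queries


queries = build_queries(MARKER_PAIRS, VERBS)
```

Under these assumptions, every (marker pair, verb) combination yields one query, so each query can draw its own set of up to 1,000 results instead of all verbs sharing a single wildcard query.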
As a preliminary filter, we use End_Marker to strip the irrelevant suffix from each extracted causal expression, and we discard sentences whose effect event is longer than 7. The final corpus consists of 1,960,000 sentences; we refer to it as the Causal Corpus in the following experiments.
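The preliminary filter might look roughly like the sketch below. It assumes that the effect event is already tokenized, that its length is counted in tokens after the suffix has been removed, and that End_Marker is a small set of delimiter tokens; the marker set and the function names are hypothetical and not taken from the original system.

```python
MAX_EFFECT_LEN = 7  # length threshold stated in the text

# Hypothetical end markers; the actual End_Marker set is not listed in this section.
END_MARKERS = (".", ";", "!")


def trim_effect(tokens, end_markers=END_MARKERS):
    """Cut the effect-event tokens at the first end marker, dropping the suffix."""
    for i, tok in enumerate(tokens):
        if tok in end_markers:
            return tokens[:i]
    return tokens


def keep_sentence(effect_tokens, max_len=MAX_EFFECT_LEN):
    """Keep a sentence only if its trimmed effect event has at most max_len tokens."""
    return len(trim_effect(effect_tokens)) <= max_len
```

With this ordering, a long extracted span is not discarded when an end marker shortens its effect event to 7 tokens or fewer; measuring the length before trimming would be a stricter variant.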