Thanks for making the dataset available, along with some (extremely succinct) documentation. We have now normalized the data and have TERRIER running nicely in our data warehouse alongside the other CAMEO-based automated datasets (GDELT2 English and Translingual, ICEWS, and Phoenix-RT). Could you please let us know, however, why you did not provide the URLs (preferably with the sentence numbers) in the dataset, as the other datasets do? Needless to say, we always like to drill down to the original articles for any of the visuals we produce from the data and then analyze. Or did we miss something?
The data for TERRIER comes entirely from LexisNexis, so we don’t actually have URLs for any of the stories. When we update the data, we’ll include either a LexisNexis ID number or the title of the story. The ID is a nice hard identifier but requires LN access, while the title is a little more “open,” though the original article may not be findable using only the title and source. Do you have thoughts on which you’d prefer?
Why either/or? Is it hard to include both?
Good point. We’ll see about both, but might opt for one or the other depending on which information is more easily connected across the databases.