There are 402 files in the dataset.
Some of you may not have completed last week's lab. So, in addition to the text of each document, I have computed a vector space representation. That representation is available at /home/gtowell/Public/383/VS and in a tar file at /home/gtowell/Public/383/Dickens_VS.tar. My vector space representation has two parts:
0,frowning 1,abrupt 2,salary 3,pretend 4,guards 5,coals 6,blasts 7,spreading 8,alphonse 9,compassionate

The location gives the position of that term in the vector space. More importantly, it is the term associated with a particular line in the files described next.
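
If it helps, here is a minimal sketch of reading the location,term entries shown above into a lookup table. It assumes the entries are separated by whitespace (spaces or newlines) and that no term contains whitespace; the function name parse_vocabulary is my own invention for illustration and not part of the provided files.

```python
def parse_vocabulary(text):
    """Map each location to its term, given whitespace-separated 'location,term' entries."""
    index_to_term = {}
    for entry in text.split():
        # Split on the first comma only, in case a term itself contains a comma.
        location, term = entry.split(",", 1)
        index_to_term[int(location)] = term
    return index_to_term


# Sample entries copied from the listing above.
sample = "0,frowning 1,abrupt 2,salary 3,pretend 4,guards"
vocab = parse_vocabulary(sample)
print(vocab[2])  # prints: salary
```

To use this on the real term list, read the file's contents into a string and pass that string to parse_vocabulary. Keeping the table keyed by location makes it easy to go from a position in a vector back to the word it stands for.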