- Raspberry Pi 3 Cookbook for Python Programmers
- Tim Cox, Dr. Steven Lawrence Fernandes
How to do it...
- Create a new Python file and import the following packages:
import numpy as np
from nltk.corpus import brown
from chunking import splitter
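The chunking module used above is a small helper written earlier in the book, not a standard package. If you don't have it available, the following is a minimal sketch of a splitter(data, num_words) function consistent with how it is used in this recipe (an illustrative stand-in, not the book's exact code):
def splitter(data, num_words):
    # Split a text string into chunks of roughly num_words words each
    words = data.split(' ')
    output = []
    cur_words = []
    for word in words:
        cur_words.append(word)
        if len(cur_words) == num_words:
            output.append(' '.join(cur_words))
            cur_words = []
    # Keep any leftover words as a final, shorter chunk
    if cur_words:
        output.append(' '.join(cur_words))
    return output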
- Define the main function and read the input data from the Brown corpus:
if __name__ == '__main__':
    # Read the first 10,000 words of the Brown corpus as a single string
    content = ' '.join(brown.words()[:10000])
- Split the text content into chunks:
num_of_words = 2000
num_chunks = []
count = 0
texts_chunk = splitter(content, num_of_words)
- Build a vocabulary based on these text chunks:
for text in texts_chunk:
    num_chunk = {'index': count, 'text': text}
    num_chunks.append(num_chunk)
    count += 1
- Extract a document-word matrix, which counts the number of occurrences of each word in the document:
from sklearn.feature_extraction.text import CountVectorizer
- Extract the document term matrix:
vectorizer = CountVectorizer(min_df=5, max_df=.95)
matrix = vectorizer.fit_transform([num_chunk['text'] for num_chunk in num_chunks])
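Here, min_df=5 keeps only words that appear in at least five of the chunks, while max_df=.95 discards words that appear in more than 95 percent of them; together they filter out very rare and very common terms. These are standard CountVectorizer parameters, and you can relax them if the resulting vocabulary comes out too small for your corpus.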
- Extract the vocabulary and print it:
vocabulary = np.array(vectorizer.get_feature_names())
print("\nVocabulary:")
print(vocabulary)
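Note that get_feature_names() was removed in recent scikit-learn releases; on scikit-learn 1.0 and later, use vectorizer.get_feature_names_out() instead.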
- Print the document term matrix:
print "nDocument term matrix:" chunks_name = ['Chunk-0', 'Chunk-1', 'Chunk-2', 'Chunk-3', 'Chunk-4'] formatted_row = '{:>12}' * (len(chunks_name) + 1) print 'n', formatted_row.format('Word', *chunks_name), 'n'
- Iterate through the words and print the number of occurrences of each word in the various chunks:
for word, item in zip(vocabulary, matrix.T):
    # 'item' is a 'csr_matrix' data structure
    result = [str(x) for x in item.data]
    print(formatted_row.format(word, *result))
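Keep in mind that item.data holds only the nonzero counts of that row, so a word prints one value per chunk it actually occurs in; with min_df=5 and five chunks, any word that survives the filter appears in every chunk, so the values line up with the chunk headings.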
- The result obtained after executing the bag-of-words model is shown as follows:
In order to understand how it works on a given sentence, refer to the following:
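Below is a minimal sketch (the sample sentence, variable names, and default parameters are illustrative, not taken from the recipe) of how the same CountVectorizer approach behaves on a single sentence:
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative example: bag-of-words counts for one sample sentence
sentence = ["The brown fox jumps over the lazy dog near the brown barn"]
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(sentence)

print(vectorizer.get_feature_names_out())  # use get_feature_names() on older scikit-learn
print(matrix.toarray())  # one row of counts; 'the' and 'brown' occur more than once
Each column of the matrix corresponds to one vocabulary word and each row to one input document, exactly as in the chunk-level matrix built above.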
- Introduction to Sentiment Analysis, explained here: https://blog.algorithmia.com/introduction-sentiment-analysis/