How to do it...

  1. Import the following packages:
import numpy as np 
from nltk.corpus import brown 
  2. Define a function that splits the input text into chunks:
# Split a text into chunks 
def splitter(content, num_of_words): 
   words = content.split(' ') 
   result = [] 
  3. Initialize the variables that keep track of the current chunk:
   current_count = 0 
   current_words = []
  4. Iterate over the words, adding each one to the current chunk:
   for word in words: 
     current_words.append(word) 
     current_count += 1 
  5. Once the chunk contains the required number of words, append it to the result and reset the tracking variables:
     if current_count == num_of_words: 
       result.append(' '.join(current_words)) 
       current_words = [] 
       current_count = 0 
  6. After the loop, append any remaining words as the final chunk and return the result:
   # Append the remaining words, if any, as the last chunk
   result.append(' '.join(current_words))
   return result
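As a quick illustration of the function's behavior, calling it on a short made-up sentence with a chunk size of two words yields the following (this snippet is only a sketch and is not part of the recipe script):

# Illustrative example: split a short sentence into chunks of two words
sample = 'The quick brown fox jumps over the lazy dog'
print(splitter(sample, 2))
# ['The quick', 'brown fox', 'jumps over', 'the lazy', 'dog']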
  7. Load the data from the Brown corpus and consider the first 10,000 words:
if __name__=='__main__': 
  # Read the data from the Brown corpus 
  content = ' '.join(brown.words()[:10000]) 
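If the Brown corpus is not yet available on the machine, it can be downloaded once beforehand (a one-time setup step, not shown in the recipe itself):

import nltk
nltk.download('brown')   # fetches the Brown corpus into the NLTK data directory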
  8. Define the number of words in each chunk:
  # Number of words in each chunk 
  num_of_words = 1600 
  9. Initialize a couple of supporting variables:
  chunks = [] 
  counter = 0 
  10. Call the splitter function and print the number of chunks obtained:
  num_text_chunks = splitter(content, num_of_words) 
  print("Number of text chunks =", len(num_text_chunks))
  11. The result obtained after chunking is shown in the following output:
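Since the first 10,000 words are divided into chunks of 1,600 words, the script should report 7 text chunks: six full chunks followed by a final chunk of 400 words. A short optional check along these lines (an addition, not part of the original recipe) prints the word count of every chunk:

  # Optional check: print the number of words in each chunk
  for i, chunk in enumerate(num_text_chunks):
    print("Chunk", i + 1, "contains", len(chunk.split(' ')), "words")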