- Python: Advanced Predictive Analytics
- Ashish Kumar, Joseph Babcock
- 2021-07-02 20:09:20
Reading the data – variations and examples
Before we delve deeper into the realm of data, let us familiarize ourselves with a few terms that will appear frequently from now on.
Data frames
A data frame is one of the most common data structures used for data analysis in Python. Data frames are very similar to a spreadsheet table or a SQL table. Structurally, a data frame can also be thought of as a dictionary of Series objects. Like a spreadsheet, it has index labels (which identify rows) and column labels (which identify columns). It is the most commonly used pandas object and is a 2D structure whose columns can be of the same or different types. Most standard operations that can be applied to a spreadsheet or a SQL table, such as aggregation, filtering, and pivoting, can be applied to data frames using pandas methods.
The following screenshot is an illustrative picture of a data frame. We will learn more about working with them as we progress in the chapter:
Fig. 2.1 A data frame
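The dictionary-of-Series view of a data frame described above can be sketched in a few lines. This is only a minimal illustration: the column names and values are hypothetical sample data, not taken from the book's datasets.

```python
import pandas as pd

# Hypothetical sample data, made up purely for illustration
names = pd.Series(["Ann", "Bob", "Cid"])
ages = pd.Series([34, 28, 45])

# A data frame built from a dictionary of Series: the keys become column labels
df = pd.DataFrame({"name": names, "age": ages})

print(df.index.tolist())    # index (row) labels
print(df.columns.tolist())  # column labels
print(df.dtypes)            # columns can hold different types

# Spreadsheet/SQL-style operations: filtering and aggregation
over_30 = df[df["age"] > 30]
mean_age = df["age"].mean()
```

Here the filter keeps only the rows whose age exceeds 30, and the aggregation reduces the age column to a single mean, much as a WHERE clause and an AVG would in SQL.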
Delimiters
A delimiter is a special character that separates the columns of a dataset from one another. The most common delimiter, and arguably the default one, is the comma (,). A .csv file is so named because it contains comma-separated values. However, a dataset can use any special character as its delimiter, and one needs to know how to handle these in order to carry out a thorough exploratory analysis and build a robust predictive model. Later in this chapter, we will learn how to do that.
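As a brief preview, the following is a minimal sketch of how a delimiter might be specified when reading data with pandas. The two in-memory strings are hypothetical stand-ins for files on disk, and the pipe character is just one example of a non-comma delimiter.

```python
import io

import pandas as pd

# Hypothetical in-memory datasets standing in for files on disk
comma_text = "name,age\nAnn,34\nBob,28\n"
pipe_text = "name|age\nAnn|34\nBob|28\n"

# read_csv assumes a comma delimiter by default
df_comma = pd.read_csv(io.StringIO(comma_text))

# For any other delimiter, pass it explicitly through the sep parameter
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

# Compare the two parsed data frames
print(df_comma.equals(df_pipe))
```

If the wrong delimiter is assumed, each line is typically read as a single column, so checking the shape of the result is a quick sanity test.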