Data Handling Using Pandas - Dataframes #2 - vol. 2

Data Handling Using Pandas - Dataframes #2 - vol. 2

Let understand dataframes more deeply.

Hello reader, in my data handling using Pandas vol. 1 where we talked about data analysis, some basics of dataframes, series, etc. So, now let's understand dataframes more deeply.

Now, what comes in your mind when you think of a dataframe? So, basically DataFrame is a 2D labeled heterogeneous, data-mutable and size-mutable array which is widely used and is one of the most important data structures. The data in DataFrame is aligned in a tabular fashion in rows and columns therefore has both a row and column labels. Each column can have a different type of value such as numeric, string, boolean, etc

12-IP-Unit-1-Python-Pandas-I-Part-3-Dataframes-Notes-pdf.png

The above table describes data of doctors in the form of rows and columns. Here vertical subset are columns and horizontal subsets are rows. The column labels are Id, Name, Dept, Sex and Experience and row labels are 101, 104, 107, 109, 113 and 115.

So, let's take a look at the features/properties of dataframes :-

  1. Potentially columns are of different types
  2. Size – Mutable
  3. Labeled axes (rows and columns)
  4. Can Perform Arithmetic operations on rows and columns

A dataframe can be created using any of the following,

  • Series
  • Lists
  • Dictionary
  • A numpy 2D array

What syntax will now used by us to create a dataframe?

import pandas as pd 
pd.dataframe( data, index, column)

where data: takes various forms like series, list, constants/scalar values, dictionary, another dataframe.

index: specifies index/row labels to be used for resulting frame. They are unique and hashable with same length as data. Default is np.arrange(n) if no index is passed.

column: specifies column labels to be used for resulting frame. They are unique and hashable with same length as data. Default is np.arrange(n) if no index is passed.

An empty dataframe can be created as follows:

EXAMPLE 1:

Input,

import pandas as pd
dfempty = pd.DataFrame()
print (dfempty)

Output,

Empty DataFrame
Columns: []
Index: []

Now, let us create a dataframe from list of dictionaries

Input, 12 IP-Unit 1-Pyt.png

Output, 12 IP-Unit 1-Pyt (1).png

EXAMPLE 2:

Input,

import pandas as pd
l1=[{101:"Amit",102:"Binay",103:"Chahal"}, {102:"Arjun",103:"Fazal"}]
df=pd.DataFrame(l1)
print(df)

Output, 12 IP-Unit 1-Pyt (2).png

Explanation: Here, the dictionary keys are treated as column labels and row labels take default values starting from zero. The values corresponding to each key are treated as rows. The number of rows is equal to the number of dictionaries present in the list. There are two rows in the above dataframe as there are two dictionaries in the list. In the second row the value corresponding to key 101 is NaN because 101 key is missing in the second dictionary.

Note: If the value for a particular key is missing NaN (Not a Number) is inserted at that place.

Now, let us create a a dataframe from list of lists

EXAMPLE 3:

Input,

import pandas as pd
a=[[1,"Amit"],[2,"Chetan"],[3,"Rajat"],[4,"Vimal"]]
b=pd.DataFrame(a)
print(b)

Output, 12 IP-Unit 1-Pyt (3).png

Here, row and column labels take default values. Lists form the rows of the dataframe. Column labels and row index can also be changed in the following way:

b=pd.DataFrame(a,index=[’I’,’II’,’III’,’IV’],columns=[‘Rollno’,’Name’])

Output, 12 IP-Unit 1-Pyt (4).png

Now let's take a look on how can dataframe created from dictionary of lists, EXAMPLE 4:

import pandas as pd
d1={"Rollno":[1,2,3], "Total":[350.5,400,420],
"Percentage":[70,80,84]}
df1=pd.DataFrame(d1)
print(df1)

Output, 12 IP-Unit 1-Pyt (5).png

Explanation: Here, the dictionary keys are treated as column labels and row labels take default values starting from zero.

Now let's create a dataframe from dictionary of series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the series indexes passed.

EXAMPLE 5:

Input,

import pandas as pd
d2={"Rollno":pd.Series([1,2,3,4]), "Total":pd.Series([350.5,400,420]),
"Percentage":pd.Series([70,80,84,80])}
df2=pd.DataFrame(d2)
print(df2)

Output, 12 IP-Unit 1-Pyt (5).png

That's all for now reader. In next volume I will be sharing my insights over Customizing labels/ Changing index & column labels. Stay tuned for vol.3 🤞