When a list holds many elements, we often want to find its highest or lowest value, and Python makes this easy.
In this article, we shall see how we can find the second largest number in a Python list. We will look at the following approaches:
Sorting the list and then printing the second last number.
Removing the maximum element.
Taking the elements from the user and sorting the list.
Traversing the list.
Let us have a look at the first approach-
Sorting the list and then print the second last number
The following program illustrates how we can do it in Python-
Example –
#program to find the second largest number of list
# declaring the list
list_val = [20, 30, 40, 25, 10]
# sorting the list
list_val.sort()
#displaying the second last element of the list
print("The second largest element of the list is:", list_val[-2])
Output:The second largest element of the list is: 30
It’s time to go for the explanation part-
We have declared the list from which we want to take out the second largest element.
After this, we used the sort() method so that all the elements of our list are arranged in ascending order.
Now we make use of negative indexing, since the second largest number will be at the second last position (index -2).
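The sorting approach above assumes that the largest value appears only once: for a list such as [10, 40, 40], list_val[-2] would return 40 again. The following is a minimal sketch of one way to handle that case. The function name second_largest_sorted is an assumption made for illustration, and returning None when there is no second value is just one possible design choice.

```python
def second_largest_sorted(values):
    """Return the second largest distinct value, or None if there is none."""
    # set() drops duplicates, sorted() returns a new ascending list
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None
    return distinct[-2]


print(second_largest_sorted([20, 30, 40, 25, 10]))  # 30
print(second_largest_sorted([10, 40, 40]))          # 10
print(second_largest_sorted([7]))                   # None
```

Unlike list_val.sort(), sorted() leaves the original list unchanged, which can matter when the list is used again later.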
The second method is to obtain the second largest element of the list by removing the maximum element.
Let us see how we can do it.
Removing the maximum element
Example –
#program to find the second largest number of list
# declaring the list
list_val = [20, 30, 40, 25, 10]
# res_list is a set built from list_val
res_list = set(list_val)
#removing the maximum element
res_list.remove(max(res_list))
#printing the second largest element
print(max(res_list))
Output:30
Explanation –
Let us understand what we have done in the above program-
We have declared the list from which we want to take out the second largest element.
After this, we used set() to keep only the unique elements of the list.
Now we make use of max() to get the maximum value from the set and then remove it.
After this, we print the maximum of the resulting set, which gives us the second largest number.
In the third method, we will use a for loop to read the elements from the user and then find the second largest number in the list.
Example –
# declaring empty list
list_val = []
# user provides the number of elements to be added in the list
num_list = int(input("Enter number of elements in list: "))
for i in range(1, num_list + 1):
    element = int(input("Enter the elements: "))
    list_val.append(element)
# sort the list
list_val.sort()
# print second largest element
print("Second largest element is:", list_val[-2])
Output:Enter number of elements in list: 5 Enter the elements: 10 Enter the elements: 20 Enter the elements: 30 Enter the elements: 40 Enter the elements: 50 Second largest element is: 40
Explanation –
Let us have a glance at what we have done here-
We have declared an empty list in which we will insert the elements.
After this, we ask the user for the number of elements we would like to add to our list.
Then, inside the for loop, we read each element from the user and append it to the list.
After this, we use the sort() method so that all the elements of our list are arranged in ascending order.
Now we make use of negative indexing, since the second largest number will be at the second last position.
Traversing the list
In the last program, we will traverse the list to find out the largest number and then make use of conditional statements to find the second largest number from the list.
The following program illustrates the same-
Example –
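def calc_largest(arr):
    # first pass: find the largest value in the list
    largest_val = arr[0]
    for i in range(len(arr)):
        if arr[i] > largest_val:
            largest_val = arr[i]
    # second pass: find the largest value that is different from largest_val
    # (starting from None avoids returning the maximum when it is the first element)
    second_largest = None
    for i in range(len(arr)):
        if arr[i] != largest_val and (second_largest is None or arr[i] > second_largest):
            second_largest = arr[i]
    return second_largest
print(calc_largest([20, 30, 40, 25, 10]))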
Output:30
Explanation –
Let us understand what we have done in the above program-
The first step is to create a function that finds the largest number in the list by traversing it.
In the next for loop, we traverse the list again to find the highest number, but this time we exclude the largest one, since our objective is to find the second largest number.
Finally, we pass our list to the function and print the result.
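The two traversals described above can also be folded into a single pass that tracks both values at once. This is a sketch under stated assumptions, not the only way to do it: the function name second_largest_single_pass is hypothetical, and it returns None when the list has fewer than two distinct values.

```python
def second_largest_single_pass(values):
    """Track the largest and second largest distinct values in one traversal."""
    largest = None
    second = None
    for value in values:
        if largest is None or value > largest:
            # the old maximum becomes the runner-up
            second = largest
            largest = value
        elif value != largest and (second is None or value > second):
            second = value
    return second


print(second_largest_single_pass([20, 30, 40, 25, 10]))  # 30
print(second_largest_single_pass([40, 40, 10]))          # 10
```

A single pass avoids both the sort and the second loop, which can be convenient for long lists.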
So, in this article, we got the chance to think out of the box and discover some new ways to develop the logic for finding the second largest number in Python.
In this tutorial, we are going to learn about the SimpleImputer class of the Sklearn (scikit-learn) library. It belongs to the sklearn.impute module and replaces the older Imputer class of sklearn.preprocessing, which has been removed from recent versions of the library. We will discuss the SimpleImputer class and how we can use it in a Python program to handle missing data and replace the missing values inside a dataset.
SimpleImputer class
SimpleImputer is a scikit-learn class that we can use to handle the missing values in the dataset of a predictive model. With the help of this class, we can replace the missing (NaN) values in the dataset with a specified placeholder. We use it by creating an object with SimpleImputer() in the program.
Syntax for SimpleImputer():
To use the SimpleImputer class in a Python program, we have to use the following syntax:
SimpleImputer(missing_values, strategy, fill_value)
Parameters: Following are the parameters which can be defined while creating a SimpleImputer object:
missing_values: It is the placeholder for the missing values that have to be imputed, and by default, this placeholder is NaN.
strategy: It is the method used to compute the value that replaces the missing values in the dataset, and by default, it is 'mean'. The strategy parameter can take 'mean', 'median', 'most_frequent' (the mode) and 'constant' as its value.
fill_value: This parameter is used only when we give 'constant' as the strategy. It defines the constant value that replaces the NaN values in the dataset.
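To make the parameters more concrete, here is a small sketch that applies the 'median' and 'constant' strategies to the same data. The dataset and the variable names are made up for illustration, and the example assumes that numpy and scikit-learn are already installed.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data: two columns, each with one missing value
data = [[1, 10], [np.nan, 20], [3, np.nan], [100, 30]]

# 'median' replaces NaN with the median of the known values in each column
median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
median_filled = median_imputer.fit_transform(data)
print(median_filled)

# 'constant' replaces every NaN with the value given in fill_value
constant_imputer = SimpleImputer(missing_values=np.nan, strategy="constant", fill_value=0)
constant_filled = constant_imputer.fit_transform(data)
print(constant_filled)
```

With this data, the median strategy fills the first column with 3 and the second with 20, while the constant strategy fills both gaps with 0. The median is less affected than the mean by an outlier such as the 100 in the first column.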
SimpleImputer is a class of the Sklearn library, and to use it, we first have to install the Sklearn library in our system if it is not present already.
Installation of Sklearn library:
We can install Sklearn by using the following command in the command prompt or terminal of our system (the package is published on PyPI under the name scikit-learn):
pip install scikit-learn
After pressing the enter key, the library will start installing on our device.
Once the Sklearn library is installed in our system, we can move ahead with the SimpleImputer class.
Handling NaN values in the dataset with SimpleImputer class
Now, we will use the SimpleImputer class in a Python program to handle the missing values present in the dataset (that we will use in the program). We will define a dataset in the example program while giving some missing values in it, and then we use the SimpleImputer class method to handle those values from the dataset by defining its parameters. Let’s understand the implementation of this through an example Python program.
Example 1: Look at the following Python program with a dataset having NaN values defined in it:
# Import numpy module as nmp
import numpy as nmp
# Importing SimpleImputer class from sklearn impute module
from sklearn.impute import SimpleImputer
# Setting up imputer function variable
imputerFunc = SimpleImputer(missing_values = nmp.nan, strategy ='mean')
# Defining a dataset
dataSet = [[32, nmp.nan, 34, 47], [17, nmp.nan, 71, 53], [19, 29, nmp.nan, 79], [nmp.nan, 31, 23, 37], [19, nmp.nan, 79, 53]]
# Print original dataset
print("The Original Dataset we defined in the program: \n", dataSet)
# Imputing dataset by replacing missing values
imputerFunc = imputerFunc.fit(dataSet)
dataSet2 = imputerFunc.transform(dataSet)
# Printing imputed dataset
print("The imputed dataset after replacing missing values from it: \n", dataSet2)
Output:The Original Dataset we defined in the program: [[32, nan, 34, 47], [17, nan, 71, 53], [19, 29, nan, 79], [nan, 31, 23, 37], [19, nan, 79, 53]] The imputed dataset after replacing missing values from it: [[32. 30. 34. 47. ] [17. 30. 71. 53. ] [19. 29. 51.75 79. ] [21.75 31. 23. 37. ] [19. 30. 79. 53. ]]
Explanation:
We first imported the numpy module (to mark the missing values with nmp.nan) and the SimpleImputer class from the sklearn.impute module. Then, we defined the imputer with the SimpleImputer class and chose the 'mean' strategy to replace the missing values. After that, we defined a dataset as a list of rows with some missing (NaN) values in it and printed the original dataset. Next, we fitted the imputer on the dataset with fit(), which computes the mean of every column, and replaced the missing values with transform(). Finally, we printed the new dataset as the result.
As we can see in the output, the imputed dataset has the column mean in the place of each missing value, and that's how we can use the SimpleImputer class to handle NaN values in a dataset.
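To see where the imputed numbers come from, the 'mean' strategy can be reproduced by hand with plain Python. This is only a sketch of the arithmetic on the dataset from Example 1, not how scikit-learn is implemented internally, and the function name impute_column_means is hypothetical.

```python
import math

nan = float("nan")
data = [[32, nan, 34, 47], [17, nan, 71, 53], [19, 29, nan, 79],
        [nan, 31, 23, 37], [19, nan, 79, 53]]


def impute_column_means(rows):
    """Replace each NaN with the mean of the known values in its column."""
    means = []
    for col in range(len(rows[0])):
        known = [row[col] for row in rows if not math.isnan(row[col])]
        means.append(sum(known) / len(known))
    return [[means[col] if math.isnan(value) else float(value)
             for col, value in enumerate(row)]
            for row in rows]


imputed = impute_column_means(data)
for row in imputed:
    print(row)
```

For example, the first column has the known values 32, 17, 19 and 19, so its missing entry becomes 87 / 4 = 21.75, which matches the output of Example 1.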
Conclusion
We have read about the SimpleImputer class in this tutorial, and we learned how we can use it to handle the NaN values present in a dataset. We learned about the strategy parameter, which defines the method for replacing the NaN values of the dataset. We also learned how to install the Sklearn library, and lastly, we used the SimpleImputer class in an example to impute a dataset.
OpenCV is a huge open-source library for image processing, machine learning and computer vision. It also plays an important role in real-time operations. With the help of the OpenCV library, we can easily process images as well as videos to identify the objects, faces or even the handwriting present in them. In this tutorial, we will focus only on object detection from images. We will learn how we can use OpenCV to detect objects in a given image using a Python program.
Object Detection
Object detection is a computer technology related to image processing, deep learning and computer vision that detects the objects present in an image file. All of these technologies deal with detecting instances of an object in an image or a video.
Object Detection using OpenCV
We have learned about object detection in the previous section, and in this section, we will learn how we can do object detection in an image or video using the OpenCV library. We will first import the OpenCV library in the Python program, and then we will use its functions to perform object detection on an image file given to us. But, before importing and using the library functions, let's first install the requirements for object detection.
In this tutorial, we will use the Haar cascade technique to do object detection. Let’s learn in brief about the Haar cascade technique first.
Haar cascade:
The Haar cascade technique is a machine learning based approach in which we use a lot of positive and negative images to train a classifier to distinguish between them. Haar cascade classifiers are considered an effective way to do object detection with the OpenCV library. Now, let's understand the concept of positive and negative images:
Positive images: These are the images that contain the objects which we want to be identified from the classifier.
Negative Images: These are the images that do not contain any object that we want to be detected by the classifier, and these can be images of everything else.
Requirements for object detection with Python OpenCV:
We first have to install some important libraries in our system, as they are required for performing object detection:
1. Installation of OpenCV library:
First and foremost, the OpenCV library should be present on our device so that we can import it into a Python program and use its object detection functions. If this library is not present in our system, we can use the following command in our command prompt or terminal to install it:
pip install opencv-python
When we press the enter key after writing this command in the terminal, the pip installer will start installing the OpenCV library into our system.
Once the OpenCV library is successfully installed in our system, we can import it into a Python program to use its functions.
2. Installation of matplotlib library:
Matplotlib is a plotting library that is very helpful for displaying images in a Python program, and that's why installing it is an important requirement for this tutorial. If the matplotlib library is not present in our system, we have to use the following command in our command prompt or terminal to install it:
pip install matplotlib
When we press the enter key after writing this command in the terminal, the pip installer will start installing it into our system.
Once the matplotlib library is successfully installed in our system, we can import it into a Python program to use its functions for displaying images.
We have installed all the required libraries for performing object detection, and now we can move ahead with the implementation part of this task.
Implementation of Object detection in Python:
In this part, we will write the Python programs to do the object detection and understand the implementation of it. We will use the following image in our Python program to perform the object detection on it:
Opening the Image
We will first open the image given above and create the environment of the picture to show it in the output. Let’s first look at an example program to understand the implementation, and then we will look at the explanation part.
Example 1: Opening the image using OpenCV and matplotlib library in a Python program:
# Import OpenCV module
import cv2
# Import pyplot from matplotlib as pltd
from matplotlib import pyplot as pltd
# Opening the image from files
imaging = cv2.imread("opencv-od.png")
# Altering properties of image with cv2
img_gray = cv2.cvtColor(imaging, cv2.COLOR_BGR2GRAY)
imaging_rgb = cv2.cvtColor(imaging, cv2.COLOR_BGR2RGB)
# Plotting image with subplot() from plt
pltd.subplot(1, 1, 1)
# Displaying image in the output
pltd.imshow(imaging_rgb)
pltd.show()
Output:
Explanation:
First, we imported the OpenCV library (as cv2) and pyplot from matplotlib (as pltd) into the program to use their functions in the code. After that, we opened the image file using the imread() function of cv2.
Then, we converted the image to grayscale and to RGB using the cvtColor() function of cv2. The RGB conversion is needed because OpenCV loads images in BGR channel order, while matplotlib expects RGB. Then, we created a plot area using the subplot() function of pltd. Lastly, we used the imshow() and show() functions of pltd to show the image in the output.
As we can see in the output, the image is displayed inside a matplotlib plot, with the axes drawn along its borders.
Recognition or object detection in the image
Now, we will use the detectMultiScale() in the program to detect the object present in the image. Following is the syntax for using detectMultiScale() function in the code:
found = xml_data.detectMultiScale(img_gray,
minSize = (30, 30))
We will use a conditional statement with this function in the program to check whether any object is detected in the image and to highlight the detected part. Let's understand the implementation of object detection in the image through an example program.
Example 2: Object detection in the image using the detectMultiScale() in the following Python program:
# Import OpenCV module
import cv2
# Import pyplot from matplotlib as pltd
from matplotlib import pyplot as pltd
# Opening the image from files
imaging = cv2.imread("opencv-od.png")
# Altering properties of image with cv2
imaging_gray = cv2.cvtColor(imaging, cv2.COLOR_BGR2GRAY)
imaging_rgb = cv2.cvtColor(imaging, cv2.COLOR_BGR2RGB)
# Importing Haar cascade classifier xml data
xml_data = cv2.CascadeClassifier('XML-data.xml')
# Detecting object in the image with Haar cascade classifier
detecting = xml_data.detectMultiScale(imaging_gray,
minSize = (30, 30))
# Amount of object detected
amountDetecting = len(detecting)
# Using if condition to highlight the object detected
if amountDetecting != 0:
    for (a, b, width, height) in detecting:
        # Highlighting detected object with a green rectangle;
        # the opposite corner is (a + width, b + height)
        cv2.rectangle(imaging_rgb, (a, b),
                      (a + width, b + height),
                      (0, 255, 0), 9)
# Plotting image with subplot() from plt
pltd.subplot(1, 1, 1)
# Displaying image in the output
pltd.imshow(imaging_rgb)
pltd.show()
Output:
Explanation:
After opening the image in the program, we loaded the cascade classifier XML file. Then, we used the detectMultiScale() function of the loaded classifier to detect whether the object is present in the image.
We used an if condition in the program to check whether any object was detected, and if so, we highlighted each detected object by drawing a rectangle around it inside a for loop. After highlighting the detected object in the image, we displayed the processed image using the imshow() and show() functions of pltd.
As we can see in the output, the image is shown with the detected object highlighted when we run the program.
In the following tutorial, we will discuss the nsetools library in the Python programming language. We will understand its features and work with some examples.
So, let’s get started.
Understanding the nsetools library
NSE or National Stock Exchange of India Limited is the leading stock exchange of India, situated in Mumbai, Maharashtra. NSE was established in the year 1992 as the first dematerialized electronic exchange in the country.
Python offers a library that allows programmers to collect real-time data from the National Stock Exchange (India). This library is known as nsetools. We can use it in projects that require fetching live quotes for a given index or stock, or building large data sets for further data analytics. We can also create Command-Line Interface (CLI) applications that deliver the details of the live market quickly, often faster than opening the pages in a web browser. The data is only as accurate as what is provided on the official website of the National Stock Exchange of India Limited (http://www.nseindia.com).
Main features of the Python nsetools library
Some of the key features of the Python nsetools library are stated as follows:
The nsetools library works out of the box, without any setup requirement.
This library helps programmers to fetch live stock quotes and index quotes at high speed.
It also offers a set of all stocks and indices traded on the National Stock Exchange.
Moreover, it also provides a set of:
Top losers
Top gainers
Most active
It also delivers several helpful Application Programming Interfaces (APIs) in order to validate a stock code and index code.
The library optionally returns data in JSON format.
It has 100% unit test coverage.
How to install the Python nsetools library?
The installation part of the nsetools library is quite easy, and it has no external dependencies. All the dependencies of the library are part of standard distribution packages of Python. We can install the nsetools library using the pip installer as shown in the following syntax:
Syntax:
$ pip install nsetools
Updating the library
If we have already installed the nsetools library in our system, the following command will allow us to update it.
Syntax:
$ pip install nsetools --upgrade
Python 3 support
Python 3 support has been included in the library since version 1.0.0. The library now works with both Python 2 and Python 3.
Creating an NSE object
We can create an NSE object by calling Nse(), which is offered by the nsetools library. The same can be seen in the following example:
Example:
# importing the Nse() function from the nsetools library
from nsetools import Nse
# creating an NSE object
nse_obj = Nse()
# printing the value of the object
print("NSE Object:", nse_obj)
Output:NSE Object: Driver Class for National Stock Exchange (NSE)
Explanation:
In the above snippet of code, we have imported the required function from the library. We have then defined a variable that uses the Nse() function to create an NSE object. We have then printed the value of the variable for the users.
Getting Information using the nsetools library
Let us consider an example demonstrating the use of nsetools for gathering Information.
Example:
# importing the Nse() function from the nsetools library
In the above snippet of code, we have imported the required module and created an NSE object using the Nse() function. We have then defined another variable that uses the get_quote() function on the NSE object to get the quotation of the specified company. We have then printed the required details for the users.
Multiprocessing is the ability of a system to run two or more processes in parallel. In simple words, multiprocessing uses two or more CPUs within a single computer system. It also makes it possible to divide tasks between more than one process.
The processing units share the main memory and peripherals to process programs simultaneously. A multiprocessing application is broken into smaller parts that run independently. Each process is allocated to a processor by the operating system.
Python provides a built-in package called multiprocessing, which supports spawning processes. Before working with multiprocessing, we must be familiar with the Process object.
Why Multiprocessing?
Multiprocessing is essential for performing multiple tasks within a computer system. Consider a computer with a single processor and no multiprocessing, to which we assign several processes at the same time.
It will have to interrupt the current task and move to another one to keep all the processes going. It is like a chef working alone in the kitchen, who has to do several tasks to cook food, such as cutting, cleaning, cooking, kneading dough and baking.
Therefore, multiprocessing is essential for performing several tasks at the same time without interruption. It also makes it easy to track all the tasks. That is why the concept of multiprocessing arose.
Multiprocessing can be represented as a computer with more than one central processor.
A multi-core processor refers to a single computing component with two or more independent processing units.
In multiprocessing, multiple tasks can be assigned at once, and each task runs on its own processor.
Multiprocessing In Python
Python provides the multiprocessing module to perform multiple tasks within a single system. It offers a user-friendly and intuitive API to work with processes.
Let's understand a simple example of multiprocessing.
Example –
from multiprocessing import Process
def disp():
    print('Hello !! Welcome to Python Tutorial')
if __name__ == '__main__':
    p = Process(target=disp)
    p.start()
    p.join()
Output:Hello !! Welcome to Python Tutorial
Explanation:
In the above code, we have imported the Process class and then created a Process object whose target is the disp() function. Then we started the process using the start() method and waited for it to complete with the join() method. We can also pass arguments to the target function using the args keyword.
Let’s understand the following example of the multiprocessing with arguments.
Example – 2
# Python multiprocessing example
# importing the multiprocessing module
import multiprocessing
def cube(n):
    # This function will print the cube of the given number
    print("The Cube is: {}".format(n * n * n))
def square(n):
    # This function will print the square of the given number
    print("The Square is: {}".format(n * n))
if __name__ == "__main__":
    # creating two processes
    process1 = multiprocessing.Process(target=square, args=(5, ))
    process2 = multiprocessing.Process(target=cube, args=(5, ))
    # Here we start the process 1
    process1.start()
    # Here we start process 2
    process2.start()
    # The join() method is used to wait for process 1 to complete
    process1.join()
    # It is used to wait for process 2 to complete
    process2.join()
    # Print if both processes are completed
    print("Both processes are finished")
Output:
Explanation –
In the above example, we created two functions: the cube() function calculates the cube of the given number, and the square() function calculates its square.
Next, we created two objects of the Process class, each with two arguments. The first argument is target, which represents the function to be executed, and the second argument is args, which holds the arguments to be passed to that function.
We have used the start() method to start the process.
process1.start()
process2.start()
The join() calls make the main program wait for process 1 and then for process 2 to complete, so the last statement is executed only after both processes are finished.
Python Multiprocessing Classes
The Python multiprocessing module provides many classes which are commonly used for building parallel programs. We will discuss its main classes: Process, Queue and Lock. We have already discussed the Process class in the previous example. Now we will discuss the Queue and Lock classes.
First, let's see a simple example that gets the number of CPUs in the system.
Example –
import multiprocessing
print("The number of CPU currently working in system : ", multiprocessing.cpu_count())
Output:The number of CPU currently working in system :  32
The number reported can differ on your PC. On our machine, cpu_count() reports 32 logical CPUs.
Python Multiprocessing Using Queue Class
We know that the queue is an important data structure. The multiprocessing Queue behaves like the queue data structure, which is based on the "First-In-First-Out" concept. A Queue can store Python objects and plays an essential role in sharing data between processes.
Queues are passed as a parameter to the target function of a Process to allow the process to consume data. The Queue provides the put() function to insert data and the get() function to get data from the queue. Let's understand the following example.
Example –
# Importing Queue Class
from multiprocessing import Queue
fruits = ['Apple', 'Orange', 'Guava', 'Papaya', 'Banana']
count = 1
# creating a queue object
queue = Queue()
print('pushing items to the queue:')
for fr in fruits:
    print('item no: ', count, ' ', fr)
    queue.put(fr)
    count += 1
print('\npopping items from the queue:')
count = 0
while not queue.empty():
    print('item no: ', count, ' ', queue.get())
    count += 1
In the above code, we have imported the Queue class and initialized the list named fruits. Next, we set count to 1. The count variable numbers the elements as they are processed. Then, we created the queue object by calling Queue(). This object is used to perform operations on the queue. In the for loop, we inserted the elements one by one into the queue using the put() function and increased the count by 1 with each iteration of the loop. Finally, in the while loop, we removed the elements one by one with the get() function until the queue was empty.
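The example above uses the queue inside a single process. Since queues are mainly meant for sharing data between processes, here is a minimal sketch in which a child process puts items on a queue and the parent reads them. The function names producer() and collect(), and the use of None as an end marker, are assumptions made for this illustration.

```python
from multiprocessing import Process, Queue


def producer(out_queue, items):
    """Put an upper-cased copy of every item on the queue, then an end marker."""
    for item in items:
        out_queue.put(item.upper())
    out_queue.put(None)  # None tells the reader that no more items will come


def collect(in_queue):
    """Read items from the queue until the end marker is seen."""
    results = []
    while True:
        item = in_queue.get()  # blocks until an item is available
        if item is None:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    shared_queue = Queue()
    worker = Process(target=producer, args=(shared_queue, ["apple", "orange", "guava"]))
    worker.start()
    print(collect(shared_queue))  # ['APPLE', 'ORANGE', 'GUAVA']
    worker.join()
```

Reading with a blocking get() and an explicit end marker avoids relying on empty(), whose result can be out of date while another process is still putting items on the queue.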
Python Multiprocessing Lock Class
The multiprocessing Lock class is used to acquire a lock so that only one process at a time can execute a given piece of code; any other process that tries to acquire the same lock has to wait until it is released. The Lock class performs mainly two tasks. The first is to acquire the lock using the acquire() function, and the second is to release the lock using the release() function.
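As a minimal sketch of how a lock might be used, the program below lets four processes increment one shared counter. The shared counter created with multiprocessing.Value and the function name add_many() are assumptions introduced for this illustration. The with statement calls acquire() when the block is entered and release() when it is left.

```python
from multiprocessing import Lock, Process, Value


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked step at a time."""
    for _ in range(times):
        # "with lock" calls lock.acquire() on entry and lock.release() on exit
        with lock:
            counter.value += 1


if __name__ == "__main__":
    lock = Lock()
    # lock=False gives a raw shared integer, so our own Lock is the only guard
    counter = Value("i", 0, lock=False)
    workers = [Process(target=add_many, args=(counter, lock, 1000))
               for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print("Final counter value:", counter.value)  # 4000
```

Without the lock, two processes could read the same old value and write back the same new value, so some increments would be lost.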
Python Multiprocessing Example
Suppose we have multiple tasks. We create two queues: the first queue will hold the tasks, and the other will store the log of completed tasks. The next step is to instantiate the processes that complete the tasks. The Queue class is already synchronized (it is safe to use from several processes), so we don't need to acquire a lock using the Lock class.
In the following example, we will bring the multiprocessing classes together. Let's see the example below.
Example –
from multiprocessing import Lock, Process, Queue, current_process
import time
import queue
def jobTodo(tasks_to_perform, complete_tasks):
    while True:
        try:
            # The try block to catch task from the queue.
            # The get_nowait() function is used to
            # raise queue.Empty exception if the queue is empty.
            task = tasks_to_perform.get_nowait()
        except queue.Empty:
            break
        else:
            # if no exception has been raised, the else block will execute
            # add the task completion
            print(task)
            complete_tasks.put(task + ' is done by ' + current_process().name)
            time.sleep(.5)
    return True
def main():
    total_task = 8
    total_number_of_processes = 3
    tasks_to_perform = Queue()
    complete_tasks = Queue()
    number_of_processes = []
    for i in range(total_task):
        tasks_to_perform.put("Task no " + str(i))
    # defining number of processes
    for w in range(total_number_of_processes):
        p = Process(target=jobTodo, args=(tasks_to_perform, complete_tasks))
        number_of_processes.append(p)
        p.start()
    # completing process
    for p in number_of_processes:
        p.join()
    # print the output
    while not complete_tasks.empty():
        print(complete_tasks.get())
    return True
if __name__ == '__main__':
    main()
Output:Task no 2 Task no 5 Task no 0 Task no 3 Task no 6 Task no 1 Task no 4 Task no 7 Task no 0 is done by Process-1 Task no 1 is done by Process-3 Task no 2 is done by Process-2 Task no 3 is done by Process-1 Task no 4 is done by Process-3 Task no 5 is done by Process-2 Task no 6 is done by Process-1 Task no 7 is done by Process-3
Python Multiprocessing Pool
The Python multiprocessing Pool is useful for the parallel execution of a function across multiple input values. It distributes the input data across processes (data parallelism). Consider the following example of a multiprocessing Pool.
Example –
from multiprocessing import Pool
import time
w = (["V", 5], ["X", 2], ["Y", 1], ["Z", 3])
def work_log(data_for_work):
    print(" Process name is %s waiting time is %s seconds" % (data_for_work[0], data_for_work[1]))
    time.sleep(int(data_for_work[1]))
    print(" Process %s Executed." % data_for_work[0])
def handler():
    p = Pool(2)
    p.map(work_log, w)
if __name__ == '__main__':
    handler()
Output:Process name is V waiting time is 5 seconds Process V Executed. Process name is X waiting time is 2 seconds Process X Executed. Process name is Y waiting time is 1 seconds Process Y Executed. Process name is Z waiting time is 3 seconds Process Z Executed.
Let’s understand another example of the multiprocessing Pool.
Example – 2
from multiprocessing import Pool
def fun(x):
    return x*x
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(fun, [1, 2, 3]))
Output:[1, 4, 9]
Proxy Objects
A proxy object refers to a shared object which resides in a different process. The shared object is called the referent of the proxy. Multiple proxy objects may have the same referent. A proxy object has methods which are used to invoke the corresponding methods of its referent.
The proxy objects are picklable, so we can pass them between processes. These objects also give us a level of control over synchronization.
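A common way to obtain proxy objects is through multiprocessing.Manager. The sketch below is one possible illustration: the function name record_square() is hypothetical, and the dictionary returned by manager.dict() is a proxy whose referent lives in the manager's own process.

```python
from multiprocessing import Manager, Process


def record_square(shared_results, number):
    """Store the square of `number` in the shared dictionary."""
    # This assignment is forwarded by the proxy to the real dict (the referent)
    shared_results[number] = number * number


if __name__ == "__main__":
    with Manager() as manager:
        results = manager.dict()  # a proxy object, not a plain dict
        workers = [Process(target=record_square, args=(results, n))
                   for n in range(1, 4)]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Copy into a normal dict and sort, since the processes may finish in any order
        print(sorted(dict(results).items()))  # [(1, 1), (2, 4), (3, 9)]
```

Each child process receives a pickled copy of the proxy, but all copies point to the same referent, so the parent sees every result.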
Commonly Used Functions of Multiprocessing
So far, we have discussed the basic concepts of multiprocessing using Python. Multiprocessing is a broad topic in itself and is essential for performing various tasks within a single system. Below are a few essential functions and methods that are commonly used to achieve multiprocessing.
Method – Description
Pipe() – Returns a pair of connection objects that represent the two ends of a pipe.
run() – Represents the process's activity.
start() – Starts the process's activity.
join([timeout]) – Blocks the calling process until the process whose join() method is called terminates. The timeout argument is optional.
is_alive() – Returns whether the process is alive.
terminate() – Terminates the process. On Unix this is done using the SIGTERM signal; on Windows, TerminateProcess() is used.
kill() – Same as terminate(), but uses the SIGKILL signal on Unix.
close() – Closes the Process object and releases all resources associated with it.
qsize() – Returns the approximate size of the queue.
empty() – Returns True if the queue is empty.
full() – Returns True if the queue is full.
get_nowait() – Equivalent to get(False).
get() – Removes and returns an element from the queue.
put() – Inserts an element into the queue.
cpu_count() – Returns the number of CPUs in the system.
current_process() – Returns the Process object corresponding to the current process.
parent_process() – Returns the Process object corresponding to the parent of the current process.
task_done() – Indicates that a formerly enqueued task is complete.
Itertools is one of the most useful modules in the Python 3 standard library. It offers a collection of handy functions and can fairly be called a gem of the Python programming language. Python provides excellent documentation of itertools, but in this tutorial we will discuss a few important and useful functions, or iterators, of itertools.
The key thing about itertools is that its functions help to write memory-efficient and concise code.
Before learning Python itertools, you should have knowledge of Python iterators and generators. In this article, we will describe itertools for beginners as well as for professionals.
Introduction
According to the official definition of itertools, “this module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML.” In simple words, these iterators can be combined to form an ‘iterator algebra’ that makes it possible to express complex tasks concisely. The functions in itertools are used to build more complex iterators. Let’s take an example: Python’s built-in zip() function accepts any number of iterables as arguments. It iterates over them in parallel and returns tuples of their corresponding elements.
a = [1,2,3]
b = ['a', 'b', 'c']
c = zip(a,b)
# zip() returns an iterator in Python 3, so convert it to a list to see the tuples
print(list(c))
Output:[(1, 'a'), (2, 'b'), (3, 'c')]
In the above code, we have passed two lists, [1,2,3] and ['a', 'b', 'c'], as iterables to the zip() function. These lists return one element at a time. In Python, an object that implements the .__iter__() or .__getitem__() method is called an iterable.
The Python iter() function is called on an iterable and returns an iterator object for it.
a = iter('Hello')
print(a)
Output:<str_iterator object at 0x01505FA0>
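To make the two methods mentioned above concrete, here is a minimal sketch. The class names Countdown and Squares are hypothetical and chosen only for illustration: the first implements .__iter__(), the second only .__getitem__(), and both can be consumed like any other iterable.

```python
class Countdown:
    """Iterable through __iter__(): each call returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        current = self.start
        while current > 0:
            yield current
            current -= 1


class Squares:
    """Iterable through __getitem__() only (the older sequence protocol)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised
        if index >= self.n:
            raise IndexError(index)
        return index * index


countdown_values = list(Countdown(3))
square_values = list(Squares(4))
print(countdown_values)  # [3, 2, 1]
print(square_values)     # [0, 1, 4, 9]
```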
The Python zip() function calls iter() on each of its arguments, then calls next() on each resulting iterator and combines the results into a tuple.
Note: The built-in zip() and map() functions return iterators just like the functions of itertools, so if you are using them you are already working in the itertools style. They do not need to be imported.
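The way zip() combines iter() and next() can be sketched with a small generator. The name simple_zip is hypothetical; it is a simplified stand-in for illustration, not the actual implementation of the built-in.

```python
def simple_zip(*iterables):
    """A simplified stand-in for zip(), written as a generator."""
    iterators = [iter(iterable) for iterable in iterables]
    if not iterators:
        return
    while True:
        items = []
        for iterator in iterators:
            try:
                items.append(next(iterator))
            except StopIteration:
                # the shortest input is exhausted, so stop like zip() does
                return
        yield tuple(items)


pairs = list(simple_zip([1, 2, 3], "ab"))
print(pairs)  # [(1, 'a'), (2, 'b')]
```

As with the built-in, the result is only as long as the shortest input.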
Types of Iterator
There are various types of iterator in itertools module. The list is given below:
Infinite iterators
Combinatoric iterators
Terminating iterators
Infinite Iterators
In Python, any object that can be used in a for loop is called an iterable. Lists, tuples, sets, dictionaries, and strings are examples of iterables, and each of them is finite. An iterator, however, can also be infinite; such an iterator is called an infinite iterator.
Iterator – Arguments – Results
count() – start, [step] – start, start+step, start+2*step, …
cycle() – p – p0, p1, …, plast, p0, p1, …
repeat() – elem [,n] – elem, elem, elem, … endlessly or up to n times
count(start, step): It returns evenly spaced values beginning with start and continuing indefinitely. The step argument is optional and defaults to 1; when it is provided, consecutive values differ by step. Consider the following example:
import itertools
for i in itertools.count(10,5):
    if i == 50:
        break
    else:
        print(i,end=" ")
Output:10 15 20 25 30 35 40 45
cycle(iterable): This iterator returns all the values of the passed argument in sequence and, once they are exhausted, starts again from the first, so the values repeat in a cyclic manner. Consider the following example:
import itertools
temp = 0
for i in itertools.cycle("123"):
    if temp > 7:
        break
    else:
        print(i,end=' ')
        temp = temp+1
Output:1 2 3 1 2 3 1 2
Example – 2: Using next() function
import itertools
val = ['Java', 'T', 'Point']
# named cycler so that the built-in iter() is not shadowed
cycler = itertools.cycle(val)
for i in range(6):
    # Using next function
    print(next(cycler), end = " ")
Output:Java T Point Java T Point
repeat(val,num): As the name suggests, it returns the passed value again and again, indefinitely. The num argument is optional; when it is given, the value is repeated only num times.
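A short sketch of repeat() in both forms follows. The sample values are arbitrary and chosen only for illustration.

```python
import itertools

# With a count, repeat() stops after that many values
three_times = list(itertools.repeat("Python", 3))
print(three_times)  # ['Python', 'Python', 'Python']

# Without a count, repeat() is endless, so pair it with something finite;
# here map() stops as soon as range(5) is exhausted
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```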
Combinatoric Iterators
Complex combinatorial constructs are simplified by recursive generators. Permutations, combinations, and Cartesian products are examples of combinatoric constructs.
In Python, there are four types of combinatoric iterators:
product() – It is used to calculate the cartesian product of input iterables. The optional repeat keyword argument computes the product of an iterable with itself; it gives the number of repetitions. The output tuples are produced in the order of the input elements, so they come out sorted if the inputs are sorted. Consider the following example:
from itertools import product
print("We are computing cartesian product using repeat Keyword Argument:")
print(list(product([1, 2], repeat=2)))
print()
print("We are computing cartesian product of the containers:")
print(list(product(['Java', 'T', 'point'], '5')))
print()
print("We are computing product of the containers:")
print(list(product('CD', [4, 5])))
Output:We are computing cartesian product using repeat Keyword Argument: [(1, 1), (1, 2), (2, 1), (2, 2)] We are computing cartesian product of the containers: [('Java', '5'), ('T', '5'), ('point', '5')] We are computing product of the containers: [('C', 4), ('C', 5), ('D', 4), ('D', 5)]
permutations(): It is used to generate all possible permutations of an iterable. Elements are treated as unique based on their position, not their value. It accepts two arguments, the iterable and group_size (named r in the itertools documentation). If group_size is None or not specified, it defaults to the length of the iterable.
from itertools import permutations
print("Computing all permutation of the following list")
print(list(permutations([3,"Python"],2)))
print()
print("Permutations of following string")
print(list(permutations('AB')))
print()
print("Permutation of the given container is:")
print(list(permutations(range(4),2)))
Output:Computing all permutation of the following list [(3, 'Python'), ('Python', 3)] Permutations of following string [('A', 'B'), ('B', 'A')] Permutation of the given container is: [(0, 1), (0, 2), (0, 3), (1, 0), (1, 2), (1, 3), (2, 0), (2, 1), (2, 3), (3, 0), (3, 1), (3, 2)]
combinations(): It is used to generate all possible combinations (without replacement) of the given group size from the container passed as an argument. The combinations are emitted in the order of the input elements, so they are sorted if the input is sorted.
from itertools import combinations
print("Combination of list in sorted order(without replacement)",list(combinations(['B',3],2)))
print()
print("Combination of string in sorted order",list(combinations("ZX",2)))
print()
print("Combination of list in sorted order",list(combinations(range(20),1)))
Output:Combination of list in sorted order(without replacement) [('B', 3)] Combination of string in sorted order [('Z', 'X')] Combination of list in sorted order [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,), (11,), (12,), (13,), (14,), (15,), (16,), (17,), (18,), (19,)]
combinations_with_replacement(): It accepts two arguments: the first is an iterable, and the second is the length r of each combination. It returns subsequences of length r from the elements of the iterable, and an individual element may be repeated within a combination.
from itertools import combinations_with_replacement
print("Combination of string in sorted order(with replacement) is:")
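The listing above appears to be cut off before the actual call. As a minimal sketch of what combinations_with_replacement() returns, the following uses the string "XY" and a length of 2; both are assumptions chosen for illustration, not values taken from the original example.

```python
from itertools import combinations_with_replacement

# every 2-element selection from "XY", where an element may be chosen twice
pairs = list(combinations_with_replacement("XY", 2))
print(pairs)  # [('X', 'X'), ('X', 'Y'), ('Y', 'Y')]
```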
Terminating iterators are generally used to work on the small input sequence and generate the output based on the functionality of the method used in iterator.
There are different types of terminating iterator:
accumulate(iter, func): It takes two arguments: the first is an iterable, and the second is a function that is applied cumulatively to the values of the iterable. If no function is passed to accumulate(), addition takes place by default. The output depends on the input iterable; if the input iterable is empty, the output will also be empty.
import itertools
import operator
# initializing list 1
list1 = [1, 4, 5, 7, 9, 11]
# using accumulate() to print the running sum of the elements
print("The sum is : ", end="")
print(list(itertools.accumulate(list1)))
# using accumulate() to print the running product of the elements
print("The product is : ", end="")
print(list(itertools.accumulate(list1, operator.mul)))
Output:The sum is : [1, 5, 10, 17, 26, 37] The product is : [1, 4, 20, 140, 1260, 13860]
chain(iter1, iter2) – It returns all the values of the iterables passed as arguments, one iterable after another, as a single chain. Consider the following example:
import itertools
# declaring list 1
list1 = [1, 2, 3, 4]
# declaring list 2
list2 = [1, 5, 6, 8]
# declaring list 3
list3 = [9, 10, 11, 12]
# using chain() to print all elements of the lists
print("The output is : ", end="")
print(list(itertools.chain(list1, list2, list3)))
Output:The output is : [1, 2, 3, 4, 1, 5, 6, 8, 9, 10, 11, 12]
dropwhile(func, seq) – It drops values as long as func returns true and starts returning values once func returns false for the first time. Consider the following example:
import itertools
# initializing list
list1 = [2, 4, 5, 7, 8]
# using dropwhile() to start displaying values once the condition becomes false
print("The output is : ", end="")
print(list(itertools.dropwhile(lambda x: x % 2 == 0, list1)))
Output:The output is : [5, 7, 8]
filterfalse(func,seq) – As its name suggests, this iterator returns only those values for which the passed function returns false. Consider the following example:
import itertools
# declaring list
list1 = [12, 14, 15, 27, 28]
# using filterfalse() iterator that will print false values
print("The Output is: ", end="")
print(list(itertools.filterfalse(lambda x: x % 2 == 0, list1)))
Output:The Output is: [15, 27]
islice(iterable,start,stop,step) – It slices the given iterable according to the given positions. It accepts four arguments: the iterable, the starting position, the ending position, and the step (optional).
import itertools
# Declaring list
list1 = [12, 34, 65, 73, 80, 19, 20]
# using islice() iterator that will slice the list acc. to given argument
# starts at index 2 and takes every 2nd element up to (not including) index 8
print("The sliced list values are : ", end="")
print(list(itertools.islice(list1, 2, 8, 2)))
Output:The sliced list values are : [65, 80, 20]
starmap(func, tuple list) – It takes two arguments: the first is a function, and the second is a list whose elements are tuples. Each tuple is unpacked into the arguments of the function. Consider the following example.
import itertools
# Declaring list that contain tuple as element
list1 = [(10, 20, 15), (18, 40, 19), (53, 42, 90), (16, 12, 27)]
# using starmap() iterator for selection value acc. to function
# selects max of all tuple values
print("The values acc. to function are : ", end="")
print(list(itertools.starmap(max, list1)))
Output:The values acc. to function are : [20, 40, 90, 27]
takewhile(func, iterable) – It is the opposite of dropwhile(). It returns values until func returns false for the first time. Consider the following example:
import itertools
# Defining a list
list1 = [20, 42, 64, 77, 8, 10, 20]
# takewhile() iterator is used to print values till the condition returns false.
print("Print until 1st false value returned : ", end="")
print(list(itertools.takewhile(lambda x: x % 2 == 0, list1)))
Output:Print until 1st false value returned : [20, 42, 64]
tee(iterator, count) – It splits a single iterator into the number of independent iterators given in the argument. Consider the following example:
import itertools
# Declaring list
li = [1, 2, 3, 4, 5, 6, 7]
# storing list in iterator
iti = iter(li)
# using tee() iterator to create a list of iterators
# Creating list of 3 iterators having similar values.
it = itertools.tee(iti, 3)
# It will print object of iterator
print(it)
print("The iterators are : ")
for i in range(0, 2):
    print(list(it[i]))
Output:(<itertools._tee object at 0x01B88D88>, <itertools._tee object at 0x01B88DA8>, <itertools._tee object at 0x01B88BA8>) The iterators are : [1, 2, 3, 4, 5, 6, 7] [1, 2, 3, 4, 5, 6, 7]
zip_longest(iterable1, iterable2, fillval) – It returns the values of the iterables combined position by position, like zip(). If one of the iterables is exhausted before the others, its missing values are filled with the value assigned to the fillvalue argument.
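A minimal sketch of zip_longest() follows. The two sample lists and the "-" fill value are arbitrary choices for illustration.

```python
from itertools import zip_longest

numbers = [1, 2, 3, 4]
letters = ["a", "b"]

# the shorter input is padded with the fill value
padded = list(zip_longest(numbers, letters, fillvalue="-"))
print(padded)  # [(1, 'a'), (2, 'b'), (3, '-'), (4, '-')]

# without fillvalue, missing positions are filled with None
default_padded = list(zip_longest(numbers, letters))
print(default_padded)  # [(1, 'a'), (2, 'b'), (3, None), (4, None)]
```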
JSON, which stands for JavaScript Object Notation, is a popular data format for online data exchange. JSON is well suited to exchanging data between a client and a server. Its syntax is derived from the JavaScript programming language. JSON’s primary goal is data transmission between the client and the web server. It is an efficient method of exchanging data and is simple to learn. It works with many programming languages, including Python, Perl, Java, etc.
JSON primarily supports the following six forms of data:
String
Number
Boolean
Null
Object
Array
Two structures form the foundation of JSON:
Data is kept in name/value pairs. It is handled like a record, object, dictionary, hash table, or keyed list.
An array, vector, list, or sequence is all considered equivalent to the ordered list of values.
The Python dictionary is comparable to the JSON data structure. Here is an illustration of JSON data:
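As a small illustration, the sketch below holds a JSON document in a Python string and parses it with json.loads(). The field names and values are made up for this example.

```python
import json

# A JSON document held in a Python string; the fields are made up for illustration
json_text = """
{
    "name": "Peter",
    "age": 20,
    "is_student": true,
    "subjects": ["Electronics", "Physics"],
    "address": {"city": "Tokyo", "zip": null}
}
"""

person = json.loads(json_text)
print(type(person))              # <class 'dict'>
print(person["subjects"][0])     # Electronics
print(person["address"]["zip"])  # None
```

The JSON object becomes a Python dict, the array a list, true becomes True, and null becomes None.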
Python offers a module named json. The standard library also includes the marshal and pickle modules, and the json API works similarly to them. Python natively supports JSON features.
The process of serializing JSON data is known as encoding. Using serialization, data is transformed into a series of bytes that can be stored or delivered across the network.
The following techniques will be covered in this section:
load()
loads()
dump()
dumps()
Serializing JSON
The process used to translate Python objects to JSON is known as serialization. When a computer needs to process a lot of data, it is a good idea to store that data in a file. Using the functions of the json module, we can store JSON data in a file. The dump() and dumps() methods are available in the json module and are used to convert Python objects to JSON.
The following JSON items are created from Python objects. Following is a list of each:
Sr. Python Objects – JSON
1. dict – Object
2. list, tuple – Array
3. str – String
4. int, float – Number
5. True – true
6. False – false
7. None – null
Writing JSON data into a file: the dump() function
Python provides the dump() function to write (encode) data in JSON format. It takes two positional arguments: the data object to be serialized and the file-like object to which the serialized text is written.
Let’s look at the straightforward serialization example:
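The original listing is not reproduced here, so the following is a minimal sketch consistent with the description. The dictionary contents are placeholders, and a temporary directory is used (an assumption of this sketch) so that data.json does not end up in the current folder.

```python
import json
import os
import tempfile

# Sample data; the keys and values are placeholders
student = {"name": "Peter", "age": 20, "subjects": ["Electronics", "Physics"]}

# A temporary directory keeps the sketch from touching the current folder
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, "data.json")

# Mode "w" creates the file if it does not exist yet
with open(path, "w") as write_file:
    json.dump(student, write_file)

# Read the file back as plain text to see what was written
with open(path) as check_file:
    written_text = check_file.read()
print(written_text)
```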
A file called data.json has been opened in writing mode in the program above. We opened this file in write mode so that it would be created if it didn’t already exist. The dictionary is serialized to JSON and written to the file by the json.dump() method.
The dumps() function
The dumps() function returns the serialized data as a Python string. It needs only one argument, the Python data to be serialized. Because we don’t write the data to disk, the file-like parameter is not used. Let’s think about the following illustration:
JSON allows hierarchical lists, tuples, objects, and basic data types like strings and numbers.
import json
#Python list conversion to JSON Array
print(json.dumps(['Welcome', "to", "javaTpoint"]))
#Python tuple conversion to JSON Array
print(json.dumps(("Welcome", "to", "javaTpoint")))
# Python string conversion to JSON String
print(json.dumps("Hello"))
# Python int conversion to JSON Number
print(json.dumps(1234))
# Python float conversion to JSON Number
print(json.dumps(23.572))
# Boolean conversion to their respective values
print(json.dumps(True))
print(json.dumps(False))
# None value to null
print(json.dumps(None))
The process of converting JSON data into Python objects is known as deserialization. The load() and loads() methods of the json module are used to transform JSON data into Python objects. Following is a list of each:
Sr. JSON – Python
1. Object – dict
2. Array – list
3. String – str
4. number (int) – int
5. true – True
6. false – False
7. null – None
Although technically not a precise conversion of the JSON data, the above table depicts the opposite of the serialized table. This indicates that the object may not be the same if we encode it and then decode it again later.
Let’s use a real-world illustration. If someone translates something into Chinese and then back into English, the result may not match the original exactly. Consider the following straightforward illustration.
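The original listing is missing at this point. A minimal sketch of such a round trip could look like this; the sample data and the temporary directory are assumptions of the sketch, while the name read_file follows the description in the text.

```python
import json
import os
import tempfile

# Placeholder data that contains a tuple
original = {"name": "Peter", "marks": (72, 85, 90)}

path = os.path.join(tempfile.mkdtemp(), "data.json")

# Encode the Python object into the file
with open(path, "w") as write_file:
    json.dump(original, write_file)

# Decode the file back into a Python object
with open(path) as read_file:
    restored = json.load(read_file)

print(restored)              # {'name': 'Peter', 'marks': [72, 85, 90]}
print(restored == original)  # False
```

The tuple comes back as a list, because JSON has only one array type, so the decoded object is not identical to the one that was encoded.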
Using the dump() function, we have encoded a Python object in the file in the program above. Then, we read the JSON file using the load() function and the argument read_file.
The loads() function, another feature of the json module, is used to translate JSON input into Python objects. It resembles the load() function quite a bit. Think about the following instance:
import json
a = ["Mathew","Peter",(10,32.9,80),{"Name" : "Tokyo"}]
# Python object into JSON
b = json.dumps(a)
# JSON into Python Object
c = json.loads(b)
print(c)
JSON files are loaded using the json.load() function, while strings are loaded using the json.loads() function.
json.dump() vs json.dumps()
When we want to serialize Python objects into JSON files, we use the json.dump() function. We use the json.dumps() function to serialize a Python object into a JSON string for processing and printing.
Python Pretty Print JSON
There are instances when a lot of JSON data needs to be analyzed and debugged. It can be done by giving extra arguments to the json.dumps() and json.dump() functions, such as indent and sort_keys.
Note: Both dump() and dumps() functions accept indent and sort_keys arguments.
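The original listing is missing here. A minimal sketch that uses both arguments might look like this; the dictionary is placeholder data, and indent is set to five spaces to match the explanation.

```python
import json

person = {"name": "Peter", "age": 20, "city": "Tokyo"}  # placeholder data

# indent=5 puts each key on its own line, indented by five spaces;
# sort_keys=True orders the keys alphabetically
pretty = json.dumps(person, indent=5, sort_keys=True)
print(pretty)
```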
The keys are sorted in ascending order, and the indent argument has been given five spaces in the code above. sort_keys has a default value of False, and indent has a default value of None.
Coding and Decoding
The process of converting text or values into another representation is known as encoding, and decoding converts the encoded data back into its original form. Serialization is another name for encoding, and deserialization is another name for decoding. For the JSON (object) format, both encoding and decoding are performed. A well-liked third-party module for such tasks, demjson, is available for Python. The command listed below can be used to install it on Windows:
pip install demjson
Encoding – The encode() function, which is part of the demjson package, is used to turn a Python object into a JSON string representation.
What follows is the syntax:
demjson.encode(self,obj,nest_level = 0)
Example:1 – Encoding using demjson package
import demjson
a = [{"Name": 'Peter',"Age":20, "Subject":"Electronics"}]
print(demjson.encode(a))
Web scraping is a technique for extracting a large amount of data from websites. The term “scraping” refers to obtaining information from another source (web pages) and saving it into a local file. For example, suppose you are working on a project called “Phone comparing website,” where you need the prices, ratings, and model names of mobile phones to compare them. Collecting these details by checking various sites by hand would take a long time. In that case, web scraping plays an important role: by writing a few lines of code, you can get the desired results.
Web scraping extracts data from websites in an unstructured format. It helps to collect this unstructured data and convert it into a structured form.
Startups prefer web scraping because it is a cheap and effective way to get a large amount of data without partnering with a data-selling company.
Is Web Scraping legal?
Here the question arises whether web scraping is legal or not. The answer is that some sites allow it when it is used legally. Web scraping is just a tool; you can use it in the right way or the wrong way.
Web scraping is illegal if someone tries to scrape nonpublic data. Nonpublic data is not accessible to everyone; extracting such data is a violation of legal terms.
There are several tools available to scrap data from websites, such as:
Scraping-Bot
Scraper API
Octoparse
Import.io
Webhose.io
Dexi.io
Outwit
Diffbot
Content Grabber
Mozenda
Web Scraper Chrome Extension
Why Web Scraping?
As discussed above, web scraping is used to extract data from websites. But we should also know how to use that raw data, which is valuable in various fields. Let’s have a look at the uses of web scraping:
Dynamic Price Monitoring
It is widely used to collect data from several online shopping sites, compare the prices of products, and make profitable pricing decisions. Price monitoring using web-scraped data lets companies know the market conditions and facilitates dynamic pricing. It helps companies stay competitive.
Market Research
Web scraping is well suited to market trend analysis, that is, gaining insights into a particular market. Large organizations require a great deal of data, and web scraping provides that data at scale.
Email Gathering
Many companies use personal e-mail data for email marketing. They can target a specific audience with their marketing.
News and Content Monitoring
A single news cycle can create an outstanding effect or a genuine threat to your business. If your company depends on news analysis or frequently appears in the news, web scraping provides a way to monitor and parse the most critical stories. News articles and social media platforms can directly influence the stock market.
Social Media Scraping
Web scraping plays an essential role in extracting data from social media websites such as Twitter, Facebook, and Instagram, to find the trending topics.
Research and Development
Large sets of data such as general information, statistics, and temperature readings are scraped from websites, then analyzed and used to carry out surveys or research and development.
Why use Python for Web Scraping?
There are other popular programming languages, so why do we choose Python over them for web scraping? Below is a list of Python’s features that make it one of the most useful programming languages for web scraping.
Dynamically Typed
In Python, we don’t need to define data types for variables; we can directly use a variable wherever it is required. It saves time and makes a task faster. Python uses classes to identify the data type of a variable.
Vast collection of libraries
Python comes with an extensive range of libraries such as NumPy, Matplotlib, Pandas, and SciPy that provide flexibility for various purposes. It is suited to almost every emerging field, including web scraping for extracting and manipulating data.
Less Code
The purpose of web scraping is to save time. But what if you spend more time writing the code? That’s why we use Python, as it can perform a task in a few lines of code.
Open-Source Community
Python is open-source, which means it is freely available for everyone. It has one of the biggest communities across the world where you can seek help if you get stuck anywhere in Python code.
The basics of web scraping
Web scraping consists of two parts: a web crawler and a web scraper. In simple words, the web crawler is the horse, and the scraper is the chariot. The crawler leads the scraper, which extracts the requested data. Let’s understand these two components of web scraping:
The crawler
A web crawler is generally called a “spider.” It is an automated program that browses the internet to index and search for content by following the given links. It searches for the relevant information requested by the programmer.
The scraper
A web scraper is a dedicated tool designed to extract data from several websites quickly and effectively. Web scrapers vary widely in design and complexity, depending on the project.
How does Web Scraping work?
The following steps are used to perform web scraping.
Step -1: Find the URL that you want to scrape
First, you should understand the data requirements of your project. A webpage or website contains a large amount of information, so scrape only the relevant information. In simple words, the developer should be familiar with the data requirement.
Step – 2: Inspecting the Page
The data is extracted in raw HTML format, which must be carefully parsed to reduce the noise in the raw data. In some cases, the data can be as simple as a name and address, or as complex as high-dimensional weather and stock market data.
Step – 3: Write the code
Write code to extract the relevant information, and run the code.
Step – 4: Store the data in the file
Store that information in the required file format, such as CSV, XML, or JSON.
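The steps above can be sketched with the standard library alone. This is a simplified illustration, not the approach used later in this tutorial: the HTML is an inline string standing in for a fetched page, the class name PhoneParser and the markup are made up, and the CSV is written to memory instead of a file.

```python
import csv
import io
from html.parser import HTMLParser

# Steps 1-2: the "page" is an inline string here, standing in for fetched HTML
PAGE = """
<ul>
  <li class="phone"><span class="model">Alpha</span><span class="price">199</span></li>
  <li class="phone"><span class="model">Beta</span><span class="price">349</span></li>
</ul>
"""


class PhoneParser(HTMLParser):
    """Step 3: collect the text of <span class="model"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("model", "price"):
            self._field = css_class
            if css_class == "model":
                # a model name starts a new row
                self.rows.append({})

    def handle_data(self, data):
        if self._field is not None:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = PhoneParser()
parser.feed(PAGE)

# Step 4: store the structured rows as CSV (in memory here instead of a file)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["model", "price"])
writer.writeheader()
writer.writerows(parser.rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real project the inline string would be replaced by the downloaded page, and the buffer by an open file.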
Getting Started with Web Scraping
Python has a vast collection of libraries, including very useful libraries for web scraping. Let’s understand the libraries required.
Library used for web scraping
Selenium- Selenium is an open-source automated testing library. It is used to automate browser activities. To install this library, type the following command in your terminal.
pip install selenium
Note – It is good to use the PyCharm IDE.
Pandas
Pandas library is used for data manipulation and analysis. It is used to extract the data and store it in the desired format.
BeautifulSoup
BeautifulSoup is a Python library that is used to pull data out of HTML and XML files. It is mainly designed for web scraping. It works with a parser to provide a natural way of navigating, searching, and modifying the parse tree. At the time of writing, the latest version of BeautifulSoup was 4.8.1.
Let’s understand the BeautifulSoup library in detail.
Installation of BeautifulSoup
You can install BeautifulSoup by typing the following command:
pip install bs4
Installing a parser
BeautifulSoup supports the HTML parser included in Python’s standard library as well as several third-party Python parsers. You can install any of them according to your needs. The list of BeautifulSoup’s parsers is the following:
Parser – Typical usage
Python’s html.parser – BeautifulSoup(markup, "html.parser")
lxml’s HTML parser – BeautifulSoup(markup, "lxml")
lxml’s XML parser – BeautifulSoup(markup, "lxml-xml")
html5lib – BeautifulSoup(markup, "html5lib")
We recommend installing the html5lib parser because it is well suited to newer versions of Python; alternatively, you can install the lxml parser.
Type the following command in your terminal:
pip install html5lib
BeautifulSoup transforms a complex HTML document into a complex tree of Python objects. But there are a few essential types of objects that are used most often:
Tag
A Tag object corresponds to an XML or HTML tag in the original document.
import bs4
soup = bs4.BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)
Output:<class 'bs4.element.Tag'>
A tag contains a lot of attributes and methods, but the most important features of a tag are its name and attributes.
Name
Every tag has a name, accessible as .name:
tag.name
Attributes
A tag may have any number of attributes. The tag <b id="boldest"> has an attribute "id" whose value is "boldest". We can access a tag’s attributes by treating the tag as a dictionary.
tag = bs4.BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
tag['id']
# add or modify attributes
tag['id'] = 'verybold'
tag['another-attribute'] = 1
tag
# delete the attribute
del tag['id']
Multi-valued Attributes
In HTML5, some attributes can have multiple values. The class attribute (a tag can have more than one CSS class) is the most common multi-valued attribute. Others are rel, rev, accept-charset, headers, and accesskey.
A string inside a tag is immutable, which means it cannot be edited in place. But it can be replaced with another string using replace_with().
tag.string.replace_with("No longer bold")
tag
If you want to use a NavigableString outside BeautifulSoup, str() turns it into a normal Python Unicode string.
BeautifulSoup object
The BeautifulSoup object represents the complete parsed document as a whole. In many cases, we can use it as a Tag object. It means it supports most of the methods described in navigating the tree and searching the tree.
from bs4 import BeautifulSoup
doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document>", "xml")
footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")
doc.find(text="INSERT FOOTER HERE").replace_with(footer)
print(doc)
Output:<?xml version="1.0" encoding="utf-8"?> <document><content/><footer>Here's the footer</footer></document>
Web Scraping Example:
Let’s take an example to understand scraping in practice by extracting data from a webpage and inspecting the whole page.
First, open your favorite page on Wikipedia and inspect the whole page. Before extracting data from the webpage, you should be clear about your requirements. Consider the following code:
#importing the BeautifulSoup Library
import bs4
import requests
#Creating the requests
res = requests.get("https://en.wikipedia.org/wiki/Machine_learning")
print("The object type:",type(res))
# Convert the request object to the Beautiful Soup Object
soup = bs4.BeautifulSoup(res.text,'html5lib')
print("The object type:",type(soup))
Output:The object type: <class 'requests.models.Response'> The object type: <class 'bs4.BeautifulSoup'>
In the following lines of code, we are extracting all headings of a webpage by class name. Here front-end knowledge plays an essential role in inspecting the webpage.
soup.select('.mw-headline')
for i in soup.select('.mw-headline'):
    print(i.text,end = ',')
Output:Overview,Machine learning tasks,History and relationships to other fields,Relation to data mining,Relation to optimization,Relation to statistics, Theory,Approaches,Types of learning algorithms,Supervised learning,Unsupervised learning,Reinforcement learning,Self-learning,Feature learning,Sparse dictionary learning,Anomaly detection,Association rules,Models,Artificial neural networks,Decision trees,Support vector machines,Regression analysis,Bayesian networks,Genetic algorithms,Training models,Federated learning,Applications,Limitations,Bias,Model assessments,Ethics,Software,Free and open-source software,Proprietary software with free and open-source editions,Proprietary software,Journals,Conferences,See also,References,Further reading,External links,
In the above code, we imported the bs4 and requests libraries. In the third line, we created a res object by sending a request to the webpage. As you can observe, we have extracted all the headings from the webpage.
Webpage of Wikipedia Learning
Let’s understand another example; we will make a GET request to the URL and create a parse tree object (soup) using BeautifulSoup and the third-party “html5lib” parser.
Here we will scrape the webpage at the given link (https://www.javatpoint.com/). Consider the following code:
# importing the libraries
from bs4 import BeautifulSoup
import requests
url="https://www.javatpoint.com/"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content, "html5lib")
print(soup.prettify()) # print the parsed data of html
The above code will display all the HTML code of the javatpoint homepage.
Using the BeautifulSoup object, i.e. soup, we can collect the required data. Let’s print some interesting information using the soup object:
Let’s print the title of the web page.
print(soup.title)
Output: It will give the following output:<title>Tutorials List – Javatpoint</title>
In the above output, the HTML tag is included with the title. If you want the text without the tag, you can use the following code:
print(soup.title.text)
Output: It will give the following output:Tutorials List – Javatpoint
We can get every link on the page along with its attributes, such as href and title, and its inner text. Consider the following code:
for link in soup.find_all("a"):
    print("Inner Text is: {}".format(link.text))
    print("Title is: {}".format(link.get("title")))
    print("href is: {}".format(link.get("href")))
Output: It will print all links along with its attributes. Here we display a few of them:href is: https://www.facebook.com/javatpoint Inner Text is: The title is: None href is: https://twitter.com/pagejavatpoint Inner Text is: The title is: None href is: https://www.youtube.com/channel/UCUnYvQVCrJoFWZhKK3O2xLg Inner Text is: The title is: None href is: https://javatpoint.blogspot.com Inner Text is: Learn Java Title is: None href is: https://www.javatpoint.com/java-tutorial Inner Text is: Learn Data Structures Title is: None href is: https://www.javatpoint.com/data-structure-tutorial Inner Text is: Learn C Programming Title is: None href is: https://www.javatpoint.com/c-programming-language-tutorial Inner Text is: Learn C++ Tutorial
Demo: Scraping Data from Flipkart Website
In this example, we will scrape mobile phone prices, ratings, and model names from Flipkart, which is one of the popular e-commerce websites. The following are the prerequisites to accomplish this task:
Prerequisites:
Python 2.x or Python 3.x with Selenium, BeautifulSoup, Pandas libraries installed.
Google Chrome browser
Scraping parser such as html.parser, lxml, etc.
Step – 1: Find the desired URL to scrape
The initial step is to find the URL that you want to scrape. Here we are extracting mobile phone details from Flipkart. The URL of this page is https://www.flipkart.com/search?q=iphones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off.
Step – 2: Inspecting the page
It is necessary to inspect the page carefully because the data is usually contained within the tags. So we need to inspect to select the desired tag. To inspect the page, right-click on the element and click “inspect”.
Step – 3: Find the data for extracting
Extract the price, name, and rating, each of which is contained in a “div” tag.
Step – 4: Write the Code
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
# Request from the webpage
myurl = "https://www.flipkart.com/search?q=iphones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, features="html.parser")
# print(soup.prettify(containers[0]))
# This variable held all html of webpage
containers = page_soup.find_all("div",{"class": "_3O0U0u"})
# container = containers[0]
# # print(soup.prettify(container))
#
# price = container.find_all("div",{"class": "col col-5-12 _2o7WAb"})
# print(price[0].text)
#
# ratings = container.find_all("div",{"class": "niH0FQ"})
# print(ratings[0].text)
#
# #
# # print(len(containers))
# print(container.div.img["alt"])
# Creating CSV File that will store all data
filename = "product1.csv"
f = open(filename,"w")
headers = "Product_Name,Pricing,Ratings\n"
f.write(headers)
for container in containers:
    product_name = container.div.img["alt"]
    price_container = container.find_all("div", {"class": "col col-5-12 _2o7WAb"})
    price = price_container[0].text.strip()
    rating_container = container.find_all("div", {"class": "niH0FQ"})
    ratings = rating_container[0].text
    # print("product_name:"+product_name)
    # print("price:"+price)
    # print("ratings:"+ str(ratings))
    edit_price = ''.join(price.split(','))
    sym_rupee = edit_price.split("₹")  # split on the rupee symbol
    add_rs_price = "Rs"+sym_rupee[1]
    split_price = add_rs_price.split("E")
    final_price = split_price[0]
    split_rating = str(ratings).split(" ")
    final_rating = split_rating[0]
    print(product_name.replace(",", "|")+","+final_price+","+final_rating+"\n")
    f.write(product_name.replace(",", "|")+","+final_price+","+final_rating+"\n")
f.close()
Output:
We scraped the details of the iPhones and saved them in the CSV file, as you can see in the output. In the above code, a few lines are commented out; they were used for testing. You can uncomment those lines and observe the output.
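The code above builds each CSV row by hand and replaces commas in the product name with "|" so that they do not break the columns. As an alternative, Python's standard csv module quotes such fields automatically. The following is a minimal sketch; the rows are hypothetical sample values, and an in-memory buffer stands in for the real file.

```python
import csv
import io

# Hypothetical rows standing in for the scraped values
rows = [
    ("Apple iPhone (Black, 64 GB)", "Rs49999", "4.6"),
    ("Apple iPhone (White, 128 GB)", "Rs54999", "4.5"),
]

# An in-memory buffer stands in for open("product1.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Product_Name", "Pricing", "Ratings"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the product names with their commas intact
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[1][0])
```

With this approach the product names keep their original commas, because the writer wraps any field containing a comma in quotes.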
In this tutorial, we have discussed the basic concepts of web scraping and walked through a sample scrape of the leading online e-commerce site Flipkart.
Python generators are functions that return an iterator object, which can be traversed one item at a time. A generator can also be written as an expression, with syntax similar to that of Python’s list comprehension.
Creating an iterator in Python by hand involves a lot of work; we need to implement the __iter__() and __next__() methods and keep track of the internal state.
That is a lengthy process, which is why generators play an essential role in simplifying it. When there are no more values in the iteration, a StopIteration exception is raised.
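To make the comparison concrete, here is a small sketch that produces the same values twice: once with a hand-written iterator class and once with a generator function. The names EvenNumbers and even_numbers are illustrative only.

```python
class EvenNumbers:
    """Hand-written iterator over the even numbers below `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current >= self.limit:
            # Signal that there are no more values
            raise StopIteration
        value = self.current
        self.current += 2
        return value


def even_numbers(limit):
    """Generator equivalent: Python keeps track of the state for us."""
    for n in range(0, limit, 2):
        yield n


print(list(EvenNumbers(10)))
print(list(even_numbers(10)))
```

Both calls print the same list; the generator version needs no explicit state handling and no explicit StopIteration.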
How to Create Generator function in Python?
In Python, creating a generator is not difficult at all. It is similar to a normal function defined with the def keyword, but it uses the yield keyword instead of return. In other words, if the body of any function contains a yield statement, it automatically becomes a generator function. Consider the following example:
def simple():
    for i in range(10):
        if i % 2 == 0:
            yield i
# Successive function calls using a for loop
for i in simple():
    print(i)
Output:0 2 4 6 8
yield vs. return
The yield statement is responsible for controlling the flow of the generator function. It pauses the function’s execution by saving all of its state and yielding a value to the caller. Execution resumes later, when the next value is requested. We can use multiple yield statements in a generator function.
The return statement returns a value and terminates the whole function; once a return statement has been executed, nothing after it runs.
Using multiple yield Statement
We can use multiple yield statements in a generator function. Consider the following example.
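A minimal sketch of a generator function with several yield statements; the function name and the strings are illustrative.

```python
def multiple_yield():
    first = "First String"
    yield first      # execution pauses here until the next value is requested

    second = "Second String"
    yield second     # ... and resumes from the line after the previous yield

    third = "Third String"
    yield third


for text in multiple_yield():
    print(text)
```

Each pass through the for loop resumes the function right after the previous yield, so the three strings are printed in order.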
Difference between Generator function and Normal function
A normal function typically hands back its result with a single return statement, whereas a generator function can contain one or more yield statements.
When a generator function is called, its body does not start running immediately; a generator object is returned and control goes back to the caller.
The states of the local variables are retained between calls.
StopIteration exception is raised automatically when the function terminates.
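The last two points can be observed directly by calling next() by hand. This is a small sketch with a hypothetical countdown generator.

```python
def countdown(start):
    # `start` is a local variable; its value survives between next() calls
    while start > 0:
        yield start
        start -= 1


gen = countdown(2)
print(next(gen))
print(next(gen))

try:
    next(gen)  # the function body has finished, so there is nothing left to yield
except StopIteration:
    print("Generator is exhausted")
```

A for loop catches StopIteration for us, which is why we rarely see the exception in everyday code.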
Generator Expression
We can easily create a generator expression without defining a function. It is similar to a lambda function, which creates an anonymous function; a generator expression creates an anonymous generator function.
The syntax of a generator expression resembles that of a Python list comprehension. The only difference is that round parentheses take the place of square brackets. The generator expression calculates only one item at a time, whereas the list comprehension calculates the entire list.
Consider the following example:
list = [1,2,3,4,5,6,7]
# List Comprehension
z = [x**3 for x in list]
# Generator expression
a = (x**3 for x in list)
print(a)
print(z)
In the above program, the list comprehension has returned the list of cubes of the elements, whereas the generator expression has returned a generator object that computes the values on demand. Instead of applying a for loop, we can also call next() on the generator object. Let’s consider another example:
list = [1,2,3,4,5,6]
z = (x**3 for x in list)
print(next(z))
print(next(z))
print(next(z))
print(next(z))
Output:1 8 27 64
Note:- When we call next(), Python calls __next__() on the object that we passed to it as an argument.
In the above program, we have used the next() function, which returned the next item from the generator each time.
Example: Write a program to print the table of the given number using the generator.
def table(n):
    for i in range(1, 11):
        yield n * i  # the for loop advances i, so no manual increment is needed
for i in table(15):
    print(i)
Output:15 30 45 60 75 90 105 120 135 150
In the above example, the generator function is iterated over using a for loop.
Advantages of Generators
There are various advantages of generators. A few of them are given below:
1. Easy to implement
Generators are easy to implement compared to iterators. For an iterator, we have to implement the __iter__() and __next__() methods.
2. Memory efficient
Generators use memory efficiently for many sequences. A normal function that returns a sequence first creates the entire sequence in memory before returning the result. A generator function, on the other hand, calculates one value, suspends its execution, and resumes on the next call. An infinite sequence generator is a great example of this memory optimization. Let’s examine memory use with the sys.getsizeof() function in the example below.
import sys
# List comprehension
nums_squared_list = [i ** 2 for i in range(1000)]
print("Memory in Bytes:", sys.getsizeof(nums_squared_list))
# Generator expression
nums_squared_gc = (i ** 2 for i in range(1000))
print("Memory in Bytes:", sys.getsizeof(nums_squared_gc))
Output:Memory in Bytes: 4508 Memory in Bytes: 56
We can observe from the above output that the list comprehension uses 4508 bytes of memory, whereas the generator expression uses 56 bytes. It means that generator objects are much more memory-efficient than list comprehensions. The exact numbers depend on the Python version and platform.
3. Pipelining with Generators
A data pipeline makes it possible to process large datasets or streams of data without using extra memory.
Let’s say we have a famous restaurant’s log file. The log file has a column (the fourth column) that tracks the number of burgers sold every hour, and we want to sum it to find the total number of burgers sold in 4 years. In that scenario, generators can build a pipeline from a series of operations. The code for it is as follows:
with open('sells.log') as file:
    # assumes whitespace-separated columns; the fourth column holds the burger count
    burger_col = (line.split()[3] for line in file)
    per_hour = (int(x) for x in burger_col if x != 'N/A')
    print("Total burgers sold = ", sum(per_hour))
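The same pipeline can be tried without a log file. The sketch below uses a few hypothetical log lines held in a list and assumes the columns are separated by whitespace, which the original log format may or may not do.

```python
# Hypothetical log lines: date, hour, item, quantity sold
sample_log = [
    "2023-01-01 10 burger 12",
    "2023-01-01 11 burger N/A",
    "2023-01-01 12 burger 7",
]

# Each stage is a generator expression; nothing is computed until sum() pulls values
columns = (line.split() for line in sample_log)
burger_col = (cols[3] for cols in columns)
per_hour = (int(x) for x in burger_col if x != "N/A")

total = sum(per_hour)
print("Total burgers sold =", total)
```

Because every stage is lazy, only one line is held in memory at a time, no matter how long the input is.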
4. Generate Infinite Sequence
The generator can produce infinite items. An infinite sequence cannot be held in memory, but since generators produce only one item at a time, they can represent one. Consider the following example:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
for i in infinite_sequence():
    print(i)
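The loop above never ends on its own. When only the first few values are needed, itertools.islice from the standard library can take a finite slice of the infinite generator, as in this short sketch.

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


# Take only the first five values; the generator is never run past that point
first_five = list(islice(infinite_sequence(), 5))
print(first_five)
```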
Decorators are one of the most helpful and powerful tools in Python. They are used to modify the behavior of a function. Decorators provide the flexibility to wrap another function in order to extend the behavior of the wrapped function without permanently modifying it.
With decorators, a function is passed as an argument into another function and then called inside a wrapper function.
This is also called metaprogramming, where one part of the program attempts to change another part of the program at the time it is defined.
Before understanding the Decorator, we need to know some important concepts of Python.
What are the functions in Python?
An interesting feature of Python is that everything is treated as an object; classes and any variables we define are objects as well. Functions are first-class objects in Python because they can be referenced, assigned to a variable, passed to other functions, and returned from other functions. An example is given below:
Example:
def func1(msg):  # here, we are creating a function that takes one parameter
    print(msg)
func1("Hii, welcome to function ")  # here, we are calling func1
func2 = func1  # here, func2 becomes another name for the same function object
func2("Hii, welcome to function ")  # here, we are calling the function through func2
Output:Hii, welcome to function Hii, welcome to function
In the above program, running the code gives the same output for both calls. func2 refers to the function func1 and acts as that function. We need to understand the following concepts about functions:
A function can be referenced, assigned to a variable, and returned from other functions.
A function can be declared inside another function and passed as an argument to another function.
Inner Function
Python provides the facility to define the function inside another function. These types of functions are called inner functions. Consider the following example:
Example:
def func():  # here, we are creating the outer function
    print("We are in first function")
    def func1():  # here, we are defining the first inner function
        print("This is first child function")
    def func2():  # here, we are defining the second inner function
        print("This is second child function")
    func1()
    func2()
func()
Output:We are in first function This is first child function This is second child function
In the above program, the order in which the child functions are declared doesn’t matter. The output depends on the order in which the child functions are called. These child functions are local to func(), so they cannot be called from outside it.
A function that accepts another function as an argument is called a higher-order function. Consider the following example:
Example:
def add(x):  # here, we are creating a function add that takes one parameter
    return x+1  # here, we are returning the passed value plus 1
def sub(x):  # here, we are creating a function sub that takes one parameter
    return x-1  # here, we are returning the passed value minus 1
def operator(func, x):  # here, we are creating a function that takes a function and a value
    temp = func(x)
    return temp
print(operator(sub,10))  # here, we are printing the result of applying sub to 10
print(operator(add,20))  # here, we are printing the result of applying add to 20
Output:9 21
In the above program, we have passed the sub() and add() functions as arguments to the operator() function.
A function can return another function. Consider the below example:
Example:
def hello():  # here, we are creating a function named hello
    def hi():  # here, we are creating an inner function named hi
        print("Hello")
    return hi  # here, we are returning the inner function itself, without calling it
new = hello()
new()
Output:Hello
In the above program, the hi() function is nested inside the hello() function. Each time we call hello(), it returns the hi function, which we can then call through the variable new.
Decorating functions with parameters
Let’s have an example to understand the parameterized decorator function:
Example:
def divide(x, y):  # here, we are creating a function that takes two parameters
    print(x/y)
def outer_div(func):  # here, we are creating a decorator that takes a function
    def inner(x, y):  # here, the wrapper takes the same parameters as func
        if x < y:
            x, y = y, x  # swap so that the larger number is divided by the smaller
        return func(x, y)
    return inner
divide1 = outer_div(divide)
divide1(2,4)
Output:2.0
Syntactic Decorator
In the above program, we decorated divide() manually by passing it to outer_div(), which is a little bulky. Instead of using the above method, Python allows us to apply a decorator in an easier way with the @ symbol. This is sometimes called the “pie” syntax.
def outer_div(func):  # here, we are creating a decorator that takes a function
    def inner(x, y):  # here, the wrapper takes the same parameters as func
        if x < y:
            x, y = y, x
        return func(x, y)  # here, we are calling the function with the parameters
    return inner
# Here, the line below is the decorator syntax
@outer_div
def divide(x, y):  # here, we are creating a function that takes two parameters
    print(x/y)
divide(2,4)
Output:2.0
Reusing Decorator
We can also reuse a decorator by importing it wherever it is needed. Let’s move the decorator into its own module so that it can be used with many other functions. Create a file called decorator.py with the following code:
def do_twice(func):  # here, we are creating a decorator that takes a function
    def wrapper_do_twice():  # here, the wrapper calls the function twice
        func()
        func()
    return wrapper_do_twice
We can import decorator.py in another file.
from decorator import do_twice
@do_twice
def say_hello():
    print("Hello There")
say_hello()
Output:Hello There Hello There
Python Decorator with Argument
We want to pass some arguments to the function. Let’s do it in the following code:
from decorator import do_twice
@do_twice
def display(name):
    print(f"Hello {name}")
display("John")
As we can see, the decorated function doesn’t accept the argument, so running this code raises an error. We can fix this error by using *args and **kwargs in the inner wrapper function. Modify decorator.py as follows:
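A minimal sketch of the modified decorator, based on the do_twice decorator shown earlier; only the wrapper's signature and the two calls change.

```python
def do_twice(func):
    def wrapper_do_twice(*args, **kwargs):
        # Forward whatever positional and keyword arguments the caller supplied
        func(*args, **kwargs)
        func(*args, **kwargs)
    return wrapper_do_twice
```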
Now wrapper_do_twice() can accept any number of arguments and pass them on to the function.
from decorator import do_twice
@do_twice
def display(name):
    print(f"Hello {name}")
display("John")
Output:Hello John Hello John
Returning Values from Decorated Functions
We can control the return value of the decorated function. The example is given below:
from decorator import do_twice
@do_twice
def return_greeting(name):
    print("Creating greeting")
    return f"Hi {name}"
hi_adam = return_greeting("Adam")
Output:Creating greeting Creating greeting
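If the wrapper does not return the result of func, the decorated function returns None and the greeting is lost. One way to handle this, sketched below, is to return the value of the last call from the wrapper; functools.wraps is added here so that the decorated function keeps its original name.

```python
import functools


def do_twice(func):
    @functools.wraps(func)
    def wrapper_do_twice(*args, **kwargs):
        func(*args, **kwargs)
        # Return the result of the second call so the caller receives it
        return func(*args, **kwargs)
    return wrapper_do_twice


@do_twice
def return_greeting(name):
    print("Creating greeting")
    return f"Hi {name}"


hi_adam = return_greeting("Adam")
print(hi_adam)
```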
Fancy Decorators
Let’s understand the fancy decorators through the following topics:
Class Decorators
Python provides two ways to decorate a class. Firstly, we can decorate the methods inside a class; there are built-in decorators like @classmethod, @staticmethod and @property in Python. The @classmethod and @staticmethod decorators define methods inside a class that are not bound to a particular instance of that class. The @property decorator is generally used to customize the getters and setters of class attributes. Let’s understand it by the following example:
Example: 1-
@property decorator – By using it, we can access a method of the class like an attribute. Consider the following code:
class Student:  # here, we are creating a class with the name Student
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade
    @property
    def display(self):
        return self.name + " got grade " + self.grade
stu = Student("John","B")
print("Name of the student: ", stu.name)
print("Grade of the student: ", stu.grade)
print(stu.display)
Output:Name of the student: John Grade of the student: B John got grade B
Example: 2-
@staticmethod decorator – The @staticmethod is used to define a static method in the class. It can be called using the class name as well as an instance of the class. Consider the following code:
class Person:  # here, we are creating a class with the name Person
    @staticmethod
    def hello():  # here, we are defining a static method hello
        print("Hello Peter")
per = Person()
per.hello()
Person.hello()
Output:Hello Peter Hello Peter
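For comparison, a method decorated with @classmethod receives the class itself as its first argument, which makes it handy for alternative constructors. A minimal sketch, with hypothetical attribute and method names:

```python
class Student:
    school = "Example School"  # hypothetical class attribute

    def __init__(self, name):
        self.name = name

    @classmethod
    def from_full_name(cls, full_name):
        # `cls` is the class itself, so this works as an alternative constructor
        first_name = full_name.split()[0]
        return cls(first_name)

    @classmethod
    def school_name(cls):
        # Class methods can read class attributes without any instance
        return cls.school


stu = Student.from_full_name("John Smith")
print(stu.name)
print(Student.school_name())
```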
Singleton Class
A singleton class only has one instance. There are many singletons in Python including True, None, etc.
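A decorator can be used to turn a class into a singleton. The following is one possible sketch, not the only way to do it; the names singleton and Config are illustrative.

```python
def singleton(cls):
    """Make a class hand back the same instance on every call."""
    def wrapper_singleton(*args, **kwargs):
        # Create the instance on the first call only, then reuse it
        if wrapper_singleton.instance is None:
            wrapper_singleton.instance = cls(*args, **kwargs)
        return wrapper_singleton.instance
    wrapper_singleton.instance = None
    return wrapper_singleton


@singleton
class Config:
    pass


first = Config()
second = Config()
print(first is second)
```

Note that after decoration the name Config refers to the wrapper function rather than the class, which is a trade-off of this simple approach.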
Nesting Decorators
We can use multiple decorators by using them on top of each other. Let’s consider the following example:
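A minimal sketch with two hypothetical decorators, bold and italic, stacked on one function:

```python
def bold(func):
    def wrapper(*args, **kwargs):
        return "<b>" + func(*args, **kwargs) + "</b>"
    return wrapper


def italic(func):
    def wrapper(*args, **kwargs):
        return "<i>" + func(*args, **kwargs) + "</i>"
    return wrapper


# The decorator closest to the function is applied first,
# so this is equivalent to greet = bold(italic(greet))
@bold
@italic
def greet(name):
    return f"Hello {name}"


print(greet("John"))
```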
In the above code, we have used nested decorators by stacking them on top of one another.
Decorator with Arguments
It is always useful to pass arguments in a decorator. The decorator can be executed several times according to the given value of the argument. Let us consider the following example:
Example:
import functools  # here, we are importing functools into our program
def repeat(num):  # here, we are defining a function repeat that takes the argument num
    # here, we are creating and returning the actual decorator
    def decorator_repeat(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(num):  # here, we are calling func num times
                value = func(*args, **kwargs)
            return value  # here, we are returning the value of the last call
        return wrapper  # here, we are returning the wrapper function
    return decorator_repeat
# Here we are passing num as an argument, which repeats the print function
@repeat(num=5)
def function1(name):
    print(f"{name}")
In the above example, repeat refers to a function object that can be called like any other function. Calling repeat(num=5) returns a function, which then acts as the decorator.
The above code may look complex but it is the most commonly used decorator pattern where we have used one additional def that handles the arguments to the decorator.
Note: Decorator with argument is not frequently used in programming, but it provides flexibility. We can use it with or without argument.
Stateful Decorators
Stateful decorators are used to keep track of the decorator state. Let us consider the example where we are creating a decorator that counts how many times the function has been called.
Example:
import functools  # here, we are importing functools into our program
def count_function(func):  # here, we are defining a decorator that takes the function func
    @functools.wraps(func)
    def wrapper_count_calls(*args, **kwargs):
        wrapper_count_calls.num_calls += 1
        print(f"Call {wrapper_count_calls.num_calls} of {func.__name__!r}")
        return func(*args, **kwargs)
    wrapper_count_calls.num_calls = 0
    return wrapper_count_calls  # here, we are returning the wrapper function
@count_function
def say_hello():  # here, we are defining a function without parameters
    print("Say Hello")
say_hello()
say_hello()
Output:Call 1 of ‘say_hello’ Say Hello Call 2 of ‘say_hello’ Say Hello
In the above program, the state is the number of calls of the function, stored in .num_calls on the wrapper function. When we call say_hello(), it displays the number of the current call.
Classes as Decorators
Classes are a good way to maintain state. In this section, we will learn how to use a class as a decorator. Here we will create a class whose __init__() takes func as an argument. Instances of the class need to be callable so that they can stand in for the decorated function.
To make the instances callable, we implement the special __call__() method.
Code
import functools  # here, we are importing functools into our program
class Count_Calls:  # here, we are creating a class that keeps the call count
    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.num_calls = 0
    def __call__(self, *args, **kwargs):
        self.num_calls += 1
        print(f"Call {self.num_calls} of {self.func.__name__!r}")
        return self.func(*args, **kwargs)
@Count_Calls
def say_hello():  # here, we are defining a function without parameters
    print("Say Hello")
say_hello()
say_hello()
say_hello()
Output:Call 1 of ‘say_hello’ Say Hello Call 2 of ‘say_hello’ Say Hello Call 3 of ‘say_hello’ Say Hello