For a small program I want to fetch data about various WordPress plugins. To be concrete, it is about 50 plugins, each with its own page on wordpress.org (see below).
For every plugin in the list I need the following data: "Version", "Active installations" and "Tested up to".
The pages look like https://wordpress.org/plugins/participants-database and so on.
These plugins are listed in my favorites, so one approach would be to log in and parse the favorites pages. (BeautifulSoup only parses HTML, so the login itself would have to be done with an HTTP client.)
The other approach: I can loop through a set of URLs and fetch all the necessary pages.
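The URL loop could be sketched roughly like this. It is only a minimal sketch: `plugin_url`, `fetch_page` and the `User-Agent` string are names I made up for illustration, the standard library `urllib` stands in for whatever HTTP client is used, and only the first slug comes from this post.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://wordpress.org/plugins/"

def plugin_url(slug):
    """Build the plugin page URL for a slug such as 'participants-database'."""
    return BASE_URL + slug.strip("/")

def fetch_page(url, timeout=10):
    """Download one page and return its HTML as text (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "plugin-stats-script"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

# Only the first slug is taken from the post; the other ~50 would be added here.
slugs = ["participants-database"]
urls = [plugin_url(slug) for slug in slugs]

# Usage (does network requests, so it is left commented out):
# pages = {url: fetch_page(url) for url in urls}
```

Keeping the slugs in a plain list means the same loop works whether the list is typed by hand or scraped from the favorites page later.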
I need the data from lines like the following.
See for example:
Active installations: <strong>100,000+</strong>
Tested up to: <strong>4.9.4</strong>
This task can be solved in other ways than with BeautifulSoup alone, for example with BeautifulSoup plus regular expressions.
Assuming we are able to do this with a regular expression, we need to locate the element in the HTML that holds the text. The idea is to define one regular expression that is used both for locating the element with BeautifulSoup and for extracting the text mentioned above:
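import re
from bs4 import BeautifulSoup
# Shortened sample of the plugin page markup
data = """
<li>Last updated: <strong><span>6 days</span> ago</strong></li>
<li>Active installations: <strong>100,000+</strong></li>
<li>Requires WordPress Version: <strong>4.3.1</strong></li>
<li>Tested up to: <strong>4.9.4</strong></li>
"""
# Anchored at the start so "Requires WordPress Version:" is not mistaken for "Version:"
pattern = re.compile(r"^\s*(Version|Active installations|Tested up to):")
soup = BeautifulSoup(data, "html.parser")
# find_all(string=pattern) returns the text nodes whose content matches the pattern
for label in soup.find_all(string=pattern):
    name = pattern.search(label).group(1)
    value = label.find_next("strong").get_text(strip=True)  # value is in the following <strong>
    print(f"{name}: {value}")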
Finally, I want to store the text in a database or a spreadsheet, so it would be great to get it in CSV format or as an array that can be stored in a DB.
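For the storage step, something like the following might work. It is a sketch under assumptions: the column names in `FIELDS` and the helper `rows_to_csv` are my own choice, and the single row just reuses the sample values quoted above rather than freshly fetched data.

```python
import csv
import io

# Assumed column names; "version" stays empty when it was not parsed.
FIELDS = ["plugin", "version", "active_installations", "tested_up_to"]

def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Illustrative row built from the sample lines in this post
rows = [
    {
        "plugin": "participants-database",
        "active_installations": "100,000+",
        "tested_up_to": "4.9.4",
    },
]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The `csv` module quotes "100,000+" by itself because of the comma, which is the main reason to prefer it over joining strings by hand. The same list of dicts can also be passed to a database insert later.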
Here we are using a simple regular expression for the text. We could go further and be stricter about it, but I doubt that would be practically necessary for this problem.
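If stricter matching did turn out to be useful, one option is to validate each extracted value against its own pattern. This is only a sketch: the value formats are guessed from the two sample lines above (a count like 100,000+ and a dotted version like 4.9.4), so real pages may show other formats, and `VALUE_PATTERNS` and `is_valid` are hypothetical names.

```python
import re

# Assumed value formats, derived only from the two sample lines in this post
VALUE_PATTERNS = {
    "Active installations": re.compile(r"\d{1,3}(?:,\d{3})*\+?"),
    "Tested up to": re.compile(r"\d+(?:\.\d+){1,2}"),
}

def is_valid(label, value):
    """Return True if the whole value matches the pattern registered for its label."""
    pattern = VALUE_PATTERNS.get(label)
    return bool(pattern and pattern.fullmatch(value.strip()))

print(is_valid("Active installations", "100,000+"))
print(is_valid("Tested up to", "6 days ago"))
```

Using `fullmatch` rather than `search` means a value with extra text around it is rejected instead of being accepted silently.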
So I have to refine this a bit...