snape.nj修改,输出正确顺序domain列与url列,改正summary_juin.md并存储为pdf
This commit is contained in:
parent
a0bc1dc810
commit
637bbd3e17
|
@ -69,7 +69,7 @@ def process_page(driver, url, visited_pages, start_domain, data):
|
|||
hyperlink_content_text = hyperlink_content_element.text
|
||||
print(hyperlink_content_text)
|
||||
# Add URL, Domain, and Content of the hyperlink to the data list
|
||||
data.append([href, start_domain, hyperlink_content_text])
|
||||
data.append([start_domain, href, hyperlink_content_text])
|
||||
# Recursively process the page and follow hyperlinks
|
||||
process_page(driver, href, visited_pages, start_domain, data)
|
||||
except Exception as e:
|
||||
|
|
|
@ -10,7 +10,7 @@
|
|||
|
||||
## 分析结果
|
||||
|
||||
根据分析要求进行得到分析结果,具体见结果表
|
||||
通过对爬取结果进行分析并与标准文档比对,分别在27876页网页中发现错误100处,在4153篇公众号中发现错误33处,具体见结果表
|
||||
|
||||
## 存在问题
|
||||
|
||||
|
|
Binary file not shown.
Loading…
Reference in New Issue