Modify snape.nj to output the domain and url columns in the correct order; correct summary_juin.md and save it as PDF

xiaobulu27 2023-06-13 17:31:33 +08:00
parent a0bc1dc810
commit 637bbd3e17
3 changed files with 2 additions and 2 deletions

@@ -69,7 +69,7 @@ def process_page(driver, url, visited_pages, start_domain, data):
 hyperlink_content_text = hyperlink_content_element.text
 print(hyperlink_content_text)
 # Add URL, Domain, and Content of the hyperlink to the data list
-data.append([href, start_domain, hyperlink_content_text])
+data.append([start_domain, href, hyperlink_content_text])
 # Recursively process the page and follow hyperlinks
 process_page(driver, href, visited_pages, start_domain, data)
 except Exception as e:
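The substantive change in process_page is the row layout: each entry appended to data now starts with the domain, followed by the URL and the link text. Below is a minimal sketch of why the order matters. It assumes the rows are later written to a CSV file under a Domain/URL/Content header. The commit does not show how data is saved, so the header names and the helpers make_row and rows_to_csv are hypothetical.

```python
import csv
import io

# Hypothetical header: the commit only shows the row order
# [start_domain, href, hyperlink_content_text], not how rows are saved.
HEADER = ["Domain", "URL", "Content"]


def make_row(start_domain, href, hyperlink_content_text):
    """Build one data row in the corrected order: domain first, then URL."""
    return [start_domain, href, hyperlink_content_text]


def rows_to_csv(rows):
    """Serialize rows under HEADER so each value lands in its named column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    data = []
    data.append(make_row("example.com", "https://example.com/about", "About us"))
    print(rows_to_csv(data))
```

With the pre-commit order [href, start_domain, ...], the same header would put URLs under "Domain" and domains under "URL". That mismatch is what this commit fixes.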

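@@ -10,7 +10,7 @@
 ## Analysis Results
-The analysis results were obtained according to the analysis requirements; see the results table for details.
+By analyzing the crawl results and comparing them against the standard documents, 100 errors were found across 27,876 web pages and 33 errors across 4,153 WeChat official account articles; see the results table for details.
 ## Existing Issues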

Binary file not shown.