snape.nj修改，输出正确顺序domain列与url列，改正summary_juin.md并存储为pdf

2023-06-13 17:31:33 +08:00 · 2023-06-13 17:31:33 +08:00 · 637bbd3e17
parent a0bc1dc810
commit 637bbd3e17
3 changed files with 2 additions and 2 deletions
--- a/scrape_nj.py
+++ b/scrape_nj.py
@ -69,7 +69,7 @@ def process_page(driver, url, visited_pages, start_domain, data):
            hyperlink_content_text = hyperlink_content_element.text
            print(hyperlink_content_text)
            # Add URL, Domain, and Content of the hyperlink to the data list
-            data.append([href, start_domain, hyperlink_content_text])
+            data.append([start_domain, href, hyperlink_content_text])
            # Recursively process the page and follow hyperlinks
            process_page(driver, href, visited_pages, start_domain, data)
        except Exception as e:
--- a/summary/summary
+++ b/summary/summary
@ -10,7 +10,7 @@

 ## 分析结果

-根据分析要求进行得到分析结果，具体见结果表
+通过对爬取结果进行分析并与标准文档比对，分别在27876页网页中发现错误100处，在4153篇公众号中发现错误33处，具体见结果表

 ## 存在问题

--- a/summary/summary_juin.pdf
+++ b/summary/summary_juin.pdf