Artificial Intelligence and Digital Humanities (人工智慧與數位人文) 莊昊耘 2023.5.11
html
CSS
class
href
<p class="header1" href="www.google.com">I am a p tag</p>
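To make the idea of a tag with attributes concrete, here is a minimal sketch that reads `class` and `href` from a tag like the one above using only the Python standard library. The `AttrCollector` class name is made up for illustration; the rest of these notes use BeautifulSoup instead.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every opening tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.found.append((tag, dict(attrs)))


collector = AttrCollector()
collector.feed('<p class="header1" href="www.google.com">I am a p tag</p>')

tag, attrs = collector.found[0]
print(tag, attrs["class"], attrs["href"])
```

The attributes come back as a plain dictionary, which is why the next topic (dict) matters for scraping.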
dict
example = {
    "Course": "人工智慧與數位人文",
    "Weekday": 4,
    "instructor": "張瑜芸",
    "TA": ["標云", "莊昊耘"],
}
print(example['Course'])
print(example['TA'])
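A small sketch of a few more things a dictionary can do, beyond looking up a key. The `course_info` name and the `Room` key are made up for illustration.

```python
# A dictionary maps keys to values.
course_info = {
    "Course": "人工智慧與數位人文",
    "Weekday": 4,
    "TA": ["標云", "莊昊耘"],
}

# Square brackets raise KeyError for a missing key; .get() returns a default instead.
room = course_info.get("Room", "unknown")

# Assigning to a new key adds it; assigning to an existing key overwrites it.
course_info["Room"] = "hypothetical room 101"

# A value can itself be a list, so it can be counted or indexed.
ta_count = len(course_info["TA"])

# .items() yields (key, value) pairs, which is handy in a for loop.
for key, value in course_info.items():
    print(key, "->", value)
```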
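Write a dictionary about your group members, with no duplicated entries. It must include each member's name, student ID, department and year, the general education courses they are taking this semester, and who your other group members are. Name it my_data.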
my_data
Scrape the data for every movie now showing on Yahoo! Movies (Yahoo!電影)
!pip install beautifulsoup4 lxml requests
import requests
from bs4 import BeautifulSoup as bs

url = 'https://movies.yahoo.com.tw/movie_intheaters.html'
response = requests.get(url=url)
soup = bs(response.text, 'lxml')
print(soup)
There are two ways to locate a tag
Right-click → Inspect (Inspect Element) → (find the target data) → right-click → Copy selector
soup.select(<your CSS path>)
soup.select(".release_list > li:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > a:nth-child(1)")
print()
print(soup.find('div', 'release_info'))
print(soup.find_all('div', 'release_info'))
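To compare the two approaches side by side, here is a minimal sketch run on a small hand-written HTML string instead of the live page. The HTML below is hypothetical and only loosely shaped like the listing; it uses the built-in `html.parser` so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, not copied from the real site.
html = """
<ul class="release_list">
  <li><div class="release_info"><div class="release_movie_name"><a href="/movie/1">Movie A</a></div></div></li>
  <li><div class="release_info"><div class="release_movie_name"><a href="/movie/2">Movie B</a></div></div></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match only (or None if nothing matches).
first = soup.find("div", "release_info")

# find_all() returns every match as a list-like result.
all_items = soup.find_all("div", "release_info")

# select() takes a CSS selector and always returns a list.
links = soup.select(".release_list > li > div > div > a")

first_name = first.a.text.strip()
hrefs = [a["href"] for a in links]
print(first_name, len(all_items), hrefs)
```

A copied selector with `nth-child(1)` pins one specific element, while `find_all` with a class name picks up every movie at once, which is usually what a loop needs.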
list
for
Use a for loop to restructure your code
info_items = soup.find_all('div', 'release_info')  # one block per movie
example_list = []
for item in info_items:
    name = item.find('div', 'release_movie_name')
    url = item.find('a')['href']
    resultDICT = { "Name": name, "url": url }
    example_list.append(resultDICT)
    print(resultDICT)
for item in info_items:
    name = item.find('div', 'release_movie_name').a.text.strip()
    level = item.find('div', 'leveltext').span.text.strip()
    url = item.find('a')['href']
You can collect all the URLs in a list
url_list = []
for item in info_items:
    name = item.find('div', 'release_movie_name').a.text.strip()
    english_name = item.find('div', 'en').a.text.strip()
    level = item.find('div', 'leveltext').span.text.strip()
    url = item.find('a')['href']
    url_list.append(url)
index_list = []
BASE_URL = "https://movies.yahoo.com.tw/movie_intheaters.html?page="
for page_num in range(1, 9):
    page_url = BASE_URL + str(page_num)  # page_num is an int, so convert it before concatenating
    index_list.append(page_url)
index_list
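A short sketch of why the page number has to become a string, and three equivalent ways to build the page URLs. Only three pages are generated here to keep the output small.

```python
BASE_URL = "https://movies.yahoo.com.tw/movie_intheaters.html?page="

# Adding an int directly to a str raises TypeError.
try:
    BASE_URL + 1
except TypeError as err:
    error_message = str(err)

# Three equivalent ways to append the page number.
with_str = [BASE_URL + str(n) for n in range(1, 4)]
with_fstring = [f"{BASE_URL}{n}" for n in range(1, 4)]
with_format = [(BASE_URL + "{}").format(n) for n in range(1, 4)]

print(with_str)
```

Note that `range(1, 4)` stops before 4, so it yields pages 1, 2 and 3.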
import json

with open("result.json", "w", encoding="utf-8") as f:
    json.dump(<a list holding many dictionaries>, f, ensure_ascii=False, indent=4)
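A minimal sketch of writing a list of dictionaries to JSON and reading it back. The movie entry is made-up data and the file goes to a temporary folder; it also shows what `ensure_ascii=False` changes.

```python
import json
import os
import tempfile

# Made-up data standing in for the scraped results.
movies = [{"Name": "範例電影", "url": "https://example.com/1"}]

path = os.path.join(tempfile.mkdtemp(), "result.json")

# ensure_ascii=False keeps Chinese characters readable in the file,
# so the file should be opened with an explicit UTF-8 encoding.
with open(path, "w", encoding="utf-8") as f:
    json.dump(movies, f, ensure_ascii=False, indent=4)

with open(path, encoding="utf-8") as f:
    raw_text = f.read()

loaded = json.loads(raw_text)

# With the default ensure_ascii=True, non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(movies)

print(raw_text)
print(escaped)
```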
AIR
try
except
a = input('Enter a number: ')
print(a + 1)  # raises TypeError: input() returns a str, which cannot be added to an int
try:  # use try to test whether the code runs correctly
    a = input('Enter a number: ')
    print(a + 1)
except:  # if the code inside try raises an error, run the code inside except
    print('An error occurred')
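As a follow-up sketch, the same idea wrapped in a function that converts the text to a number first and catches only the error it expects. The `add_one` name is made up for illustration, and the text is passed in as an argument instead of coming from `input()` so the example runs without typing anything.

```python
def add_one(text):
    """Convert text to an integer and add 1; return None if it is not a whole number."""
    try:
        return int(text) + 1
    except ValueError:
        # int() raises ValueError for text such as "abc"
        print("Not a whole number:", repr(text))
        return None


print(add_one("41"))
print(add_one("abc"))
```

Catching a specific exception such as `ValueError` keeps unrelated mistakes visible, whereas a bare `except:` hides every kind of error.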
function
index_list
try...except
dump function
<>
import requests
from bs4 import BeautifulSoup as bs

BASE = "https://movies.yahoo.com.tw/movie_intheaters.html?page={}"
index_list = [BASE.format(i) for i in range(1, 10)]  # links to all the listing pages
url_list = []
for index_url in index_list:
    response = requests.get(url=index_url)
    soup = bs(response.text, 'lxml')
    info_items = soup.find_all('div', 'release_info')
    for item in info_items:
        name = item.find('div', 'release_movie_name').a.text.strip()
        url = item.find('a')['href']
        url_list.append(url)  # collect the link to every movie on the listing
resultLIST = []
# Shared prefix of the selectors for the movie info panel
INFO = "#content_l > div:nth-child(1) > div.l_box_inner > div > div > div.movie_intro_info_r"
for url in url_list:
    response = requests.get(url=url)
    soup = bs(response.text, 'lxml')
    try:
        resultDICT = {
            "Name": soup.select(INFO + " > h1")[0].text.strip(),
            "release_date": soup.select(INFO + " > span:nth-child(5)")[0].text,
            "duration": soup.select(INFO + " > span:nth-child(6)")[0].text.strip(),
            "release_company": soup.select(INFO + " > span:nth-child(7)")[0].text.strip(),
            "imdb": soup.select(INFO + " > span:nth-child(8)")[0].text.strip()
        }
    except IndexError:  # a selector matched nothing on this page, so skip the movie
        continue
    resultLIST.append(resultDICT)
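One possible variation, sketched under assumptions: instead of skipping a whole movie when one field is missing, a small helper can return a default for that field. The `first_text` helper and the HTML string are hypothetical and only loosely shaped like a detail page.

```python
from bs4 import BeautifulSoup


def first_text(soup, selector, default=None):
    """Return the stripped text of the first element matching selector, or default."""
    matches = soup.select(selector)
    return matches[0].text.strip() if matches else default


# Hypothetical HTML, not copied from the real site.
html = '<div class="movie_intro_info_r"><h1> Movie A </h1><span>Runtime: 120 min</span></div>'
soup = BeautifulSoup(html, "html.parser")

record = {
    "Name": first_text(soup, "div.movie_intro_info_r > h1"),
    "duration": first_text(soup, "div.movie_intro_info_r > span:nth-child(2)"),
    # No ninth child exists here, so the default is used instead of raising IndexError.
    "imdb": first_text(soup, "div.movie_intro_info_r > span:nth-child(9)", default=""),
}
print(record)
```

Whether to skip the movie or keep it with blanks is a design choice; keeping it makes missing fields visible in result.json.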
import json

with open("result.json", "w", encoding="utf-8") as f:
    json.dump(resultLIST, f, ensure_ascii=False, indent=4)
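Demo how to navigate the slides first
For the "try it yourself" exercises, maybe don't grade them, but require everyone to finish them!!! Warn them that if they don't finish this one they will be completely lost later, and encourage them to ask as many questions as they can so the next part connects. If it is graded, I'm afraid they will just give up XDDDD. Or you could change it so that people raise their hands and share their answers at the end?
Open it in the browser to show them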