香港三國志 (Hong Kong Three Kingdoms forum)
Pearltea
Rank: 四品官 (Fourth-rank official) · Posts: 1,289 · Group: 太守 (Grand Administrator) · Joined: 9-22-2003 · Activity: 5 · Reputation: 614
MIT system outperforms human intuition with algorithms (translated by Pearltea)
http://www.upi.com/Science_News/2015/10/17.../4211445087936/

CAMBRIDGE, Mass., Oct. 17 (UPI) -- The Massachusetts Institute of Technology is testing a new computer system aimed at finding patterns in data sets faster than human beings.

Researchers at MIT designed a Data Science Machine that searches for patterns in data sets, such as a database of promotional sale dates and weekly profits. While computers can do many things faster than humans, human input is still required to choose what to look for in a large data set -- to find meaning in patterns, not just the patterns themselves. MIT researchers hope to automate that, too. (Translation: Although computers can process data more efficiently than people, interpreting the data often still depends on human instructions, and that means not just recognizing patterns but also finding meaning in them. The goal of MIT's new research is to automate this.)

In three competitions, the Data Science Machine competed against 906 human teams and outperformed 615. The teams worked on their predictive algorithms for months, while the Machine was able to compute its predictions in two to 12 hours.

To conduct analyses, the Machine looks at correlations between data tables using numerical identifiers. It then continually updates these identifiers as it continues to import data. As the identifiers add up, the Machine carries out various mathematical operations such as averages and sums and attempts to find trends in the data.

Max Kanter, an MIT student whose thesis served as the foundation for the Machine, says the device could be "a natural complement to human intelligence" and expedite the process of analyzing data. Kanter worked with his advisor Kalyan Veeramachaneni to prep the thesis for presentation next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni said the machine could be a crucial asset in finding what components of a data set should be analyzed in order to draw conclusions. For example, although MIT records student performance on online courses, it does not record statistics that could predict a student's likelihood to drop out.
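The article only sketches how the Machine works: it relates tables through numerical identifiers and then computes averages and sums. A minimal sketch of that single step, assuming two toy tables linked by a hypothetical `store_id` column, might look like this. The table names, columns and the `aggregate_features` helper are invented for illustration and are not MIT's code.

```python
from collections import defaultdict

# Hypothetical toy tables: one row per store, and one row per weekly record
# that points back to its store through the numeric identifier "store_id".
stores = [{"store_id": 1}, {"store_id": 2}]
weekly_sales = [
    {"store_id": 1, "profit": 120.0, "on_promo": 1},
    {"store_id": 1, "profit": 80.0, "on_promo": 0},
    {"store_id": 2, "profit": 50.0, "on_promo": 0},
    {"store_id": 2, "profit": 70.0, "on_promo": 1},
    {"store_id": 2, "profit": 90.0, "on_promo": 1},
]


def aggregate_features(parents, children, key, columns):
    """For each parent row, summarise its linked child rows with COUNT, SUM and MEAN."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row)  # follow the identifier to the parent
    features = {}
    for parent in parents:
        rows = grouped.get(parent[key], [])
        feats = {"COUNT": len(rows)}
        for col in columns:
            values = [r[col] for r in rows]
            feats[f"SUM({col})"] = sum(values)
            feats[f"MEAN({col})"] = sum(values) / len(values) if values else 0.0
        features[parent[key]] = feats
    return features


if __name__ == "__main__":
    for store, feats in aggregate_features(stores, weekly_sales, "store_id",
                                           ["profit", "on_promo"]).items():
        print(store, feats)
```

Each parent row gains new columns such as `SUM(profit)` and `MEAN(on_promo)`, which could then be tested against whatever the analyst wants to predict.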
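The Machine could identify variables such as how long it takes a student to get started on an assignment as well as how much time the student is active in the course and thereby infer the likelihood of course dropout. (Translation: Although MIT's system records students' grades in online courses, it has no statistics for predicting how likely a student is to drop out. The new system, however, can identify other factors, such as the time between a student receiving an assignment and starting it, and how active the student is while the course runs, to infer the likelihood of dropping out.)

Harvard University computer science professor Margo Seltzer said the project is "one of those unbelievable projects" seeking to solve real-world problems through a new approach. She further said the technology will "become the standard quickly -- very quickly."

-------------------------------------------------------------------------------------------------------------------

My translation may not be very good.

This looks like real progress in data science. Although my industry is a "fan" of data science, I think this new system still faces quite a few challenges. The most important is "garbage data". The advantage of big data is that it is inclusive, so a few outliers don't matter much. But many large companies face more than that: a system may have been recording data the wrong way for a long time, or it was wrong for years and then improved because of new business requirements (perhaps a decision by a department head), and these qualitative factors are never recorded. A trend that shows up after analysis may not be a real trend at all, only an improvement in how the data are recorded, while everything else is business as usual.

I look forward to everyone's opinions.

Edited by Pearltea on Oct 18 2015, 05:51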
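Pearltea's "garbage data" concern can be made concrete with a small simulation. This is a hypothetical sketch: the weekly event count, the capture rates (60% before the fix, 95% after) and the week of the fix are invented numbers, not data from any company. The true number of events never changes, yet the recorded series jumps when the recording process is fixed.

```python
import random


def simulate_recorded_counts(weeks, true_events, capture_before, capture_after,
                             fix_week, seed=0):
    """Hypothetical simulation: the true number of events per week is constant,
    but each event is only recorded with some probability, and that probability
    improves from fix_week onward (e.g. after a business-requirement change)."""
    rng = random.Random(seed)
    recorded = []
    for week in range(weeks):
        capture = capture_before if week < fix_week else capture_after
        # Count how many of the true events the system actually captured this week.
        recorded.append(sum(1 for _ in range(true_events) if rng.random() < capture))
    return recorded


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    counts = simulate_recorded_counts(weeks=52, true_events=200,
                                      capture_before=0.6, capture_after=0.95,
                                      fix_week=26)
    print(f"recorded mean, weeks 0-25:  {mean(counts[:26]):.1f}")
    print(f"recorded mean, weeks 26-51: {mean(counts[26:]):.1f}")
```

An analysis that sees only the recorded counts would report a sharp rise, even though nothing about the underlying business changed; only the undocumented history of the recording system explains it.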
XxEDxX
Rank: 三品官 (Third-rank official) · Posts: 1,467 · Group: 太守 (Grand Administrator) · Joined: 8-30-2011 · Activity: 8 · Reputation: 507
Thank you very much for sharing this and for the nice translation, Pearltea.
An impressive article about artificial intelligence. It reminds me of the time I spent reading papers on automatic processing and machine learning (a tough time, though). As I understand it, this technology has been studied intensively over the past few years, and I agree that it will undoubtedly remain academically and practically significant for years to come. The Turing Test would also make good further reading here.
徐元直
Rank: 攤抖首領 · Posts: 7,912 · Group: 君主 (Sovereign) · Joined: 9-18-2003 · Activity: 60 · Reputation: 4176
Reading only this UPI report, I was a bit lost; it only became clearer after I read the original piece on MIT News.
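I think the way this example is presented jumps around too much and is somewhat misleading. Nor is it really "not only finding patterns but also finding meaning" (that is not a problem with your translation; it is a problem with the original). Put plainly, the analogy should go like this:

MIT's online course system records students' raw activity, for example how long a student stays logged in, when the professor posts an assignment, when the assignment is submitted, the assignment grade, and so on. The school now wants to use these data to estimate each student's dropout probability. In other words, it wants to find which parts of past years' data are most strongly associated with dropout, and use them to judge how likely future online-course students are to drop out.

However, none of these raw variables on its own may be significantly associated with eventual dropout. If you simply use statistical software to test each one against dropout in turn, you get nothing.

So the school hands the data to statisticians or data scientists. These experts realize that perhaps (submission time - assignment release time) should be treated as a new variable, "time taken to complete the assignment", and then test whether it is associated with dropout probability. Of course, this is only the beginning, and the final work would be more complex. Much like the way HKSAN calculates its activity score, they would probably need to give different weights to data such as "time taken to complete the assignment", "assignment grade", "cumulative login time", "number of course discussion posts" and "word count of course discussion posts", combine them through various additions and multiplications, and finally arrive at a "composite performance score". Only this new variable would be significantly associated with dropout. Tuning each item's weight and the way the items are combined, so that the resulting composite score is as strongly associated with dropout probability as possible (that is, has the greatest predictive power), again takes a great deal of analysis and trial. Besides computation, the process also relies on the experts' intuition.

What is new about MIT's intelligent algorithm is that it can take over this expert work: it automatically tries different combinations, weights and ways of combining them, and works out on its own how the "composite performance score" should be calculated. I haven't read the original paper, but my guess is that the algorithm is intelligent in a mathematical sense rather than in a knowledge sense. In other words, it selects the best weighting and combination purely mathematically. Unlike a human expert, it does not understand what "posting an assignment" means, what "submitting an assignment" means, or even what an "assignment" is, and it does not use common sense to judge that "time taken to complete the assignment" might be an important variable worth testing. The algorithm may only "know" (through labels the operator attaches by hand when loading the data) that "submission time" and "release time" are both time variables and both belong to the "assignment" domain, and nothing more. It may not even need that much, and could find the most effective combination purely through massive computation and trial and error; after all, brute-force computation is what computers do best compared with human brains. Of course, trial and error also needs direction. The algorithm has to make many logical judgments instead of exhausting every possible combination and picking the attempt with the highest correlation, because that would be far too computationally expensive and would not finish before the end of the world. That is where the intelligence of an intelligent algorithm lies.

So the algorithm's "thinking" follows a completely different path from a human expert's. The algorithm is purely mathematics-driven and searches for correlations directly through large amounts of mathematical analysis, whereas human experts tend to be knowledge-driven: they might first guess that procrastination is an important cause of dropout, then work out how to extract a "procrastination severity" measure from the raw data, and then test it. The algorithm's strengths are that it can discover correlations that run against intuition, cover blind spots in human thinking, and share the workload of data analysis. But it is still only an applied AI, a weak AI rather than a strong AI. It does not know how to find meaning in patterns at all, nor can it explore patterns by thinking about higher-level meaning the way a human expert does. Its real role is, as its author put it: it is a natural complement to human intelligence.

Edited by 徐元直 on Oct 18 2015, 19:39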
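徐元直's point, that no single raw timestamp says much about dropout while the difference "submission time minus release time" does, can be illustrated with a toy example. Everything here is hypothetical: the column names (`release_hour`, `submit_hour`, `login_hours`), the rule that students who take more than 30 hours drop out, and the simple search (raw columns plus every pairwise difference, ranked by absolute Pearson correlation) are invented for illustration and are not the MIT algorithm. Enumerating every candidate only works here because there are just six of them; as the post notes, a real system has to prune its search.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def make_toy_students(n=500, seed=1):
    """Hypothetical data: dropout depends only on how long a student takes to
    submit after an assignment is released, not on either timestamp alone."""
    rng = random.Random(seed)
    cols = {"release_hour": [], "submit_hour": [], "login_hours": []}
    dropout = []
    for _ in range(n):
        release = rng.uniform(0, 1000)  # when the assignment was posted
        delay = rng.uniform(0, 48)      # hours the student took to submit
        cols["release_hour"].append(release)
        cols["submit_hour"].append(release + delay)
        cols["login_hours"].append(rng.uniform(0, 100))  # unrelated noise
        dropout.append(1.0 if delay > 30 else 0.0)
    return cols, dropout


def search_features(cols, target):
    """Score every raw column and every pairwise difference by |correlation|
    with the target; return (score, name) pairs sorted best-first."""
    candidates = dict(cols)
    for a, b in itertools.combinations(cols, 2):
        candidates[f"{b} - {a}"] = [y - x for x, y in zip(cols[a], cols[b])]
    scored = [(abs(pearson(v, target)), name) for name, v in candidates.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    cols, dropout = make_toy_students()
    for score, name in search_features(cols, dropout):
        print(f"{score:.3f}  {name}")
```

Run as a script, it should rank `submit_hour - release_hour` first by a wide margin, with both raw timestamps near zero. Extending the candidate set to weighted sums of several derived variables would move toward the "composite performance score" described above, at the cost of a much larger search space.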
Pearltea
Rank: 四品官 (Fourth-rank official) · Posts: 1,289 · Group: 太守 (Grand Administrator) · Joined: 9-22-2003 · Activity: 5 · Reputation: 614
I appreciate your comment, as always. I did hesitate a bit over the word "meaning". I think the problem is more likely my poor translation; it should have been interpreted and translated as "implied significance" instead. While the algorithm won't be a substitute for human intelligence, the "complement" part sounds like a significant improvement over some current methods of advanced analytics, which only detect the patterns they are instructed to look for. I don't know whether some of the terms in the article are merely glamorous buzzwords, but in my industry, identifying buried patterns probably dates back to the very early days. One example was a finding that a common practice among medical first responders was unintentionally causing more severe and costly injuries. The team that presented the findings said that testing for a correlation between injury severity and a seemingly safe practice would not have been a focus (or at least not a high priority) in an early analysis, but the data did show a positive correlation between the two. Although correlation often does not imply causation, the new practice did significantly reduce medical costs. I don't know whether the algorithm they used was similar to MIT's new machine, but it could be an example of the blind spots of knowledge-based human intelligence.
Thanks for your response, ED. Good read. I know you're interested in meteorology as well, so here is a case where a consulting firm used big data analytics to find a buried pattern (weather!) for a client. The firm told a bakery that revenue from sweet pastries and desserts was higher on rainy days, while sandwiches sold better on sunny days. By paying closer attention to the weather forecast and adjusting its daily inventory of baked goods accordingly, the bakery was able to double its profit.

Edited by Pearltea on Oct 19 2015, 11:41
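The post gives no details of the consulting firm's analysis. The simplest version of the idea, assuming a daily sales log labelled with the weather, is to average each product's sales by weather condition and stock the next day according to the forecast. The sales figures and the helper names below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical daily log: (weather, pastries_sold, sandwiches_sold).
sales_log = [
    ("rain", 140, 60), ("rain", 150, 55), ("rain", 130, 70),
    ("sun", 80, 120), ("sun", 90, 130), ("sun", 70, 110),
]


def average_by_weather(log):
    """Average units sold per product for each weather condition."""
    totals = defaultdict(lambda: [0, 0, 0])  # [days, pastries, sandwiches]
    for weather, pastries, sandwiches in log:
        t = totals[weather]
        t[0] += 1
        t[1] += pastries
        t[2] += sandwiches
    return {w: {"pastries": p / d, "sandwiches": s / d}
            for w, (d, p, s) in totals.items()}


def plan_stock(averages, forecast):
    """Stock tomorrow at the historical average for the forecast weather."""
    return {item: round(qty) for item, qty in averages[forecast].items()}


if __name__ == "__main__":
    averages = average_by_weather(sales_log)
    print(plan_stock(averages, "rain"))  # {'pastries': 140, 'sandwiches': 62}
```

A real analysis would need far more days and would check that the difference between rainy and sunny days is larger than ordinary day-to-day variation before changing the inventory.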